Enhancing Soil and Water Assessment Tool Snow Prediction Reliability with Remote-Sensing-Based Snow Water Equivalent Reconstruction Product for Upland Watersheds in a Multi-Objective Calibration Process

: Precipitation occurs in two basic forms deﬁned as liquid state and solid state. Di ﬀ erent from rain-fed watershed, modeling snow processes is of vital importance in snow-dominated watersheds. The seasonal snowpack is a natural water reservoir, which stores snow water in winter and releases it in spring and summer. The warmer climate in recent decades has led to earlier snowmelt, a decline in snowpack, and change in the seasonality of river ﬂows. The Soil and Water Assessment Tool (SWAT) could be applied in the snow-inﬂuenced watershed because of its ability to simultaneously predict the streamﬂow generated from rainfall and from the melting of snow. The choice of parameters, reference data, and calibration strategy could signiﬁcantly a ﬀ ect the SWAT model calibration outcome and further a ﬀ ect the prediction accuracy. In this study, SWAT models are implemented in four upland watersheds in the Tulare Lake Basin (TLB) located across the Southern Sierra Nevada Mountains. Three calibration scenarios considering di ﬀ erent calibration parameters and reference datasets are applied to investigate the impact of the Parallel Energy Balance Model (ParBal) snow reconstruction data and snow parameters on the streamﬂow and snow water-equivalent (SWE) prediction accuracy. In addition, the watershed parameters and lapse rate parameters-led equiﬁnality is also evaluated. The results indicate that calibration of the SWAT model with respect to both streamﬂow and SWE reference data could improve the model SWE prediction reliability in general. Comparatively, the streamﬂow predictions are not signiﬁcantly a ﬀ ected by di ﬀ erently lumped calibration schemes. The default snow parameter values capture the extreme high ﬂows better than the other two calibration scenarios, whereas there is no remarkable di ﬀ erence among the three calibration schemes for capturing the extreme low ﬂows. The watershed and lapse rate parameters-induced equiﬁnality a ﬀ ects the ﬂow prediction more (Nash-Sutcli ﬀ e E ﬃ ciency (NSE) varies between 0.2–0.3) than the SWE prediction (NSE varies less than 0.1). This study points out the remote-sensing-based SWE reconstruction product as a promising alternative choice for model calibration in ungauged snow-inﬂuenced watersheds. The streamﬂow-reconstructed SWE bi-objective calibrated model could improve the prediction reliability of surface water supply change for the downstream agricultural region under the changing climate.


Data
ArcSWAT (an ArcGIS add-in), was used to develop the SWAT models for each watershed with topography, land use, and soils data. In ArcSWAT, each watershed was discretized into subwatersheds based on the topography using a United States Geological Survey (USGS) 30-m resolution, 1-arc second digital elevation model (DEM, http://ned.usgs.gov). A DEM was also used to derive the terrain slope, another variable that is used, together with soil type and land use, to divide each sub-watershed further into multiple hydrological response units (HRUs). The topography, land use, and soil data used in this study were obtained from the National Elevation Dataset 30-m Digital Elevation Model (NED DEM, http://ned.usgs.gov), the National Land Cover Dataset 2001 land use data (NLCD, https://www.mrlc.gov/data/nlcd-2001-land-cover-conus), and the State Soil Geographic dataset (STATSGO, https://www.nrcs.usda.gov/) respectively. When the SWAT models were built, Daymet (https://daymet.ornl.gov/) historical daily precipitation and temperature data (1-km by 1-km resolution, 1981-2013) were used as the climate drivers for the SWAT model to calibrate the streamflow and SWE. Monthly referenced streamflow data recorded at the outlets of the four watersheds were used to calibrate the model. Monthly reference streamflow

Data
ArcSWAT (an ArcGIS add-in), was used to develop the SWAT models for each watershed with topography, land use, and soils data. In ArcSWAT, each watershed was discretized into sub-watersheds based on the topography using a United States Geological Survey (USGS) 30-m resolution, 1-arc second digital elevation model (DEM, http://ned.usgs.gov). A DEM was also used to derive the terrain slope, another variable that is used, together with soil type and land use, to divide each sub-watershed further into multiple hydrological response units (HRUs). The topography, land use, and soil data used in this study were obtained from the National Elevation Dataset 30-m Digital Elevation Model (NED DEM, http://ned.usgs.gov), the National Land Cover Dataset 2001 land use data (NLCD, https://www.mrlc.gov/data/nlcd-2001-land-cover-conus), and the State Soil Geographic dataset (STATSGO, https://www.nrcs.usda.gov/) respectively. When the SWAT models were built, Daymet (https://daymet.ornl.gov/) historical daily precipitation and temperature data (1-km by 1-km resolution, 1981-2013) were used as the climate drivers for the SWAT model to calibrate the streamflow and SWE. Monthly referenced streamflow data recorded at the outlets of the four watersheds were Water 2020, 12, 3190 5 of 23 used to calibrate the model. Monthly reference streamflow data was calculated based on the observed reservoir operation data and was obtained from the California Department of Water Resource (DWR). The Parallel Energy Balance Model (ParBal) snow reconstruction dataset was used as the snow reference data for SWAT model calibration and validation. The daily ParBal SWE data was obtained from the University of California, Santa Barbara, with 500-m spatial resolution for the years 2001 to 2017 [58]. Reconstruction of the snow water-equivalent from remotely sensed data is a technique where the snowpack is built up in reverse from melt-out to peak using downscaled satellite-based energy balance forcings and a fractional snow-covered area, albedo, and snow grain size data. The Parallel Balance Model (ParBal) SWE data is a promising reconstructed SWE product derived from Moderate Resolution Imaging Spectroradiometer (MODIS) snow-covered area and grain size data. Although the ParBal data is not the direct observations, considering the limited coverage of ground-based SWE observations in the upland watersheds, the ParBal dataset becomes an important alternative for in-situ SWE observations.

SWAT Model Implementation
The SWAT Calibration and Uncertainty Program (SWAT-CUP, version 2012) was applied to calibrate the SWAT models. The Sequential Uncertainty Fitting (SUFI-2) algorithm was selected as the calibration technique in SWAT-CUP. SUFI-2 finds an optimal set of parameter values that best fit the reference data by narrowing down their predefined initial ranges over multiple iterations [59,60]. A total of 17 parameters were selected for the calibration of SWAT for each of the four upland watersheds (Table 1). These parameters were selected following the SWAT documentation and sensitivity analyses conducted on the SWAT model parameters by the California Department of Water Resource [36,61]. The 17 selected parameters were divided into three groups comprising snow parameters, watershed parameters, and elevation lapse rate parameters, as mentioned before (Table 1). It needs to be pointed out that the elevation lapse rate parameters (i.e., temperature and precipitation lapse rates, Tlapse and Plapse) could affect the partitioning between rain and snow from precipitation, as well as the distribution of snow in the high elevations, although they are not direct snow parameters. To account for orographic effects in mountainous watersheds, which both influence temperature and precipitation, SWAT allows the definition of up to 10 elevation bands in each sub-watershed. For each elevation band, SWAT simulates the accumulation, sublimation, and melting of snow. Outputs from each elevation band are averaged for each sub-watershed and reported in output files. Based on previous studies, it has been shown that, typically, using 3-5 elevation bands satisfies the accuracy of most model predictions while keeping the computational demand (e.g., running time) low [62]. In this study, five elevation bands were defined for each of the four upland watersheds (Kings, Kern, Tule, and Kaweah watersheds) [63][64][65]. The developed SWAT models were used to test and compare the different calibration strategies.  The default values are found in the SWAT Calibration and Uncertainty Program (SWAT-CUP) 2012 version input/output documentation, chapter 4 [63]. SWAT refers to the Soil and Water Assessment Tool.

SWAT Model Calibration
Three calibration scenarios were applied in this study. Scenario S1 and S2 applied the single-objective calibration, where the simulated streamflow was evaluated with the reference streamflow at the watershed outlet. The S3 scenario involved a multi-objective calibration, where, in addition to the streamflow, a second calibration objective was set for each watershed, consisting of the calibration of SWE at the sub-watershed scale using ParBal-reconstructed SWE reference data. The following describes in more detail how SWAT was calibrated for each calibration scenario. In the S1 calibration scenario, the SWAT models were calibrated with monthly streamflow reference data at the watershed outlet using the watershed parameters and elevation lapse rate parameters only. The snow parameters were left at their default values ( Table 2). In the S2 scenario, instead of applying the default snow parameter values, the snow parameters were also calibrated with reference streamflow data at the watershed outlet, in addition to the watershed parameters and elevation lapse rate parameters. In the S3 calibration scenario, a new two-stage lumped calibration strategy was developed, and all three groups of parameters were calibrated with both the monthly streamflow data at the watershed outlet and daily SWE data in the snow-influenced sub-watersheds. The total calibration period of the SWAT models was from 1981-2007, while the first three years were used as the warm-up period. Initially, the SWAT model conducted one iteration that consisted of 500 simulations. The simulation outcomes were evaluated using standard statistical measures of agreement between the observed and simulated streamflow, including the Coefficient of Determination (R 2 , Equation (1)), Nash-Sutcliffe Efficiency (NSE, Equation (2)), and p-value and r value. These performance measures are frequently used by many hydrological modeling studies. Further descriptions of these methods can be found below. Unless all the performance measures are satisfactory, a new iteration will be initiated again with another 500 simulations.
The Coefficient of Determination (R 2 ) is calculated using Equation (1). Where Obs t is the observation data at time step t, Sim t is the simulation data at time step t, and Obs is the average observation value.
The Nash-Sutcliffe Efficiency (NSE) is a normalized value determining the relative magnitude of the residual variance compared to the observed data variance. The NSE ranges between −1 and 1, with NSE = 1 being the optimal value. The mathematical equation for calculating NSE is shown in Equation (2). Where Obs t is the observation data at time step t, Sim t is the simulation data at time step t, and Obs and Sim are the average observation and average simulation values between time step 1~T, respectively.
The p-value and r value are used in the streamflow calibration to evaluate the prediction ensemble relative to the observations. The p-value is used to assess the percentage of reference data bracketed by the 95% prediction uncertainty calculated at the 2.5 and 97.5 percentiles of the cumulative distribution of the simulated variables (95PPU), and the r factor is the thickness of the 95PPU envelop.
The multi-objective lumped calibration strategy applied in the S3 scenario included two calibration stages. In the first step, the SWAT model was automatically calibrated with monthly reference streamflows at the watershed outlet using the calibration period from 1981-2000 (the first three years were used as a warm-up period, and all three groups of parameters were calibrated). In the second step, the SWAT model parameters obtained in the previous step were further manually calibrated with referenced SWE data at the sub-watershed scale for 2001-2007. Here, the best parameter set derived from the first step was used as the initial values for the SWE calibration in the second step. In each calibration round, the snow parameters and elevation lapse rate parameters were modified within their respective value ranges before the SWAT-CUP simulation was conducted. This process was repeated until the simulated SWE reached a satisfactory level of agreement with the reference SWE data. During the SWE calibration process, it was ensured that the quality of the simulated streamflow with the modified parameters stayed at acceptable fit levels.
For all three scenarios, the SWAT model performance and calibrated parameters were evaluated during the validation period (2008-2013). Both the monthly streamflow at the watershed outlet and daily SWE prediction of each sub-watershed were evaluated with reference data to assess the performance of the calibrated models. The choice of periods for model calibration or validation was mainly determined by the availability of reference data. Since we had longer streamflow reference data , whereas the ParBal reconstructed SWE data was relatively shorter (available starting from  (1981)(1982)(1983)(1984)(1985)(1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000), the first three years were used as the warm-up period) was applied as the streamflow calibration period (stage 1 calibration). The years after 2001 were split into two generally equal time spans (2001-2007 was used as the SWE calibration period, and 2008-2013 was used as the validation period) to ensure each period contained both dry and wet years. A longer validation period (i.e., validation starting earlier) was not applied, since it would squeeze the SWE calibration period and affect model calibration reliability, and the current way of splitting the periods was the ideal choice. The details for all three calibration scenarios are summarized in Table 3 and Figure 2.

Evaluation of Calibration Scenarios
The predicted streamflow from the three calibration scenarios were evaluated with a reference streamflow for the calibration period and validation period, respectively. The NSE and R 2 values were compared among the three scenarios to assess the impact of using a single vs. multi-objective calibration on the predicted flow and SWE. In addition, extreme flows are calculated and compared for the three modeling scenarios. Extreme flow conditions are strongly related to natural hazards

Evaluation of Calibration Scenarios
The predicted streamflow from the three calibration scenarios were evaluated with a reference streamflow for the calibration period and validation period, respectively. The NSE and R 2 values were compared among the three scenarios to assess the impact of using a single vs. multi-objective calibration on the predicted flow and SWE. In addition, extreme flows are calculated and compared for the three modeling scenarios. Extreme flow conditions are strongly related to natural hazards such as droughts (an extreme low-flow condition) and floods (an extreme high-flow condition), thus needing more attention. Especially in recent years, continuous severe drought conditions (2012-2016) have significantly affected California and caused huge amounts of economic loss and property damage [66,67]. For this reason, we calculated the 10%, 20%, 80%, and 90% quantile flows based on the streamflow predictions for 2001-2013 for all three scenarios. The calculated low flows and high flows were compared and evaluated among the three modeling scenarios.
In addition to the streamflow, the SWE predictions were also evaluated for the calibration and validation periods at both the sub-watershed scale and elevation band scale. For the SWE evaluation, the NSE values were calculated for all snow-dominated sub-watersheds for all three calibration scenarios, and their corresponding distributions (expressed as box-whisker plots) were evaluated. Since investigating the predicted SWE at the sub-watershed scale alone could not provide a clear picture about how the three modeling scenarios performed at different elevations, we also assessed the model performance under each scenario at the elevation band scale. For this, we took one sub-watershed from each watershed and calculated the HRU area weighted daily SWE for Elevation Band 1 to Elevation Band 5 for comparison.

Evaluating the Impact of Parameter Equifinality on Model Predictions
Different combinations of parameter values might lead to a similar level of fit between simulation and observation, a modeling phenomenon known as parameter equifinality [68,69]. In the S3 scenario, although the two-stage calibration strategy took both streamflow and SWE references into account, the second stage of calibration only started with the best set of parameter values derived from the first stage. Therefore, there might exist alternative parameter combinations that could result in similar or even better model predictions. Since it is difficult to manually test all the possible parameter combinations in the parameter space, the parameter equifinality could be investigated by evaluating the first few sets of best parameter combinations from the stage 1 calibration in S3. In this study, the values of the watershed parameters and elevation lapse rate parameters from the first 10 sets of the best parameters (listed in Table A1 in the Appendix A for the four upland watersheds) were used to substitute the corresponding values in the final calibrated parameter values from the S3 scenario. The modified 10 sets of parameters were applied for the SWAT simulation, respectively, for the calibration period 2001-2007 to test the equifinality effect on the streamflow and SWE, respectively. Figures 3 and 4 show the comparison between the simulated and observed streamflows for the three modeling scenarios (S1-S3) for the calibration and validation periods, respectively. The model performance is evaluated with NSE and R 2 values. The calibrated parameter values are shown in Table 1. The results indicate that all three calibration strategies generally reached a satisfactory level of streamflow prediction accuracy. For all three calibration scenarios, the NSE ranged from 0.4-0.9 and R 2 values ranged from 0.5-0.9 for all four watersheds. When comparing the two single-objective calibration scenarios, S1 and S2, the streamflow predictions performed slightly worse under the S2 than S1 scenario ( Figure 3). This indicates that including snow parameters in the streamflow calibration process increases the model complexity, which might have a negative effect on the streamflow prediction accuracy and might reduce the prediction reliability. The results further show that the NSEs from S3 scenario for the Kern and Tule watersheds are 0.81 and 0.78, respectively, which are slightly higher than the values obtained for the S1 (0.80 and 0.77) and S2 scenarios (0.78 and 0.56) for both watersheds. However, for the Kings and Kaweah watersheds, S3 resulted in a slightly worse model performance, with NSEs of 0.85 and 0.67 compared to S1 (NSEs were 0.93 and 0.76) and S2 (NSEs were 0.89 and 0.74). This might be because the Kings and Kaweah watersheds have higher elevations and are located at the relatively northern side of the study region, and more snow is accumulated there. Therefore, a longer calibration period , from S1 and S2) would better capture the hydrological processes. Moreover, no matter which calibration strategy is used, when applying the calibrated model to the validation period (2008-2013), the streamflow prediction accuracy slightly decreased compared to the calibration period. For example, for the Kern watershed, the sub-watershed NSEs ranged between 0.78-0.81 in the calibration period and between 0.60-0.65 in the validation period ( Figure 3). The decrease in model performance can, in part, be explained by the nonstationary nature of the streamflow under variable climate forcing and the fact that model calibration using a specific period from the historic record might not always reflect the same hydro-climatic conditions as observed during the validation period, as shown in Figure 4. Figure 4 further indicates that the calibration of snow parameters or involving the snow reference dataset could not improve the streamflow prediction accuracy. In general, the predicted streamflow performed better in the S1 modeling scenario in the Kings (R 2 = 0.93) and Kaweah (R 2 = 0.76) watersheds or reached a similar level of fit in all three calibration scenarios in the Kern (R 2 = 0.80) and Tule (R 2 = 0.78) watersheds. The insets in Figure 3(A3-D3) show the predicted streamflow extremes (10%, 20%, 80%, and 90% quantiles) versus observed data (black dashed line) for the 1983-2013 modeling period. The results indicate that, in general, the S1 scenario performs the best for predicting high-magnitude events among the three calibration scenarios (i.e., Kings, Tule, and Kaweah). However, for the low flow conditions (10% and 20% quantiles), there are no clear differences between the three calibration scenarios and watersheds. In contrast, Figure 3 points out that, in California, a place frequently affected by hydrological extremes, applying the default SWAT snow parameter values and calibrating the model with only streamflow reference data provides the best agreement between the observed and predicted extreme streamflow values. However, when predicting a low flow, any of the three calibration strategies could be applied, resulting in very similar low-flow predictions.

Impact of Three Model Calibration Scenarios on the Streamflow Predictions
scenarios and watersheds. In contrast, Figure 3 points out that, in California, a place frequently affected by hydrological extremes, applying the default SWAT snow parameter values and calibrating the model with only streamflow reference data provides the best agreement between the observed and predicted extreme streamflow values. However, when predicting a low flow, any of the three calibration strategies could be applied, resulting in very similar low-flow predictions.    , F-validation), and Kaweah (Gcalibration, H-validation) watersheds, respectively. R 2 refers to Coefficient of Determination, and S1-S3 refers to Scenario 1-Scenario 3.

Impact of the Three Model Calibration Scenarios on the SWE Predictions
The simulated SWE from the SWAT output is in the form of a snow water-equivalent value at the HRU level for each elevation band. The HRU area weighted SWE for each sub-watershed is further calculated for all four watersheds, respectively. The sub-watershed scale-calculated SWE from S1-S3 are compared with ParBal-referenced SWE data. In order to better represent the findings, we separated the snow-dominated sub-watersheds into two groups: calibrated sub-watersheds and poor-performing sub-watersheds. The calibrated sub-watersheds (marked as blue in Figure 5) are those NSEs of the SWE predictions that have reached or are close to the acceptable level, and they will be included in the following analysis and comparison, whereas the poor-performing subwatersheds (marked as red in Figure 5) are those SWE predictions from all S1-S3 modeling scenarios that are very poor, even after the two-stage calibration (i.e., NSE<0) and far away from acceptable NSE values. The poor-performing sub-watersheds are excluded from the following analysis. The sub-

Impact of the Three Model Calibration Scenarios on the SWE Predictions
The simulated SWE from the SWAT output is in the form of a snow water-equivalent value at the HRU level for each elevation band. The HRU area weighted SWE for each sub-watershed is further calculated for all four watersheds, respectively. The sub-watershed scale-calculated SWE from S1-S3 are compared with ParBal-referenced SWE data. In order to better represent the findings, we separated the snow-dominated sub-watersheds into two groups: calibrated sub-watersheds and poor-performing sub-watersheds. The calibrated sub-watersheds (marked as blue in Figure 5) are those NSEs of the SWE predictions that have reached or are close to the acceptable level, and they will be included in the following analysis and comparison, whereas the poor-performing sub-watersheds (marked as red in Figure 5) are those SWE predictions from all S1-S3 modeling scenarios that are very poor, even after the two-stage calibration (i.e., NSE<0) and far away from acceptable NSE values. The poor-performing sub-watersheds are excluded from the following analysis. The sub-watersheds need to be separated in this way, because the lumped calibration strategy was applied in S1-S3, and the parameter values found for the entire watershed might not be suitable for each individual sub-watershed simultaneously. Figure 5 indicates that the S3 calibration scenario could improve the overall SWE predictions of the calibrated sub-watersheds. Although there exist cases that the maximum values of NSE are not the highest after the snow calibrations of S3 (i.e., the Tule watershed; see Figure 5A), the variations of NSEs from the calibrated sub-watersheds are smaller (i.e., the range of NSEs is about 0-0.62 after snow calibration for the Tule watershed, whereas the ranges are −0.18-0.63 before snow calibration). Figure 5B shows the average NSE values for the snow-calibrated sub-watershed (shown in Figure 5C); the results indicate that the average NSE improved for S3 compared with the other two scenarios. Therefore, from the overall perspective, the SWE prediction is more reliable when applying the S3 calibration scenario compared with the other two calibration strategies. Figure 6 Table 4. It indicates that the S3 SWE prediction performance is better and more reliable among the three modeling scenarios for the Kings, Kern, and Tule watersheds. Additionally, S2 performs slightly worse than S1 for all four watersheds. This points out that involving the snow parameter alone without the SWE reference data in the SWAT model calibration would only increase the model parameter complexity, which would not improve the SWE prediction accuracy. Moreover, the second row to sixth rows show the SWE predictions for elevation bands 1-5, respectively, for each typical sub-watershed. The results point out that SWE predictions mainly deviate from each other at low elevations among the three calibration scenarios, whereas, at high elevations, the three calibration strategies show similar levels of SWE predications. Figure 6 indicates that the S3 scenario is better at capturing SWE at low elevations, which makes its predictions more accurate and reliable than S1 and S2.
Water 2020, 12, x FOR PEER REVIEW 13 of 24 watersheds need to be separated in this way, because the lumped calibration strategy was applied in S1-S3, and the parameter values found for the entire watershed might not be suitable for each individual sub-watershed simultaneously. Figure 5 indicates that the S3 calibration scenario could improve the overall SWE predictions of the calibrated sub-watersheds. Although there exist cases that the maximum values of NSE are not the highest after the snow calibrations of S3 (i.e., the Tule watershed; see Figure 5A), the variations of NSEs from the calibrated sub-watersheds are smaller (i.e., the range of NSEs is about 0-0.62 after snow calibration for the Tule watershed, whereas the ranges are −0.18-0.63 before snow calibration). Figure 5B shows the average NSE values for the snowcalibrated sub-watershed (shown in Figure 5C); the results indicate that the average NSE improved for S3 compared with the other two scenarios. Therefore, from the overall perspective, the SWE prediction is more reliable when applying the S3 calibration scenario compared with the other two calibration strategies. Figure 6 Table 4. It indicates that the S3 SWE prediction performance is better and more reliable among the three modeling scenarios for the Kings, Kern, and Tule watersheds. Additionally, S2 performs slightly worse than S1 for all four watersheds. This points out that involving the snow parameter alone without the SWE reference data in the SWAT model calibration would only increase the model parameter complexity, which would not improve the SWE prediction accuracy. Moreover, the second row to sixth rows show the SWE predictions for elevation bands 1-5, respectively, for each typical sub-watershed. The results point out that SWE predictions mainly deviate from each other at low elevations among the three calibration scenarios, whereas, at high elevations, the three calibration strategies show similar levels of SWE predications. Figure 6 indicates that the S3 scenario is better at capturing SWE at low elevations, which makes its predictions more accurate and reliable than S1 and S2.  respectively. (C) The snow-calibrated sub-watersheds (blue), poor-performing sub-watersheds (red), and one typical snow sub-watershed for each watershed (used in Figure 6). respectively. (C) The snow-calibrated sub-watersheds (blue), poor-performing sub-watersheds (red), and one typical snow sub-watershed for each watershed (used in Figure 6).

Influence of Parameter Equifinality on the Prediction Reliability
The calibrated snow parameters from S3 are applied, and the model equifinality was evaluated through the changing of the watershed and elevation lapse rate parameters in this study. Figures 7

Influence of Parameter Equifinality on the Prediction Reliability
The calibrated snow parameters from S3 are applied, and the model equifinality was evaluated through the changing of the watershed and elevation lapse rate parameters in this study. Figures 7 and 8 show the simulated streamflow and SWE based on the 1st-10th best sets of parameter values from calibration stage 1 in the S3 scenario for 2001-2007, respectively (previously, the S3 modeling scenario only considered the 1st best parameter set from 500 simulations from stage 1). The final snow parameter values derived from S3 scenario stage 2 were applied unchanged for all ten simulations. Figures 7 and 8 indicate that both the streamflow and SWE predictions based on the best parameter set from the stage 1 calibrations were not eventually the best among the ten sets. The streamflow predictions were more significantly affected by the equifinality issue, since the NSEs changed 0.2-0.4 from their smallest value to the largest value (i.e., the NSE values of Kings varied between 0.55-0.9; the change was about 0.35), given the ten sets that yielded almost similar levels of prediction in the stage 1 streamflow calibration period. Comparatively, the SWE predictions were also affected by the equifinality issue; however, the influence was smaller than the streamflow predictions. The medium of the NSE values varied less than 0.1 in most cases (i.e., it ranged 0.6-0.69 for the Kings watershed).
Water 2020, 12, x FOR PEER REVIEW 15 of 24 and 8 show the simulated streamflow and SWE based on the 1st-10th best sets of parameter values from calibration stage 1 in the S3 scenario for 2001-2007, respectively (previously, the S3 modeling scenario only considered the 1st best parameter set from 500 simulations from stage 1). The final snow parameter values derived from S3 scenario stage 2 were applied unchanged for all ten simulations. Figures 7 and 8 indicate that both the streamflow and SWE predictions based on the best parameter set from the stage 1 calibrations were not eventually the best among the ten sets. The streamflow predictions were more significantly affected by the equifinality issue, since the NSEs changed 0.2-0.4 from their smallest value to the largest value (i.e., the NSE values of Kings varied between 0.55-0.9; the change was about 0.35), given the ten sets that yielded almost similar levels of prediction in the stage 1 streamflow calibration period. Comparatively, the SWE predictions were also affected by the equifinality issue; however, the influence was smaller than the streamflow predictions. The medium of the NSE values varied less than 0.1 in most cases (i.e., it ranged 0.6-0.69 for the Kings watershed).

Applying ParBal-Reconstructed SWE Data in the Calibration Process
In general, this study shows that the SWAT calibration is strongly influenced by snow processes in the upland watersheds by matching the flow peak and smoothing the hydrograph related to snowmelt [70,71]. As expected, the results from this study prove that incorporating SWE reference data in the model calibration process would improve the model SWE prediction accuracy in general and enhance the overall prediction reliability. Despite that some previous studies applied remotesensing (i.e., MODIS)/in-situ-based snow cover reference datasets to validate the SWAT SWE predictions, they seldom directly included SWE reference information into the model calibration process as a means to further improve the results [72,73]. Although the multi-objective calibration of the SWAT model was not firstly introduced in this study (previously, researches applied secondreference datasets, such as soil moisture, evapotranspiration, or sediment loads, in addition to streamflows, to demonstrate the advantage of multi-objective calibration in improving the model performance [40,42,74]), SWE reference data is seldom considered, since in-situ snow records are lacking for many upland watersheds. Ground-based snow observation stations such as SNOTEL or other in-situ snow records could be applied to calibrate and validate the snow predictions. One recent study uses the SWE information derived from in-situ snow depth observations in the Upper Adige

Applying ParBal-Reconstructed SWE Data in the Calibration Process
In general, this study shows that the SWAT calibration is strongly influenced by snow processes in the upland watersheds by matching the flow peak and smoothing the hydrograph related to snowmelt [70,71]. As expected, the results from this study prove that incorporating SWE reference data in the model calibration process would improve the model SWE prediction accuracy in general and enhance the overall prediction reliability. Despite that some previous studies applied remote-sensing (i.e., MODIS)/in-situ-based snow cover reference datasets to validate the SWAT SWE predictions, they seldom directly included SWE reference information into the model calibration process as a means to further improve the results [72,73]. Although the multi-objective calibration of the SWAT model was not firstly introduced in this study (previously, researches applied second-reference datasets, such as soil moisture, evapotranspiration, or sediment loads, in addition to streamflows, to demonstrate the advantage of multi-objective calibration in improving the model performance [40,42,74]), SWE reference data is seldom considered, since in-situ snow records are lacking for many upland watersheds. Ground-based snow observation stations such as SNOTEL or other in-situ snow records could be applied to calibrate and validate the snow predictions. One recent study uses the SWE information derived from in-situ snow depth observations in the Upper Adige River watershed (6875 km 2 , a similar size to the Kern watershed) in the Northeastern Italian Alps for SWAT simulations [75]. Nevertheless, that study required sub-watershed levels and elevation band level snow depth observation stations densely distributed in the study area. However, to date, not many upland watersheds offer such highly dense in-situ SWE observation networks. In addition, they used a short SWE calibration and validation period (two years for each), which may not be able to capture both the wet/dry cycles. Comparatively, in our study, seven years were applied for the SWE calibrations, and another six years were used for the validation. Both wet and dry years were in the two periods. The remote-sensing-based SWE reconstruction dataset (such as ParBal) has wider applications as reconstructed SWE products become available for more mountainous regions. The two-stage calibration approach used in this study could be easily applied to other ungauged watersheds.

The Role of Snow Parameters in SWAT Model Predictions
The findings from this study demonstrated the importance of snow parameters in the hydrological model prediction accuracy. Comparing S1 and S2 in assigning snow parameter values to SWAT, the results showed that, in general, the default snow parameter values resulted in a similar performance compared with the calibrated snow parameters when only the streamflow reference data was applied. However, when comparing these two calibration strategies with S3, it pointed out that calibrating the snow parameters with the snow reference data would best benefit the SWE prediction accuracy. This indicates that involving snow parameter calibrations (instead of applying the default parameter values) would only introduce model parameter complexity, and it would not be useful unless the corresponding SWE reference dataset was used in the calibration process. One former study pointed out that the major application of default snow parameter values is to assess the applicability of the SWAT in ungauged watersheds, and there is little observed data available for calibrations [76]. This partly aligned with the findings from our study; however, we also found that default parameters might also be useful for calculating extreme flows, especially the high flow conditions for the upland watersheds. S1 behaves the best when simulating the 80% and 90% quantile high flows, and this broadened the application of the default snow parameters of the SWAT in identifying hydrological responses under extreme high-flow situations (such as flooding).

The Impact of Parameter Equifinality on the Model Prediction
The identification of parameter values is a major concern in calibrating the model for a specific watershed [36,77]. The results from this study point out that the calibration of snow parameters with corresponding SWE reference datasets definitely improves the SWE prediction reliability in general and reduces the equifinality issue to a certain level. However, this does not indicate that the final calibrated parameters got rid of the equifinality issues. To translate this in another way, the calibrated parameters just control the equifinality issue within our satisfactory range for the two dimensions (flow and SWE) in this study. The results indicate that, on the one hand, the more types of parameters and reference datasets (i.e., evapotranspiration and soil moisture) involved in the calibration process, the higher the chance of a calibrated model closer to reality (less of an equifinality issue). On the other hand, the model parameters calibration process also plays the role of compensating the overall uncertainty from different components in the modeling process to the reference dataset. For instance, the uncertainty comes from the simplification of the model structure and governing equations, as well as input climate data uncertainty, among others [16,78,79]. The calibrated parameters should satisfy the goal for the targeted variables predicted by the model. Therefore, we suggest the SWAT users apply the appropriate number of calibration parameters and reference datasets regarding the prediction variables that they are interested in instead of just simply involving as many parameter as possible and aiming to make it the "true model".

Summary and Conclusions
In this study, SWAT models were implemented for four uplands watersheds (Kings, Kern, Tule, and Kaweah) in Tulare Lake Basin. Three model calibration scenarios were applied in four upland watersheds located in the southern part of the Sierra Nevada Mountains using single (monthly streamflow at the watershed outlet) or multi-objective (monthly streamflow and daily ParBal-reconstructed SWE at the sub-watershed scale) calibrations under the lumped calibration scheme. The extreme flow conditions were also evaluated. Additionally, under the multi-objective calibration scheme, given the fixed calibrated snow parameters, the influence of the watershed parameters and elevation lapse rate parameters on the model parameter equifinality was investigated. Specifically, the following conclusions were drawn from this study:

1.
Although all three calibration strategies provide satisfactory streamflow predictions in general, slight differences exist among the three modeling scenarios, and the S3 calibration scenario does not always perform the best for streamflow prediction accuracy. In general, S2 performs slightly worse than the S1 scenario, which indicates that involving model parameter complexity might have a negative impact on the streamflow predictions if the model is calibrated only with a streamflow.

2.
The S3 calibration scenario shows its advantages for SWE predictions compared with S1 and S2. Even though, in some cases, the maximum NSE values of the sub-watersheds are not the highest after snow calibration, the variations are smaller. Therefore, the two-stage calibration strategy is recommended under the lumped calibration scheme for the upland watersheds, where snow plays a significant role.

3.
The S1 modeling scenario with default parameter values provides the best prediction for high flow conditions (related to floods), whereas no obvious differences exist among the three scenarios in simulating low flow conditions (droughts).

4.
All three scenarios work well for capturing large amounts of snow in the high elevations; however, the S3 scenario performs better than the other two scenarios in capturing small amounts of SWE in low elevations.

5.
The streamflow predictions are affected by the equifinality issue, and the changes of the NSE values vary between 0.2-0.3 for the ten parameter sets. Comparatively, the SWE prediction is also affected by the equifinality issue; however, the influence is slighter; the variation of the NSE is less than 0.1 for most cases. 6.
SWAT users are suggested to apply the appropriate number of calibration parameters and reference datasets for the goal (i.e., SWE and streamflow, in this study) that they are interested in. In most cases, there is no need to involve extra parameter complexity that is not related to the goal, and it might have a negative impact on the model predictions.
There exist certain limitations for this study. Since a lumped calibration strategy is applied for the SWE calibrations, it is difficult to make all the sub-watersheds reach a good fit simultaneously, which is caused by the heterogeneity natures of the different sub-watersheds. Additionally, although the model parameter equifinality issue is investigated, it is only partially tested for the first 10 best parameter sets from calibration stage 1 because of the large amounts of simulation work. In the future, the two-stage calibration strategy could be applied to more snow-dominated watersheds to strengthen the findings from this study.
Appendix A   Table A1. The 1st to 10th best parameter sets (rank1-rank10 correspond to the 10 sets of highest NSEs) from the last 500 simulations of the stage 1 calibrations in S3 for each watershed.