Time Variant Sensitivity Analysis of Hydrological Model Parameters in a Cold Region using Flow Signatures

: The complex terrain, seasonality, and cold region hydrology of the Nelson Churchill River Basin (NCRB) presents a formidable challenge for hydrological modeling, which complicates the calibration of model parameters. Seasonality leads to different hydrological processes dominating at different times of the year, which translates to time variant sensitivity in model parameters. In this study, Hydrological Predictions for the Environment model (HYPE) is set up in the NCRB to analyze the time variant sensitivity analysis (TVSA) of model parameters using a Global Sensitivity Analysis technique known as Variogram Analysis of Response Surfaces (VARS). TVSA can identify parameters that are highly influential in a short period but relatively uninfluential over the whole simulation period. TVSA is generally effective in identifying model’s sensitivity to event-based parameters related to cold region processes such as snowmelt and frozen soil. This can guide event-based calibration, useful for operational flood forecasting. In contrast to residual based metrics, flow signatures, specifically the slope of the mid-segment of the flow duration curve, allows VARS to detect the influential parameters throughout the timescale of analysis. The results are beneficial for the calibration process in complex and multi-dimensional models by targeting the informative parameters, which are associated with the cold region hydrological processes.


Introduction
Hydrological models are imprecise representations of real-world hydrological processes governed by mathematical equations and assumptions. Such imperfect representation of real-world hydrological processes often leads to uncertainty in model outputs, which are attributed to different sources of error including measurement error, model structural error and the model parameter uncertainty [1,2]. The model parameter uncertainty is attributed to a general lack of understanding required to accurately determine model parameters, and the fact that parameters compensate for errors in input data and model structure [1][2][3]. Hydrological models often have a large set of parameters linked to specific processes that dominate at different times of the year, leading to an increase in parameter uncertainty and an inherent seasonality effect of parameter sensitivity and uncertainty [4]. In general, most of the parameters associated with the hydrological processes are unknown and often require calibration to match the model output with the measured dataset(s).
Sensitivity analysis (SA) can assist the modeler in identifying the relative influence each model parameter has on the variation in model output. This information is typically used by the modeler to limit the number of parameters in calibration, thereby increasing the efficiency and computation speed of the calibration process [5,6]. In practice, SA is useful for determining the most informative parameters prior to model calibration. The least influential parameters can be set to their default values and thus will not require calibration; therefore, SA is recommended before the calibration of a hydrological model is initiated [7].
SA can be broadly classified into two types: (i) Local Sensitivity Analysis (LSA), and (ii) Global Sensitivity Analysis (GSA) [8]. LSA measures the parameter sensitivity at a particular location in the parameter space. However, for complex hydrological models, GSA methods such as Sobol [9] and Morris [10] are recommended because they measure the parameter sensitivity over the entire parameter space, making them robust and reliable [11,12]. These methods, especially the Sobol method, require a large computational budget to reach convergence [13]. Alternatively, response surface or surrogate models can be used to reduce the computational burden of complex models, while maintaining the reliability and robustness of a GSA technique [14][15][16][17].
Variogram Analysis of Response Surfaces (VARS) is a recently developed GSA technique [13] that claims to be more computationally efficient and reliable than classical GSA approaches, for continental-scale hydrological modeling. Some recent studies have applied VARS for a detailed SA of model parameters in complex models. For example, Lilhare et al. [18] chose Kling Gupta Efficiency (KGE), Nash Sutcliffe Efficiency (NSE) and Percent Bias (PBIAS) criterion in VARS to identify the most important parameters for the Variable Infiltration Capacity (VIC) model [19] applied to the Lower Nelson River Basin (LNRB). Sensitivity of the model to such parameters is generally dependent upon the evaluation criterion used, and hence the aforementioned study strongly recommends the use of multiple criteria for SA to gain a better understanding of the dynamics among and between model processes.
There are several studies in the cold region catchments within Canada to identify the sensitivity of the cold region processes. The catchments under study are diverse as the studies are conducted from the mountainous region of Yukon [20] to the Prairies of Saskatchewan [21]. These studies identify snowmelt runoff and frozen soil infiltration to be highly sensitive to warming temperature and rainfall intensity. Most previous SA studies, including those studies in the cold region, do not account for time varying sensitivity factor in the model parameters [13,[22][23][24]. There are few recent studies in which SA has been implemented by dividing the entire time period into different time slices or window periods [5,25]. The studies that currently exist are based on single, conventional metrics such as NSE or Root Mean Square Error (RMSE). Recently, Razavi and Gupta [26] applied a time varying GSA approach based on VARS using different indices of simulated streamflow, evapotranspiration (ET) and soil moisture storage. They recommend the use of TVSA to better control the calibration of the parameters and uncertainty that might arise during the model prediction. In addition to conventional performance evaluation metrics, flow signatures can inform model calibration and uncertainty analysis [27][28][29][30]. The use of flow signatures provides feedback to the calibration on different aspects of a hydrograph (e.g., rising or recession limbs versus peak flow), and the overall variability of flow characteristics [27]. Nonetheless, the application of flow signatures has been mostly limited to calibration of hydrological models and has not been presented simultaneously with SA.
Our study employs flow signatures, in addition to conventional error metrics, as evaluation criteria to analyze the seasonal and time variant sensitivity of model parameters for a cold region catchment (NCRB) using Hydrological Predictions for the Environment (HYPE). HYPE is a processbased semi-distributed hydrological model developed by the Swedish Meteorological and Hydrological Institute (SMHI) [31]. Our research addresses two specific objectives: (1) analyze the effect of time resolution on determining the sensitivity of the model parameters, and (2) application of TVSA and flow signatures to identify event-based sensitivity of model parameters in a cold region. Our goal is to determine parameter prioritization during the calibration process for the purposes of reducing the dimensionality of the optimization problem for hydrological modeling in high-latitude, seasonal and cold region environment.

Study Area
NCRB is the third largest catchment in North America [32] with an approximate drainage area of 1.4 million km 2 ( Figure 1). It extends from the Rocky Mountains in the west to Lake Superior in the east, with elevation ranging from the sea level (at the outlets draining to Hudson Bay) to 3548 m above mean sea level in the Rocky Mountains. The Nelson and Churchill Rivers are the two major river systems in the basin, with respective catchment areas of 1.07 million km 2 and 0.28 million km 2 , draining the entire catchment into Hudson Bay. The NCRB region can be broadly classified into six freshwater basins: (i) Nelson River, (ii) Churchill River, (iii) Saskatchewan River, (iv) Red River, (v) Assiniboine River, and (vi) Winnipeg River. The Prairie region extending across the southern region of Alberta, Saskatchewan, and Manitoba is the driest part of NCRB, with an average annual precipitation of less than 320 mm. The Rocky Mountains and a small region in Ontario near Lake Superior are the wettest regions of the basin, receiving average annual precipitation of more than 750 mm. The annual average temperature of the catchment varies from +6 °C in the South to −6 °C in the North. During winter, the temperature varies from −4 °C in the South to −22 °C in the North, depending on the latitude. According to climate norms, the winter typically lasts from November to March in the southern latitude. The winter season is longer in the northern latitude of the basin, which generally lasts from October to May.
Precipitation is in the form of snowfall during winter ranging from 29 mm to 120 mm and is stored in the snowpack, which contributes to snowmelt during spring [33]. The water supply within NCRB varies greatly depending on the season, climatic conditions, latitude and topography. According to Manitoba Hydro [33], the Assiniboine River contributes the least to the NCRB water supply with an annual average streamflow of about ~45 m 3 /s between 1981 and 2010. Nelson River is the largest contributor with an annual average streamflow of ~3200 m 3 /s at the Hudson Bay. Lake Winnipeg, the largest lake system within NCRB, contributes an annual average streamflow of ~2180 m 3 /s through its outlet into the Nelson River. The Winnipeg River System, which drains into Lake Winnipeg, contributes an annual average streamflow of ~950 m 3 /s to NCRB. The Red river originating from the northern states, draining into Lake Winnipeg, averages the annual streamflow of ~250 m 3 /s. NCRB is characterized by numerous small and large lakes. The land cover of the region is mostly forest and wetland, accounting for nearly 70% of the catchment [32]. Great Plains and Prairies distinguish the southern region, whereas the northern region is predominately Boreal Shield. Cold region processes such as snowmelt, frozen soil and permafrost play an integral role in influencing the hydrology of the NCRB. The snowmelt from this region contributes to runoff supplying snowmelt water to lakes, reservoirs, rivers and wetlands [34]. The northern region of the catchment is mostly dominated by permafrost and frozen soil, which greatly influences the runoff with the formation of wetland due to continuous ponding [35][36][37]. Frozen soil has low infiltration capacity, which allows for the retention of snowmelt water on the surface. However, during the thawing period, the water rapidly infiltrates the unfrozen soil [38].

Hydrological Modeling
The HYPE model of NCRB is adapted from the Hudson Bay HYPE (H-HYPE) configuration described by Tefs et al. [39]. The HYPE model for the entire Arctic region (A-HYPE) is also available in Macdonald et al. [40]. NCRB is discretized into 2693 sub-basins with an average area of 535 km 2 and median area of 330 km 2 ( Figure S1). Sub-basins are further classified into homogeneous units called CLASSES in HYPE, which are formed by a unique combination of land cover, soil type and elevation. From the data source ( There are two types of lakes used in HYPE, referred to as ilake (local or inlet lakes) and olake (outlet lakes). The water level and threshold used in ilake and olake control the runoff in the local streams and the main river, respectively. HYPE uses ilake and olake parameters to control the outflow response of lakes, which is parameterized using rating curves. For the Hudson Bay Drainage Basin embedding the NCRB, Macdonald et al. [40] grouped ilake and olake in HYPE into four and three clusters respectively, based on physiographic characteristics using a k-means cluster analysis combined with a silhouette analysis for the selection of the optimal number of clusters. Table 1 documents the data products used for setting up HYPE for NCRB. Spatial maps such as the Digital Elevation Model (DEM) from United States Geological Survey (USGS), the land use map from the European Space Agency (ESA), and the soil map from the Harmonized World Soil Database (HWSD) are used to define the spatial characteristics of the NCRB in HYPE. The model is set up with an ensemble of three meteorological reanalysis products: (i) ECMWF Re-Analysis-Interim (ERA-I) (ii) North American Regional Reanalysis (NARR) and (iii) Hydro-GFD. The ensemble dataset is created by taking average of input variables from the three reanalysis products for individual sub-basins. Reanalysis data are the results of data assimilation from different sources such as ground observations, radar and satellite products into a numerical weather prediction model [42].

Evaluation Criteria
The sensitivity of the model parameters in VARS is determined by the model response values associated with all the sampled points. Streamflow generated by the HYPE model using ensemble precipitation and temperature input is evaluated against measured streamflow at multiple gauging stations across the NCRB using three standard error metrics: NSE [43], PBIAS [44], and Normalized Root Mean Square Error (NRMSE).
NSE (Equation 1) represents the model performance in estimating the timing and magnitude of peak flow periods, because it is more sensitive to large simulation errors that usually occur during these periods. NSE can range from negative infinity (the worst model performance) to a maximum value of one (the perfect fit between simulated and measured values). A naïve model that results in the long-term average of the measured data has = 0; therefore, a positive value for NSE is expected for a model that is adequately simulating the system performance.
where, and are, respectively, the measured and simulated streamflow (m 3 /s) at time step , and is the average of the measured streamflow over the time period . PBIAS measures the percentage difference between the simulated and measured data, putting an emphasis on the volumetric error in streamflow simulation (Equation (2)). An ideal model would have PBIAS of zero as both the simulated and measured streamflow would have the same mean values. However, a PBIAS value of zero does not necessary imply that the model is perfect because it is a measure of the volumetric error only. The absolute value of PBIAS increases as the difference between the simulated ( ) and measured ( ) streamflow widens: The NRMSE is normalized with respect to the difference between the maximum ( ) and minimum ( ) values of the measured streamflow (Equation (3)). This promotes fair comparison between different time series dataset with varying magnitude: In addition to the aforementioned model evaluation criteria, we have chosen three flow signatures to analyze the influence of the choice of error metrics on the sensitivity of the model parameters. The model performance in simulating the high flows is represented by the percentage difference between the measured and simulated 3-day averaged Q5 (the value of daily streamflow with a 5% exceedance probability) (Equation 4). The 3-day averaged Q95 (the value of daily discharge with a 95% exceedance probability) is used to evaluate the model performance in estimating low flows (Equation (5)). The 3-day averaged streamflow and 7-day averaged streamflow are selected to characterize the low flows and high flows in order to reduce the impact of measured flow uncertainty [45,46]: Finally, we have used the Slope of the Flow duration Curve (SFDC) to evaluate the response of the rising and falling limb of the hydrograph (indicated by the central part of the flow duration curve) to major hydrological events, compared to the real watershed system. SFDC is obtained as in Equation (6) [27] for the region bounded by the 30% and 70% probabilities of exceedance, as represented by the two center dotted lines in Figure 2. It is calculated for both the measured and simulated streamflow and their difference is used as an evaluation metric: where, Q30 is the value of daily discharge with a 30% exceedance probability; Q70 is the value of daily discharge with a 70% exceedance probability; = ∑ ( ) is the mean of daily streamflow (m 3 /s); T is the length of the streamflow data record (days); Q (t) is the streamflow (m 3 /s) at time t.

Sensitivity Analysis Using VARS
In this study, the sensitivity of 34 HYPE parameters is identified using a GSA technique called VARS ( Table 2). The parameter range used for SA has been set based on recommendations by Macdonald et al. [40] and other studies related to the HYPE model [47][48][49]. VARS is based on the directional variogram and covariogram of pairs of points in the response surface. The variogram is a function quantifying the variance of change in the response surface as a function of distance between samples in the parameter space. The covariogram function quantifies the covariance structure in the response surface as a function of a vector ℎ that is the distance vector between two points in the decision space. Details on the methodology and application of VARS can be found in Razavi and Gupta [13]. The HYPE model of the NCRB is run externally for all sample solutions generated by VARS based on STAR sampling. Ranking of the sensitivity parameters and indices are based on the Integrated VARS (IVARS). STAR sampling is carried out for each star center defined at certain interval along each dimensions of the parameter space. IVARS is the integration of the directional variogram with different parameter perturbation sizes ranging from 0.1 to 0.5, with an incremental value of 0.1. For example, IVARS10 refers to the integrated variogram with a small perturbation size of 10% of the parameter range, while IVARS50 is based on perturbation size equal to 50% of the range of each parameter. Although the investigation of the sensitivity is recommended with different factor ranges, Razavi and Gupta [50] recommend the use of IVARS50 as a global sensitivity metric if only a single metric is to be chosen. Furthermore, our study also shows that the rankings of parameters are not significantly influenced when comparing parameter rankings amongst IVARS10, IVARS30 and IVARS50 across six evaluation metrics (Table 3). Therefore, we have opted to use IVARS50 as the sensitivity metric for all our analysis.
In this study, SA is performed using three approaches corresponding to different time scale resolution: (i) long-term SA, (ii) monthly SA, and (iii) TVSA using a moving window analysis ( Figure  3). The long-term SA is based on sensitivity indices calculated from the error metrics computed from simulated and measured streamflow over the entire time period of analysis. In the case of monthly SA, sensitivity indices are determined separately for individual months, to identify the sensitivity based on seasonality. Compared to the previous two approaches, the TVSA calculates sensitivity indices for each parameter across each window period. For example, for a 30-day window, the sensitivity of the parameters is analyzed based on a moving window of every 30 days using the times series from model simulation. It should be noted that the majority of previous studies applying SA used method (i) [23,51,52], and in this study we compare the more traditional approach to methods (ii) and (iii) in a high-latitude, seasonal, and cold region environment. For all three SA approaches, we have used four sets of parameters (34 parameters) among which land use and soil type dependent parameters comprise the majority of parameters. Two other sets represent lake discharge and routing parameters. The long-term SA is based on aggregation of sensitivity information from the entire time-period of analysis. Here, we have analyzed the long-term SA for a period of 30 years from 1981-2010 to coincide with the Canadian climate normal period. The aggregated model response values in the form of evaluation metrics are calculated between simulated and measured streamflow for each set of parameters sampled. Parameter rankings are determined based on the ratio of factor sensitivity (RFS) of the parameters. RFS of a parameter is computed as the ratio of the IVARS for that parameter to the sum of IVARS over all parameters considered for SA. The parameter with the largest RFS value is assigned the highest rank (highly dominant), whereas the parameter with the smallest RFS values is assigned the lowest rank (least dominant). The reliability of the ranking of parameters for different evaluation metrics are also computed by VARS. Reliability is generally defined as "the ability of a system or components to function under stated condition for a specified period of time" [53]. In VARS, reliability is the probability of success to determine the true ranking of the parameters based on specific sensitivity indices (IVARS). The reliabilities of parameter rankings are higher in simple models, whereas complex models with a large number of parameters require a large number of STAR samples to attain the state of true ranking [54].

Monthly SA
The monthly SA is implemented for a period of 30 years similar to the long-term SA, except that the sensitivity information is aggregated at a monthly timescale. The evaluation metrics are calculated between measured and simulated streamflow for each aggregated month separately. The monthly SA has advantage over long-term SA because it is expected to identify parameters that are strongly influential during particular seasons but less influential during other seasons (e.g., snowmelt parameters in the summer). The importance of some parameters, if not all, during model calibration may therefore change based on the seasonality in the study region and the time-period of analysis. In such cases, calibration should include parameters that are more influential in particular seasons but do not have a high sensitivity index over the whole simulation period. Zhao et al. [55] calibrated a hydrological model for an alpine basin using the stratified calibration approach (SCA). In the SCA method, the calibration of model parameters is implemented by stratifying seasons, which refers to dividing the entire period into dry and wet seasons. Calibration of a hydrological model based on seasonality improves simulation of ET and surface flow during rainy periods [55].

Time Variant Sensitivity Analysis
Compared to the previous two methods in which the aggregation of sensitivity information is over a month or long term, the TVSA method is implemented using a certain window period of fixed duration to minimize the loss of sensitivity information. Sensitivity metrics are calculated using VARS for different moving window periods (30, 60, 90, 180 and 360 days) to account for the interannual dry and wet cycles in the basin ( Figure 4) across a six year time period spanning from 2000 to 2005. The selection of window size is dependent on factors such as quality of data, time period of analysis, and smoothness of the plot desired [56]. Hui et al. [25] used TVSA with window sizes of up to 4 days for hydrologic and sediment parameters. Smaller window sizes are recommended as it reduces the effect of time series aggregation and is therefore able to capture the temporal dynamics of the influential parameters [25,57]. Although some hydrologic processes could occur at shorter durations (<30 days), this analysis is restricted to a minimum 30-day window period to enable the calculation of quantiles for the flow duration curves. The restriction is further compounded by the computational demand of smaller window size for a continental scale catchment. The use of 30 days and 60 days for window size is useful in exploring the seasonal sensitivity, whereas the window size larger than 90 days helps to explore the sensitivity of the model to different parameters at a coarser time scale. The time period of analysis is restricted to only six years due to the very large computational demand required to implement TVSA for every moving window period over a long term for such a large basin. The runtime for a 180-day moving-window is ~3 days for a single evaluation metric (i7-6700 processor, 2.60 GHz). It has increased to 12 days for a single evaluation metric when the window size is reduced to 30 days, and 72 cumulative days for all six-evaluation metrics (for method (iii) only).
The pcolor function of MATLAB is utilized to interpolate the color transition between consecutive time periods and different window sizes for each sensitivity metric. The color transition obtained from interpolation can be jagged, particularly for larger window sizes. Smooth transitions can be achieved by using smaller window sizes [25], which are not implemented in this study due to their large computational demand.

Long-Term SA
We first analyze the ranking and reliability of HYPE model parameters from the long-term SA based on the sensitivity metrics obtained from IVAR50 for the six evaluation criteria used in this study ( Figure 5). Results show that the parameter rankings and their reliability change as the model evaluation metric changes. For all evaluation metrics, the parameters related to land use and soil characteristics are most influential followed by the ilake and olake parameters.
In terms of relative ranking of the parameters, the crop coefficient correction factor (kc_corr) associated with the calculation of ET is the most influential parameter. This is an obvious result, because the kc_corr parameter is applied as a correction factor to all land use types. The rankings of the kc parameters associated with individual land use type is closely followed by that of the global kc_corr parameter. However, these rankings are not stable across all evaluation metrics. The crop coefficient associated with forest (kc_forest) is more influential than the crop coefficient associated with open vegetation (kc_open) such as grassland and open shrub land. Several studies have found that ET losses from forested areas are higher than those from cropland and open vegetation [58,59]. Most of the frozen soil parameters do not exhibit significant sensitivity and are considered among the least influential parameters. The model shows strong sensitivity to the frozen soil parameter related with medium soil type (bfrozn_medium). Therefore, it is ranked among the top ten most influential parameters. The medium soil type alone covers approximately 49% within NRCB, which is almost equal to the coverage of all the other soil types combined. In terms of reliability of the rankings, for NSE, the reliability varies from 26% to 100%, with a median reliability of 69%. The median reliability values of the parameter rankings for both PBIAS and NRMSE criteria remains at 70%. For the 3-day averaged Q5, Q95, and SFDC, the median reliability values degrade to 64%, 65% and 44% respectively. The higher reliability values are more reliable and accurate, and the chances of reproducing the same result is higher under the same condition [60]. Therefore, measurements with higher reliability are preferred to the one with the lower reliability. In general, the conventional approach for long-term SA is informative in ranking the parameters based on the sensitivity metrics, and the screening of the important parameters is essential for further calibration of hydrological models. For conventional error metrics such as NSE, the least influential parameters which have negligible RFS (e.g., frozen soil parameters related to certain soil types, and some lake parameters) can be eliminated from model calibration, thus reducing the parameter space from the 34 initially selected parameters to a total of 23 parameters (i.e., ~32% reduction). Nevertheless, in this method, the ranking of parameters is based on the sensitivity information aggregated over the entire time period of analysis and is therefore unable to detect the seasonal sensitivity of the parameters, or the time varying nature of the parameters and subsequent hydrologic processes they are controlling. The accumulation of model residuals across the entire period of analysis causes a loss of information content for the model calibration. The model residuals are even larger if the residuals are squared prior to aggregation as in RMSE and NSE metrics [5]. To address this issue, it is important to disaggregate the time series before calculating the sensitivity metrics.

Monthly SA
The seasonal nature of some hydrological processes makes the corresponding parameters more influential in particular seasons. For example, the ET parameters (kc group) are expected to be more influential in warmer seasons (Figures 6 and 7). Moreover, the monthly SA gives insight into the model's sensitivity to seasonal parameters, which appears to be uninfluential from long-term SA results. The model is least sensitive to the field capacity parameters that are related to organic (wcfc_organic) and fine (wcfc_fine) soil types irrespective of the seasonality. With finer textures, clay soils and wetland peats have a high water retention capacity and significantly low horizontal and vertical hydraulic conductivity [61]. Therefore, wcfc_organic and wcfc_fine likely have an inherent time lag (greater than one month) associated with the hydrologic response time and result in a low sensitivity at a monthly scale. In contrast, across all evaluation metrics the field capacity parameter associated with medium soil textures (wcfc_medium) is highly influential and reveals varying sensitivity on a seasonal basis. The variation in the sensitivity of wcfc_medium parameter depends on the error metrics used but in general, the sensitivity is higher during the summer and spring seasons compared to the winter season. Since the sensitivity of the soil parameter is not only reflected in the spring season when the snowmelt process is activated, the sensitivity is a reflection of the rainfall runoff process more than the snowmelt seasonality.
A comparison between Figures 6 and 7 shows that signature-based metrics better identify parameters that are influential in a particular season (month). For example, the wp_corr parameter related to the wilting point is highly influential only during the months of February and March while using Q95 as an error metric. This suggests the wp_corr parameter is strongly affected by the snowmelt process, which is not necessarily intuitive. Likewise, the rivvel parameter controlling the velocity in river reaches and discharge is most influential during spring when snowmelt generates a sharp increase, and high variability, in streamflow. Therefore, rivvel based on NSE (Figure 6a) and 3-day averaged Q5 (Figure 7a) shows maximum sensitivity during spring compared to the other metrics. Snowmelt and frozen soil parameters, snow sublimation (fpsno_corr) and frozen soil (brozn_medium), are most influential during the spring freshet period. Frozen soil parameters are dependent on soil type and strongly influenced by the soil infiltration capacity. The infiltration capacity of a soil is significantly reduced during the frozen state. During spring, the voids between the soil particles occupied by frozen water start thawing, resulting in an acceleration of the infiltration rate in the soil [62,63]. The monthly SA implemented on 34 hydrological parameters shows variation in the sensitivity indices based on seasonality, unlike the long-term SA in which the sensitivity indices are aggregated over the entire time period. Due to the aggregation of sensitivity indices, the sensitivity values of the parameters in the long-term SA are assumed constant over the entire time period of analysis. In contrast, the monthly SA demonstrates that some parameters such as rivvel and fpsno_corr are highly influential during a particular season, which corresponds to the snowmelt period ( Figure 6). However, their sensitivity remains undetectable during other seasons. The ranking of the parameters also changes seasonally, and neglecting such parameters (and parameter sensitivity changes) can lead to an unrealistic representation of the model processes. Though the monthly SA is able to address the seasonality effect of the parameters, this method is still insufficient to address the issue of temporal aggregation. The conclusion drawn from the monthly SA relative to the long-term SA warrants further investigation of the model's sensitivity to influential parameters at a finer time scale (i.e. TVSA) to investigate parameters that may not exhibit seasonality but are important for a more realistic, physically representative model and simulation. Figures 8 and 9 show the graphical representation of 11 of the most influential parameters, using conventional metrics (NSE and PBIAS) as evaluation criteria. In these figures, the x-axis represents the time period in days (e.g., 01/01/2000 to 31/12/2005), and the y-axis represents the window period (days) in the logarithmic scale. The RFS varies from a minimum value of zero to a maximum of 0.6, depending upon the parameter, window size, time period and evaluation metric used. TVSA confirms that kc_corr is the most influential parameter throughout all six years of analysis. The NSEbased sensitivity factor for kc_corr (Figure 8a) is lower relative to those obtained using PBIAS ( Figure  9a) and NRMSE ( Figure S2a), indicating that the crop coefficient parameter associated with ET process is more sensitive to changes in the overall flow volume compared to changes in peak flow. These findings agree with the results from the previous two methods, however additional information regarding the relative sensitivity of the parameters is provided by the TVSA. For instance, as opposed to its normal seasonal sensitivity behavior demonstrated in monthly SA, the TVSA identifies the kc parameters to be the most influential during the winter season in some dry years (2000 and 2001/2002) as well as (Figure 8). One possible explanation for this seemingly contradictory finding is the relative insensitivity of soil moisture parameters prior to freeze up and over winter during drier years. During drier years, the model is relatively insensitive to wcfc and wp parameters that are related to the water holding capacity of soils at different depths, which is a function of soil properties. During smaller precipitation events and longer dry periods, only the upper-layer (shallow) soil moisture contents are affected [64]. As precipitation ceases (or is minimal) for longer durations, the soil moisture of the other (lower) layers becomes uninfluential for the water balance, and hence the kc parameters become relatively more influential during the driest period of the year compared with other parameters. Likewise, though rivvel identifies as an influential parameter by monthly SA, TVSA shows that the sensitivity of the model to this parameter varies from one year to another depending on the error metric used for evaluation. For instance, as compared to monthly SA using the NSE error metric, which identifies the rivvel parameter to be influential during only spring and peak flow seasons, TVSA identifies it to be influential only in certain years (2001, 2002 and 2005). The results are consistent with the highest peak flows in the measured streamflow record for the Nelson and Churchill River outlets. For instance, the measured streamflow at the Nelson outlet is observed to be maximum during the years of 2001 and 2005 with the values of 5631 m 3 /s and 7346 m 3 /s respectively. Likewise, maximum peak streamflow is observed at the Churchill outlet during the year 2005 (2990 m 3 /s) ( Figure S3). Moreover, the model's sensitivity to the routing parameter (rivvel) is identifiable only during the smaller moving window period (less than 90 days) (Figure 8k). The use of larger window sizes (greater than 90 days) suppresses the sensitivity of this parameter, as the high flow condition tends not to persist for long durations. Freshet events contributing to increasing discharge, and subsequent water velocity, typically occur within a maximum window period of four months (120 days approximately) [65].

Time Variant Sensitivity Analysis
These results demonstrate that TVSA is effective in identifying process-and event-based sensitivity of the parameters, which are inherent limitations of the previous SA approaches. In addition, TVSA is able to detect the sensitivity of some parameters not considered influential when using a long-term or monthly SA approach. For instance, by evaluating the aggregated monthly sensitivity based on 3day-Q95 in Figure 7b, it suggests that this metric is insensitive to routing parameter (rivvel) and lake parameter (olrratp1). Such parameters particularly bear high significance in terms of event-based calibration such as is done for flood forecasting. The runoff process in cold regions such as NCRB is governed by snowmelt during spring season, as the snowmelt alone accounts for more than 80% of annual local runoff in this region [66]. Hydrological models used for flood forecasting requires reliable simulation of high flows and runoff generated from snowmelt [67]. Therefore, it is essential to consider event based parameters contributing to high flows and snowmelt such as rivvel and fpsno parameters in model calibration.

Impact of the Choice of Evaluation Metrics
The results from long-term SA show a reduction in the parameter space ranging from 25% to 32% across all evaluation metrics (Table 3), with NSE having the largest reduction (32%), and SFDC having the lowest (25%). To analyze the percentage reduction in parameter range for monthly SA and TVSA, we have taken the maximum RFS values of each parameter from the time scale of analysis (Table 4). Our comparison shows reduction in the parameter space for monthly SA varying from 14% to 26%, with SFDC also showing associated with the smallest reduction. The conventional error metrics based SA have larger percentage reductions varying from 20% to 26%. Table S1 shows the six most influential parameters at the monthly analysis timescale based on different evaluation metrics. The overall crop coefficient factor (kc) associated with ET remains dominant across all evaluation metrics, regardless of the SA method used. Conversely, the model is most sensitive to parameters associated with lake discharge (olrratp1) and routing (rivvel) during the high flow months of May and June when using NSE, which emphasizes peak flows (Figure 6a).

Conclusion
We have applied three different SA methods (long-term SA, monthly SA and TVSA) based on differing levels of temporal aggregation and different error metrics to analyze the sensitivity of the model parameters in HYPE for cold region catchment (NCRB). When applied to a set of 34 model parameters, long-term SA results suggest that up to ~32% of model parameters are uninfluential, and therefore can be eliminated from model calibration. However, in addition to exclusion of some (seasonally) influential parameters, it provides limited information regarding the relative sensitivity of individual model parameters due to the information loss that occurs during data aggregation over a longer time period. In contrast, monthly SA can reveal shorter-term, seasonal variations in the sensitivity of the model parameters associated with seasonally variant hydrological processes. Parameters related to cold region processes such as frozen soil and snow sublimation are found to be highly influential during snowmelt seasons as revealed by the monthly SA. The snow and ice accumulated during the winter season melts during the spring season, which contributes to high volume runoff and flooding in cold region catchments [69,70]. Intuitively, snowmelt runoff in the cold regions of the northern hemisphere is very sensitive to any alternation in parameters related to temperature [71]. The application of flood forecasting plays an integral role in foreseeing such events to minimize the extent of damage caused by heavy runoff [72]. Therefore, such parameters should not be discarded from model calibration, despite their low influence in long-term simulation, because of the high flows and volumes associated with spring freshet in nival-dominated regimes. Robust calibration of parameters associated with cold regions is essential to reliably simulate the snowmelt runoff peak [62,73,74]. Furthermore, the monthly SA approach to model calibration actually makes the model parameter choice more physically relevant too. This is because the texture of soils is accounted for explicitly by breaking up the seasons and making the calibration better informed. The calibrated parameters then reflect the physical environment (and changes in it) more realistically.
Although a monthly SA can detect the sensitivity of additional parameters with a comparative parameter space reduction (14% to 28%), the level of data aggregation to a monthly time step is still an issue as the monthly sensitivity of the parameters are considered the same throughout the time period of analysis. To address this issue, we have carried out a TVSA at various window periods to account for the event-based sensitivity of the parameters. TVSA is able to detect strong sensitivity signals for seasonal parameters, which would have been screened out when sensitivity information is aggregated over a longer time period. Despite the larger computational demand, implementation of TVSA can be useful in model calibration by identifying event-based hydrological processes. This is particularly important for event-based calibration useful for flood forecasting in cold regions dominated by snowmelt runoff [67,75]. Furthermore, the choice of error metric has significant influence on the rankings and selection of parameter sensitivity. The application of SFDC is found to be most effective for TVSA and monthly SA, as the retention of influential parameters are higher compared to the long-term SA, despite having comparatively low reliability. We also found that SFDC consistently represented the parameter sensitivity throughout all time period, and across the dry, wet, and normal climate conditions.
In conclusion, TVSA is essential for complex, multi-dimensional models applied in seasonal and cold region environment to assist with the identification of highly influential parameters and parameter interactions prior to model calibration. Since the sensitivity of the parameters is strongly influenced by climate forcing such as air temperature and precipitation, we recommend the mix of wet and dry periods [55] while implementing a TVSA of hydrological model parameters. Regardless of the purpose of the model use, event based calibration is recommended for improved simulation of cold region hydrological processes. Finally, we also recommend the use of SFDC as an objective function for event based model calibration due to its capability to capture variation in the sensitivity of the parameters throughout the time period of analysis.
Supplementary Materials: The following are available online at www.mdpi.com/2073-4441/12/4/961/s1, Figure  S1: Delineation of Sub-basins in HYPE for Nelson Churchill River Basin, Figure S2: Ratio of factor sensitivity for moving windows of 30 days, 60 days, 90 days, 180 days and 360 days using NRMSE as the evaluation criteria from 2000-01-01 to 2005-11-29, Figure S3: Daily Average Annual Hydrograph (1981-2010) for simulated streamflow (best parameter set from sampling for Sensitivity Analysis) and observed streamflow for major river system within NCRB, Figure S4: Ratio of factor sensitivity for moving windows of 30 days, 60 days, 90 days, 180 days and 360 days using 3-day averaged Q5 (high flow) as the evaluation criteria from 2000-01-01 to 2005-11-29, Table S1: Ranking of six most influential parameters (monthly) using different evaluation metrics