Calibrating a Hydrological Model by Stratifying Frozen Ground Types and Seasons in a Cold Alpine Basin

Frozen ground and precipitation seasonality may strongly affect hydrological processes in a cold alpine basin, but the calibration of a hydrological model rarely considers their impacts on model parameters, likely leading to considerable simulation biases. In this study, we conducted a case study in a typical alpine catchment, the Babao River basin, in Northwest China, using the distributed hydrology–soil–vegetation model (DHSVM), to investigate the impacts of frozen ground type and precipitation seasonality on model parameters. The sensitivity analysis identified seven sensitive parameters in the DHSVM, amid which soil model parameters are found sensitive to the frozen ground type and land cover/vegetation parameters sensitive to dry and wet seasons. A stratified calibration approach that considers the impacts on model parameters of frozen soil types and seasons was then proposed and implemented by the particle swarm optimization method. The results show that the proposed calibration approach can obviously improve simulation accuracy in modeling streamflow in the study basin. The seasonally stratified calibration has an advantage in controlling evapotranspiration and surface flow in rainy periods, while the spatially stratified calibration considering frozen soil type enhances the simulation of base flow. In a typical cold alpine area without sufficient measured parametric values, this approach can outperform conventional calibration approaches in providing more robust parameter values. The underestimation in the April streamflow also highlights the importance of improved physics in a hydrological model, without which the model calibration cannot fully compensate the gap.


Introduction
Hydrological models are simplified, conceptual representations of the real world water cycle [1], aiming to analyze, understand and explore the dynamics of hydrological processes. A model consists of various parameters that define the characteristics of the model. Based on the model parameters, hydrological models are classified into lumped and distributed models. While the former category uses lumped parameters for an entire basin and is usually difficult to realistically represent spatiotemporal heterogeneity within a catchment, the latter, generally physically based, takes into representing the basin characteristics [37,38]. Therefore, it motivates the investigation of including spatial and temporal stratifications into a calibration scheme, in which both the frozen ground types and precipitation seasonality are considered.
The objective of this study is to develop a new calibration scheme for hydrological modeling in cold alpine basins in which the model parameters are stratified by frozen ground type and season. The approach was evaluated in the Babao River Basin, located within the Qilian Mountains, northwest China, where a grid-based distributed hydrological model, distributed hydrology-soil-vegetation model (DHSVM), was applied. This paper is organized as follows. Section 2 elaborates the proposed stratified calibration scheme in parallel with the study area, data and the model settings. Section 3 presents the results from the calibration experiments and examines their impacts on the simulation accuracy. Section 4 includes a thorough discussion on the effectiveness and rationality of this proposed calibration scheme. The uncertainties and model inadequacy learned through those experiments are also discussed. Finally, the conclusions are drawn in Section 5.

DHSVM
The DHSVM is a fully distributed, physically based hydrological model to calculate the energy and water balance in a gridded basin, usually aligned with its digital elevation model (DEM). This model has been widely used in a wide spectrum of studies, such as hydrological forecasting [39,40], effects and responses to climate changes [41,42] or anthropogenic factors such as engineering construction and cultivated land use [43,44]. Instead of using semi-distributed spatial discretization such as sub-basin, HRU or slope unit, the DHSVM uses grid cells to accurately represent the surface buffer layer and soil conditions in the study basin.
The model has seven main modules: two-layer ET, two-layer ground snowpack, canopy snow interception and release, unsaturated soil moisture movement, saturated subsurface flow, overland flow and channel flow [45]. The mass balance equation in a spatial unit at any time step is: where ∆S io and ∆S iu are the changes of overstory and understory water storage, respectively; ∆S s1 , ∆S s2 . . . ∆S sn are the changes of each soil layer water storage, respectively; ∆W is the change of snow snowpack water content; P is precipitation falling within the current cell and the incoming flow from upslope; E io , E iu and E s are evaporation from overstory, understory and soil layer, respectively; E to and E tu are transpiration from overstory and understory, respectively; P 2 is the discharge from the current cell. The DHSVM has been successfully applied to some cold alpine basin [46] and two major defects were identified and then improved: (1) it tends to overestimate the actual ET in summers in a cold alpine basin; (2) it fails to accurately quantify the melting processes occurred in the snowpack. An empirical approach established specially for a cold alpine basin [47] was put in place to improve the calculation of actual ET in the model. The maximum snow albedo as a parameter in the snowpack albedo decay curve was set to 0.92 according to snow albedo measurement [48], which improves the model's performance in winter and early spring.

Parametric Sensitivity Analysis
The DHSVM requires look-up table parameters that are tied to the soil or land cover/vegetation types. In typical use, those parameters are invariant throughout the modeling lifetime. However, in a typical cold alpine basin, frozen ground types and soil hydrothermal properties vary in space. Permafrost usually occurs in high elevations with negative mean annual temperature and the SFG distributes in low elevations and downstream areas with warmer temperature. Therefore, soil and vegetation parameters in DHSVM are likely changed over space and seasons in a cold alpine basin. We designed a set of sensitivity analysis experiments to identify the most sensitive parameters when the modeling domain is stratified spatially by frozen ground types and temporally by dry and wet seasons. More specifically, the basin is spatially divided into the permafrost area (PF) and seasonally frozen area (SF) based on an existing frozen ground distribution map in the area and the simulation period is temporally separated into the dry season (October to next April) and wet season (May to September). Along with the entire basin (EB) and the entire simulation period, (both) cases, nine spatial and temporal combinations ( Figure 1) were created. The most sensitive parameters were then identified for each combination and then classified into frozen-ground-type sensitive and season sensitive parameters. seasons. More specifically, the basin is spatially divided into the permafrost area (PF) and seasonally frozen area (SF) based on an existing frozen ground distribution map in the area and the simulation period is temporally separated into the dry season (October to next April) and wet season (May to September). Along with the entire basin (EB) and the entire simulation period, (both) cases, nine spatial and temporal combinations ( Figure 1) were created. The most sensitive parameters were then identified for each combination and then classified into frozen-ground-type sensitive and season sensitive parameters. The eFAST method was used to perform the analyses for each combination and the mean streamflow is used as the variable for evaluating the sensitivity. The eFAST method provides quantitative parametric sensitivity indices based on conditional variances to rank the sensitivities of parameters. In eFAST, the total effect instead of the first order sensitivity indices is computed. The number of samples per search curve and a maximum number of Fourier coefficients were set to 65 and four in this study, respectively, according to a previous study [7].
The parameters of snowpack are not included in the sensitivity experiments despite they change seasonally. Snowpack parameters only work in dry seasons when snowfall takes place and the model physics already represents its seasonal roles very well. Therefore, there is no need to further divide them spatially or temporally. Before the sensitivity analysis, we optimized the snowpack parameters and kept them constant throughout those experiments.

Stratified Parameter Calibration
The calibration was performed on sensitive parameters identified by the sensitivity analysis. The calibration was first separately performed for spatial strata and then seasonal strata. As there may be several types of soil and land use/cover in the study area, the sensitive parameters in association with the soil and land use/cover types need to be calibrated for each of those types. For example, in the Babao River Basin, there are three soil types in both permafrost and SFG areas. Therefore, three sets of sensitive soil-related parameters are required to be optimized for the three soil types. The final optimal values are obtained by re-calibrating the parameters, which are provisionally optimized in the former calibration steps, to reach a best globally optimal for the entire spatial and temporal domain. The optimization is achieved by the PSO method. The PSO is a widely used computational optimization method by having a population of candidate solutions, dubbed particles, and moving these particles around in the search-space according to simple mathematical formulae over the particle's position and velocity [10]. Each particle's movement is influenced by its local best-known position and the best-known positions in the search space. Through the iteration of particle position, the best-known positions of the whole search-space will be updated. The PSO has the advantage of fewer iteration times comparing to other heuristics methods. The number of iteration and population The eFAST method was used to perform the analyses for each combination and the mean streamflow is used as the variable for evaluating the sensitivity. The eFAST method provides quantitative parametric sensitivity indices based on conditional variances to rank the sensitivities of parameters. In eFAST, the total effect instead of the first order sensitivity indices is computed. The number of samples per search curve and a maximum number of Fourier coefficients were set to 65 and four in this study, respectively, according to a previous study [7].
The parameters of snowpack are not included in the sensitivity experiments despite they change seasonally. Snowpack parameters only work in dry seasons when snowfall takes place and the model physics already represents its seasonal roles very well. Therefore, there is no need to further divide them spatially or temporally. Before the sensitivity analysis, we optimized the snowpack parameters and kept them constant throughout those experiments.

Stratified Parameter Calibration
The calibration was performed on sensitive parameters identified by the sensitivity analysis. The calibration was first separately performed for spatial strata and then seasonal strata. As there may be several types of soil and land use/cover in the study area, the sensitive parameters in association with the soil and land use/cover types need to be calibrated for each of those types. For example, in the Babao River Basin, there are three soil types in both permafrost and SFG areas. Therefore, three sets of sensitive soil-related parameters are required to be optimized for the three soil types. The final optimal values are obtained by re-calibrating the parameters, which are provisionally optimized in the former calibration steps, to reach a best globally optimal for the entire spatial and temporal domain. The optimization is achieved by the PSO method. The PSO is a widely used computational optimization method by having a population of candidate solutions, dubbed particles, and moving these particles around in the search-space according to simple mathematical formulae over the particle's position and velocity [10]. Each particle's movement is influenced by its local best-known position and the best-known positions in the search space. Through the iteration of particle position, the best-known positions of the whole search-space will be updated. The PSO has the advantage of fewer iteration times comparing to other heuristics methods. The number of iteration and population size were assigned to 100 and 20, respectively, following the advice from Reference [49]. The Nash-Sutcliffe efficiency (NSE) coefficient was used as the objective function in the PSO: where Q t ob and Q t si are observed discharge at the basin outlet and simulation in the time step t, T is the total number of time steps, and Q ob is the mean of observations in the entire period. The Kling-Gupta efficiency (KGE) [50] and root-mean-square error (RMSE) were also chosen to evaluate the performance: where r represents the correlation coefficient between the observed and simulated discharges, α is the ratio of the standard deviation of the observed discharge to that of simulated discharge, and β is the ratio of the mean observed discharge to the mean simulated discharge. The closer the NSE and KGE are to 1, the better the simulation performance is. Lower values in RMSE indicate less simulation error. The proposal stratified calibration scheme is composed of four steps: 1. Baseline calibration. It optimizes sensitive parameters identified in the EB/Both combination (Figure 1), in a general way to calibrate a model without stratification, using the model settings and meteorological drivers developed for the study area. The parameter ranges are set to the model default. The simulation results from this step provide a reference for performance evaluation and the provided parameter values are input to the next steps as initial values.

2.
Calibration with stratifying frozen ground types. The entire area is divided into the permafrost area and SFG area based on a permafrost distribution map in this area. We created two sets of frozen ground type sensitive parameters (mostly associated with soil type) screened from the sensitivity analysis, one set for the permafrost area and the other for the SFG area. Those parameters are then calibrated against observed discharges at the basin outlet. The rest parameters take the values obtained from the baseline step and do not participate in the calibration.

3.
Calibration with stratifying seasons. The entire period is divided into dry (October to next April in the Babao River Basin) and wet seasons (May to September). Two sets of season sensitive parameters (mostly associated with land cover/vegetation type) are created for the two seasons. It is worthy to note the lateral conductivity of soil shows low sensitivity when considering seasons, although it is regarded as insensitive in the baseline step. Therefore, lateral conductivity is calibrated together with season sensitive parameters, while the rest parameters take values from the baseline step. In this step, no frozen ground type stratification is considered.

4.
Calibration with full stratification. The provisional parametric values supply initial values to the PSO, which then finds globally best fitting values for the four spatiotemporal strata (i.e., the permafrost area and dry season, the permafrost area and wet season, the SFG area and dry season, and the SFG area and wet season) in this step. The streamflow records at the basin outlet sever as the target for best matching.
The calibration period covers at least one cycle of dry and wet alternation, which spans 1 October 2008 to 30 September 2009 in the case study. To further verify the applicability of optimized parameters, the parameter values obtained in Steps 1 and 4 are used to simulate a much longer period (such as 1 October 2004 to 30 September 2008 in the case study) than the calibration period, over which the simulated discharges from Steps 1 and 4 are verified against the observed records.
A control experiment (RanCal) was specially designed to manifest the importance of stratifying parameters by frozen ground type. In RanCal, the entire area is randomly divided into two sets of areas, for which two sets of soil parameters are created so that the model has the same number of variables as the model in Step 2 has. The extent proportion is confined equally to that between permafrost and the SFG in the study area. All data and parameters in RanCal keep identical to those in Step 2. By comparing the results from RanCal and Step 2, we can examine the real improvement made by stratifying frozen ground types in exclusion of the effects of introducing more parameters into the model. Moreover, four experiments were designed to assess the sole impacts induced by the spatial and temporal stratifications in Steps 2 and 3. The optimal values for the permafrost and SFG areas from Step 2 are experimentally applied to all cells of the entire basin, respectively, no matter what the actual frozen ground type they are. Likewise, the optimal parameters for dry and wet seasons obtained in Step 3 are purposely applied to the entire period, respectively. The results from the four experiments were then evaluated.

Study Area, Data and Model Settings
The Babao River, the mainstream in the Babao River Basin, has a total length of about 104.1 km. The basin, located in the Qilian Mountains, Northwest China, has an area of about 2448 km 2 and altitudinal variations of more than 2000 m with an average elevation of about 3500 m a.s.l. (Figure 2a). Annual precipitation is about 400 mm and 95% occurs in summer. Primary soil types include clay loam, sandy loam and loam, accounting for 2.5%, 39.5% and 58% of the total basin area, respectively ( Figure 2c). The Babao River Basin is a natural basin with few human activities; the primary land cover/use types are grass (78.4%) and bare soil (14.8%). Other types such as water body (3.6%), wetland (2.8%), farmland and mixed forest (0.4%) occupy a small proportion (Figure 2d), and therefore the parameters connected to those types do not attend the sensitivity analysis and calibration. The lower limit of permafrost occurrence is about 3600 m. The SFG exists in lower elevations. According to the map of frozen ground distribution on the Upper Heihe River basin, which fully encloses the Babao River Basin, the permafrost and SFG in the study basin account for 58% and 42% of total areas, respectively ( Figure 2b).
The DEM data for the study basin was a subset from the ASTER Global DEM [51], with an original resolution of 30 m. The maps of frozen ground type [52], soil type [53] and land use/cover type [54] reflecting the conditions of around 2010 in the study area were provided by the Cold and Arid Regions Science Data Center at Lanzhou, Northwest China (http://westdc.westgis.ac.cn) and were converted to the grid format. The grid data were then converted to a uniform 300 m model resolution. The observation station, marked Qilian Station in Figure 2a, is located at the outlet of the basin, providing 3 h meteorological records and daily runoff observations. We subset five-year records (2005-2009) to match the other data. The MicroMet approach [55] was employed to interpolate meteorological data into grid cells. MicroMet is an intermediate-complexity, quasi-physically based, meteorological model to produce high-resolution atmospheric data. It is able to provide high-quality meteorological data for mountainous areas by considering topographic impacts to climate variables

Sensitive Parameters
Following the workflow in Figure 1, the global sensitivity indices were calculated by eFAST for the nine combinations. Based on the indices, a total of seven sensitive parameters as listed in Table 1 were identified. Among them, three parameters, i.e., monthly albedo (Alb), monthly leaf area index (LAI, m 2 /m 2 ) and minimum resistance (MinR, s/m), are closely connected to land cover/vegetation, while four of them, i.e., field capacity (FC, m 3 /m 3 ), lateral conductivity (LC, m/s), bubbling pressure (BP, m) and exponential decrease (ED, m −1 ) controlling the decay rate of hydraulic conductivity with depth, are associated to soils.

Sensitive Parameters
Following the workflow in Figure 1, the global sensitivity indices were calculated by eFAST for the nine combinations. Based on the indices, a total of seven sensitive parameters as listed in Table 1 were identified. Among them, three parameters, i.e., monthly albedo (Alb), monthly leaf area index (LAI, m 2 /m 2 ) and minimum resistance (MinR, s/m), are closely connected to land cover/vegetation, while four of them, i.e., field capacity (FC, m 3 /m 3 ), lateral conductivity (LC, m/s), bubbling pressure (BP, m) and exponential decrease (ED, m −1 ) controlling the decay rate of hydraulic conductivity with depth, are associated to soils. The sensitivity values of those parameters are shown in Table 1, where the sensitivity values are grouped into three levels, i.e., low, medium and high sensitivity, marked by 15%, 20% and 30% grayscale background, respectively. FC, LC and BP have different sensitivity in the permafrost and SFG areas. FC and BP are more sensitive in the permafrost area than the SFG area, whereas LC is more sensitive in the SFG area than permafrost area. Those three parameters influence the simulation of base flow in the DHSVM. By contrast, the land cover/vegetation parameters show stable sensitivity despite the frozen ground type. It also implies no strong correlation exists between parameters associated with the land cover/vegetation type and the frozen ground type.
LAI and MinR are found more sensitive in the wet season than the dry season. They are highly related to surface conditions, affecting overland hydrological processes especially in rainfall periods. The sensitivity of the two soil related parameters, i.e., FC and LC, varies across seasons. However, given soil parameters are usually stable throughout the year, the detected temporal variations in sensitivity are likely induced by the difference of base flow proportion in the total runoff in dry and wet seasons. The base flow proportion presents seasonal variations because the total runoff changes over season, and the absolute volume of base flow, affected by the soil parameters, is maintained at a relatively steady level. According to a previous study [57], LC and FC usually have a strong spatial heterogeneity associated with the spatial distribution of soil intrinsic features (such as density, porosity and permeability) and do not show clear seasonal characteristics. For this reason, Both FC and LC were not considered as season sensitive parameters.
Alb and ED present constant sensitivity in all combinations. Alb is connected to surface conditions that have apparent changes in dry and wet seasons, and it was identified as a season sensitive parameter. Meanwhile, ED is a soil parameter controlling the decay of hydraulic conductivity, which is subject to change in the areas of different frozen ground types. ED was thus considered as a frozen ground type sensitive parameter.

Calibrated Parametric Values
The optimal values for the sensitive parameters obtained in the four steps are presented in Tables 2  and 3. Table 2 shows the calibrated values for the parameters relevant to soil types. It is found most parametric values of clay loam are largest and those of loam are smallest among the three soil types. Their values are largely dependent on soil types. Of the same soil type, the values of soil parameters are usually larger in the permafrost area (FroCal/Perm and FullCal/Perm) than in the SFG area (FroCal/SFG and Full/SFG). The differences between frozen ground types imply distinct hydrological characteristics in soils that have different frozen ground types. LC is found insensitive in the permafrost area as well as in the baseline calibration, however, it becomes sensitive when calibrating in the SFG area. FC for the loam is calibrated to take an invariant value of 0.18 across all steps because this value is already the lower boundary able to be assigned to the loam.  Table 3 shows the calibrated values of three seasonally sensitive parameters that are closely related to the land cover type. LAI and MinR appear not sensitive in the dry season (SeaCal/Dry and FullCal/Dry), whereas they become sensitive in the wet season. Alb seems sensitive in both seasons. It has a higher value in the dry season than in wet season in grass-covered areas, and vice versa in bare soils.

Simulation Results
The NSE, KGE and RMSE values measured from the simulated and observed daily streamflow at the Qilian station using calibrated parameters through the four steps of calibrations are presented in Table 4. Compared to the baseline results, all the three metrics show consistently improved performance in the calibration period with the calibrations enforcing spatial or temporal stratifications. The spatially stratifying scheme (FroCal) slightly exceeds the seasonally stratifying scheme (SeaCal) in all stats. The full stratification scheme has the best performance in terms of all metrics. The NSE is increased from 0.58 in the baseline to 0.69 in the full scheme, and the RMSE is reduced from 0.89 × 10 6 m 3 in the baseline to 0.77 × 10 6 m 3 in the full scheme. The daily streamflow time series simulated in the four steps are plotted in Figure 3. It shows overall good agreements of all simulations against the observations, indicating all simulations can capture broader characteristics of streamflow variations. Amidst them, the full scheme has the best simulation whereas the baseline has the worst performance. Meanwhile, all simulations suffer from common weakness such as incapability in reproducing the runoff around April and underestimating some peak flows in summer.  Compared with the results from the baseline and FroCal, the results from SeaCal, in which the model parameters are calibrated by stratifying precipitation seasonality, a present and obvious advantage in the simulation of rainy periods in wet seasons. It improves the NSE and KGE by 0.05 and 0.04, respectively, in comparison with the baseline. It means the adjustment of land cover parameters for separated seasons can enhance the simulation of runoff. While SeaCal occasionally overestimated some peak streamflow in September, it is apparently insufficient in calculating base flow in dry seasons, as suffered in the baseline. By contrast, the FroCal, which considers the frozen ground types in calibration, works much better than the baseline and SeaCal in dry seasons. The baseline and SeaCal results present less base flow in the dry season, which is ameliorated in the FroCal by adjusting soil parameters in stratified areas although slight overestimation still exists compared to the observed. The results with the FullCal (the calibration scheme with full stratification) sharing the joint advantages of FroCal and SeaCal present best consistency with the observed daily streamflow in both trend and magnitude, as indicated by the highest NSE, KGE and lowest RMSE.
The optimal parameters obtained from the baseline and FullCal were applied to a longer validation period, which lasts four years from 1 October 2004 to 30 September 2008. The time series and stats are displayed in Figure 4 and Table 5, respectively. While all stats become worse to some degree than in the calibration period, the FullCal results keep considerably better than the baseline by 0.12 and 0.08 higher in NSE and KGE, respectively, and by 0.11 × 10 6 m 3 lower in RMSE. The daily streamflow simulated with FullCal has the highest agreement with the observed records in both rainy periods and the periods with scare precipitation. However, both FullCal and baseline have a problem in modeling July and August of 2005 and 2007, when less streamflow has been modelled. The reason is two-folded. On one hand, the DHSVM uses a Penman-Monteith based approach to approximate the actual ET, and the modeling in the Babao River Basin shows overestimation in evaporation and consequent underestimation in the runoff during June to August, the period temperature reaches highest throughout a year. On the other hand, the observed streamflow time series seem to mismatch precipitation at some time and might have some observation errors there. Compared with the results from the baseline and FroCal, the results from SeaCal, in which the model parameters are calibrated by stratifying precipitation seasonality, a present and obvious advantage in the simulation of rainy periods in wet seasons. It improves the NSE and KGE by 0.05 and 0.04, respectively, in comparison with the baseline. It means the adjustment of land cover parameters for separated seasons can enhance the simulation of runoff. While SeaCal occasionally overestimated some peak streamflow in September, it is apparently insufficient in calculating base flow in dry seasons, as suffered in the baseline. By contrast, the FroCal, which considers the frozen ground types in calibration, works much better than the baseline and SeaCal in dry seasons. The baseline and SeaCal results present less base flow in the dry season, which is ameliorated in the FroCal by adjusting soil parameters in stratified areas although slight overestimation still exists compared to the observed. The results with the FullCal (the calibration scheme with full stratification) sharing the joint advantages of FroCal and SeaCal present best consistency with the observed daily streamflow in both trend and magnitude, as indicated by the highest NSE, KGE and lowest RMSE.
The optimal parameters obtained from the baseline and FullCal were applied to a longer validation period, which lasts four years from 1 October 2004 to 30 September 2008. The time series and stats are displayed in Figure 4 and Table 5, respectively. While all stats become worse to some degree than in the calibration period, the FullCal results keep considerably better than the baseline by 0.12 and 0.08 higher in NSE and KGE, respectively, and by 0.11 × 10 6 m 3 lower in RMSE. The daily streamflow simulated with FullCal has the highest agreement with the observed records in both rainy periods and the periods with scare precipitation. However, both FullCal and baseline have a problem in modeling July and August of 2005 and 2007, when less streamflow has been modelled. The reason is two-folded. On one hand, the DHSVM uses a Penman-Monteith based approach to approximate the actual ET, and the modeling in the Babao River Basin shows overestimation in evaporation and consequent underestimation in the runoff during June to August, the period temperature reaches highest throughout a year. On the other hand, the observed streamflow time series seem to mismatch precipitation at some time and might have some observation errors there.  As exhibited in Figures 3 and 4, the simulations, regardless of any calibration approach employed, consistently fail to reproduce streamflow peaks in around Aprils. It seems only base flow is simulated in those periods. The observed peak streamflow is produced by combinations of thawing active layer of frozen ground, snow melt and glacier melt in early spring when air temperature becomes positive ( Figure 5). This proportion of runoff accounted for about 8% of the total annual amount. The DHSVM is capable of simulating snow melt but lacks the model physics of representing freeze and thaw processes occurred within the active layer and glacier melt as well. The calibration experiments help to detect the inadequacy of the model physics.   As exhibited in Figures 3 and 4, the simulations, regardless of any calibration approach employed, consistently fail to reproduce streamflow peaks in around Aprils. It seems only base flow is simulated in those periods. The observed peak streamflow is produced by combinations of thawing active layer of frozen ground, snow melt and glacier melt in early spring when air temperature becomes positive ( Figure 5). This proportion of runoff accounted for about 8% of the total annual amount. The DHSVM is capable of simulating snow melt but lacks the model physics of representing freeze and thaw processes occurred within the active layer and glacier melt as well. The calibration experiments help to detect the inadequacy of the model physics.
is simulated in those periods. The observed peak streamflow is produced by combinations of thawing active layer of frozen ground, snow melt and glacier melt in early spring when air temperature becomes positive ( Figure 5). This proportion of runoff accounted for about 8% of the total annual amount. The DHSVM is capable of simulating snow melt but lacks the model physics of representing freeze and thaw processes occurred within the active layer and glacier melt as well. The calibration experiments help to detect the inadequacy of the model physics.

Impacts of Frozen Ground Types and Seasons
The parametric stratification introduces more parameters, or degrees of freedom, into the model, which mathematically bring more flexibility for calibrating the model. The optimal values for the randomly spatial stratification (RanCal) are listed in Table 6. Comparing to Table 2, where the parameter values in permafrost areas are consistently larger than those in the SFG areas, the changes of parametric values between the two randomly separated areas (R1 and R2 in Table 6) show no obvious patterns. It suggests the stratification into the permafrost and SFG types have significant meaning with respect to model parameters. Figure 6 shows the results from the RanCal, the baseline, and the FroCal that stratifies frozen ground types. Both RanCal and FroCal improve the agreements with the observed daily streamflow especially in the dry season in comparison with the baseline. It confirms the potential effects of the introduction of more parameters into a model. However, the RanCal suffers larger discrepancy in simulating the early recession limb than the FroCal, whereas the latter agrees much better in the same period. The measured NSE, KGE and RMSE of RanCal are 0.61, 0.78 and 0.86 × 10 6 m 3 , respectively, while those of FroCal are 0.65, 0.80 and 0.81 × 10 6 m 3 , respectively. The increased performance with enforcing a stratified calibration by frozen ground type than the controlled experiment is rooted in the existence of systematic distinction in physiographic characteristics between the permafrost and SFG types. Table 6. Calibrated values of soil-related parameters in the random spatial stratification experiment (RanCal). R1 and R2 represent two randomly separated areas. (RanCal). R1 and R2 represent two randomly separated areas.  Figure 7 shows the simulated daily streamflow in two hypothetic experiments that applying the parameters for the permafrost area to the entire basin and the parameters for the SFG areas as well, without considering seasonal stratification. The simulated results with the permafrost parameters are less than the SFG parameters in the dry season and low precipitation periods. It is likely caused by more soil moisture content simulated by the permafrost parameters than by the SFG parameters in the whole year ( Figure 8) and results in less base flow. Permafrost has lower hydraulic conductivity  Figure 7 shows the simulated daily streamflow in two hypothetic experiments that applying the parameters for the permafrost area to the entire basin and the parameters for the SFG areas as well, without considering seasonal stratification. The simulated results with the permafrost parameters are less than the SFG parameters in the dry season and low precipitation periods. It is likely caused by more soil moisture content simulated by the permafrost parameters than by the SFG parameters in the whole year ( Figure 8) and results in less base flow. Permafrost has lower hydraulic conductivity and higher field capacity than the SFG and holds more soil moisture content in the soil. As less base flow is generated in the permafrost area, the parameter of lateral conductivity (LC) becomes not important and insensitive in modeling the permafrost area (FroCal/Perm and FullCal/Perm) as given in Table 2. Nevertheless, in the SFG area, LC has an effective impact on the generation of base flow. In wet seasons, the results of the two experiments do not differ much but the simulated streamflow with the SFG parameters seems a bit higher than the permafrost parameters but with larger fluctuation. Between the precipitation events, low flows simulated with the permafrost parameters appear more realistic than with the SFG parameters. and higher field capacity than the SFG and holds more soil moisture content in the soil. As less base flow is generated in the permafrost area, the parameter of lateral conductivity (LC) becomes not important and insensitive in modeling the permafrost area (FroCal/Perm and FullCal/Perm) as given in Table 2. Nevertheless, in the SFG area, LC has an effective impact on the generation of base flow. In wet seasons, the results of the two experiments do not differ much but the simulated streamflow with the SFG parameters seems a bit higher than the permafrost parameters but with larger fluctuation. Between the precipitation events, low flows simulated with the permafrost parameters appear more realistic than with the SFG parameters.    In the hypothetical experiments for the season stratification (Figures 9 and 10), the parameters calibrated for dry seasons lead to less evaporation and more runoff than the parameters for wet seasons do. The two experiments present highly similar simulations in the dry season, both below the measured streamflow. They become distinct from each other in rainfall months. Overall, the parameters for dry seasons result in higher simulated streamflow in rainfall months than those for In the hypothetical experiments for the season stratification (Figures 9 and 10), the parameters calibrated for dry seasons lead to less evaporation and more runoff than the parameters for wet seasons do. The two experiments present highly similar simulations in the dry season, both below the measured streamflow. They become distinct from each other in rainfall months. Overall, the parameters for dry seasons result in higher simulated streamflow in rainfall months than those for wet seasons. In June to July, the simulations with the parameters for the dry season approach the observed well while those with the parameters for the wet season tend to underestimate the discharge. In May and August to October, however, the parameters for the wet season work better than the parameters for the dry season. The parameters for dry seasons lead to too much discharge. The reason might be connected to the ET physics in the DHVSM, in which water is excessively evaporated in June to July when using the parameters calibrated for the wet season ( Figure 10). wet seasons. In June to July, the simulations with the parameters for the dry season approach the observed well while those with the parameters for the wet season tend to underestimate the discharge. In May and August to October, however, the parameters for the wet season work better than the parameters for the dry season. The parameters for dry seasons lead to too much discharge. The reason might be connected to the ET physics in the DHVSM, in which water is excessively evaporated in June to July when using the parameters calibrated for the wet season ( Figure 10).

Discussion
We identified sensitive parameters in the DHSVM as frozen ground type sensitive and season sensitive parameters when simulating in a cold alpine basin. It shows considerable improvement with calibrated parameters by considering both frozen ground type and season stratification ( Figures  3 and 4). Calibrating with stratified frozen ground types will improve the base flow simulation in particular in dry seasons. In permafrost areas, impermeable frozen soil prevents liquid water from passing down and diminishes the downslope outflow. Such effects do not exist in the SFG areas. As revealed in the hypothetical experiments separately testing the permafrost and the SFG parameters

Discussion
We identified sensitive parameters in the DHSVM as frozen ground type sensitive and season sensitive parameters when simulating in a cold alpine basin. It shows considerable improvement with calibrated parameters by considering both frozen ground type and season stratification (Figures 3 and 4). Calibrating with stratified frozen ground types will improve the base flow simulation in particular in dry seasons. In permafrost areas, impermeable frozen soil prevents liquid water from passing down and diminishes the downslope outflow. Such effects do not exist in the SFG areas. As revealed in the hypothetical experiments separately testing the permafrost and the SFG parameters in the entire basin (Figures 7 and 8), the optimal parameters for permafrost areas will tend to hold more soil moisture and consequently reduce the base flow. It lays a theoretical basis for necessitating parameter calibration considering spatial stratification by frozen ground type [35,36]. To fully test the advantages of such stratification, it is recommended to purposively set up some discharge gauges in permafrost and SFG dominated sub-catchments. In this study, however, there are no such sub-catchment gauges and we instead validated the results at the final outlet of the study basin. The impacts are positive on the simulation results by stratifying parameters by frozen ground type, especially in dry seasons when the base flow plays a leading role. Meanwhile, calibrating in separated dry and wet seasons is likely to ameliorate the simulation of surface runoff in rainfall periods by mainly adapting the parameters in connection to the land cover types. The Babao River Basin is a natural basin with few human interventions. Grassland in the basin shows strong phenology, therefore, model parameters associated with the vegetation type will change across seasons. Furthermore, because most precipitation concentrates on the wet season, moisture conditions on the land surface also affect the model parameters relevant to land cover type. The joint effects are likely to disable a single set of parameters to represent the real conditions of land cover. The hypothetical experiments (Figures 9 and 10) demonstrate the differences caused by the parameters specific to dry seasons and to wet seasons.
Some existing studies have recognized the importance of temporal stratification in making hydrological model calibration more effective. Zhang et al. [15] calibrated the parameters in the SWAT model by separating dry and wet seasons and reported substantially increased accuracy of simulating daily runoff from an NSE of 0.24 to 0.66 in dry seasons in a cold alpine basin, and from 0.82 to 0.87 in the entire modeling period. However, the results still diverged from the observations in the several dry periods, implying some ill parameterization for the base flow, which is potentially linked to the soil properties of frozen ground underlying the study area. Kim and Lee [34] set up a multiple objective calibration approach by calibrating model parameters in four seasons and found it was insufficient in winter and spring when the superficial soils began freezing or thawing. The R square was found much lower in winter. In a hydrological model, some parameters change substantially over time and can be effectively optimized when considering the seasonal effects to parameters. Some parameters, however, may exhibit notable spatial patterns and can be optimized by purposive spatial stratification. The above-mentioned studies in some way overlook the impacts of different types of frozen ground in a cold alpine basin on the model parameters.
Usually, stratifying the model parameters in space or time will provide more flexibility to the parameter calibration in a hydrological model due to the increase in the number of parameters. While the above-mentioned previous studies have proved the effectiveness of considering stratified seasons, the effectiveness of stratifying frozen ground types for calibration is worthy to investigate. The observed impacts are a mixed result of both increasing numbers of parameters and the real effects of the stratification. The real effects can be identified by a comparative experiment consisting of a calibration scheme with stratified frozen ground type and another one with randomly stratified areas in which only increasing in parameter numbers matters. The results show increasing in parameter numbers is in favor of improving the simulation, although its improvement is less than stratifying frozen ground types. However, we found no apparent pattern in the optimized parametric values from the randomly stratified calibration, which provides little knowledge to understand the real world due to the lack of physics basis. The meaning of parameter calibration is not only to provide a more reasonable combination of parameters but also to study the interaction between the generalized physical processes and parameters [58]. In this sense, a scientifically sound spatial stratification, such as by frozen ground type in a typical cold alpine basin, could well interpret the responses of parameters to different phenomena as well as improve the simulation accuracy and is helpful to promote the understanding of the real world processes. Those physiographical distinctions between the frozen ground types can explain the variability of parameters across frozen ground types.
This study also underlines the importance of complete model physics. Even with a sophisticated calibration approach as presented, it is still impossible to perfectly simulate the streamflow in April. Those defects are related to the occurrence of freeze and thaw processes in the active layer in early spring when the surficial frozen ground begins to melt. Those physics are absent in the adopted version of DHSVM. The stratified calibration scheme could strengthen the impacts of the permafrost and the SFG in a cold alpine basin. They cannot completely replace the physical processes in the model. It urges the need of fitting in place a frozen ground module in the DHSVM to compute the freezing and thawing processes recurring with the oscillating soil temperature. Some efforts have been undertaken in the same basin to enhance the freezing and thawing cycle in models so the simulation in spring has been improved [31]. They suffered notable discrepancy in summer, which can be mitigated by undertaking an appropriate stratified calibration as proposed.
Possible error sources in this study include the quality of data such as meteorological drivers and relatively coarse spatial resolution. The gridded driving forces to DHSVM were interpolated from a single site, the Qilian station, located at the outlet of the Babao River Basin. Although the interpolation method, MicroMet, includes the impacts of topography on climatic variables, the gridded driving forces are subject to uncertainty. In addition, the observed daily streamflow at the Qilian station seems suspicious at some time that abrupt streamflow takes place without matching rainfall. The DHSVM usually requires spatial discretization at a high resolution; the input data in coarse resolutions such as land use/cover and soil maps, which are initially provided at about 1 km resolution, may also come with certain uncertainty to the results in this study.

Conclusions
This study proposes a spatially and temporally stratified calibration approach for hydrological modelling in cold alpine basins. This approach takes into account the distinct impacts on model parameters of frozen ground types and precipitation seasonality. A case study applying a distributed hydrological model, DHSVM, in a typical cold alpine basin located in Qilian Mountains, Northwest China, proves the necessity and effectiveness of employing the stratified calibration approach. The following conclusions have been reached:

1.
The proposed calibration approach by stratifying frozen ground types and seasons could considerably improve the simulation accuracy in cold alpine basins. In the case study, all metrics including the Nash-Sutcliffe coefficient, Kling-Gupta efficiency and root-mean-square error illustrate notable improvements in performance, when the model is calibrated by considering either frozen ground types, seasons, or both. The improvement is substantial when the calibration is stratified by both frozen ground type and season. The improvement keeps significant in a four-year validation period, which is independent of the calibration period.

2.
The calibration with stratifying frozen ground types is able to effectively adjust base flow in dry seasons and in the periods between precipitation events, while the calibration with stratifying seasons improves in controlling ET and surface flow in rainy periods. In a cold alpine basin where permafrost and the SFG exist, the simulation can be much enhanced by calibrating parameters in stratified frozen ground types. When strong precipitation seasonality is present in the study area, the calibration by considering seasonality will also be in favor of improving the accuracy.

3.
While the proposed calibration approach leads to better simulation results, it cannot fully compensate for the absence of key physics in the model. Owing to the inadequacy of representing freeze-thaw processes in the active layer in the DHSVM, the simulations cannot correctly reproduce runoffs around April even through the proposed calibration approach. It highlights the importance of improved physics in a hydrological model and the need for complete physics cannot be fully fulfilled by any advanced calibration approach.