The Role of Hydrological Signatures in Calibration of Conceptual Hydrological Model

Determining an optimal calibration strategy for hydrological models is essential for a robust and accurate water balance assessment, in particular for catchments with limited observed data. In the present study, the hydrological model Bilan was used to simulate the hydrological balance of 20 catchments throughout the Czech Republic during the period 1981–2016. Calibration strategies utilizing observed runoff and estimated soil moisture time series were compared with those using only long-term statistics (signatures) of runoff and soil moisture, as well as with a combination of signatures and time series. The calibration strategies were evaluated considering the goodness-of-fit, the bias in the flow duration curve and runoff signatures, and the uncertainty of the Bilan model. Results indicate that the expert calibration and the calibration with observed runoff time series are, in general, preferred. On the other hand, we show that, in many cases, extending the calibration criteria to also include runoff or soil moisture signatures is beneficial, particularly for decreasing the uncertainty in the parameters of the hydrological model. Moreover, in many cases, fitting the model with hydrological signatures only provides a fit comparable to that of the calibration strategies employing runoff time series.

The hydrological balance was simulated for 20 catchments in the Czech Republic utilizing four different calibration strategies: (1) expert calibration, (2) standard automatic calibration, (3) the standard automatic calibration considering hydrological signatures together with runoff and soil moisture time series, and (4) hydrological signatures only. The objectives of this study are to (i) evaluate the performance of different calibration strategies, (ii) assess the added value of hydrological signatures and soil moisture estimates, and (iii) determine to what extent time series data are necessary when modelling the hydrological balance.


Introduction
Hydrological models are commonly employed to calculate the hydrological balance of a catchment using various calibration strategies (e.g., diverse objective criteria involving various variables, different optimization algorithms, etc.). The applied calibration strategy affects the performance of the hydrological model. The widely used manual (expert) calibration of parameters is time-consuming, and the quality of the calibrated model depends strongly on the experience of the hydrologist [1]. Automatic calibration, on the other hand, is fast, and the performance of the model simulations is explicitly linked to the parameter values through the optimization criteria. The automatic calibration of hydrological models typically uses observed runoff time series to optimize the parameters. This is, however, not possible in catchments with limited observations, especially where no gauging stations are available. In addition, due to equifinality, very different parameter sets may yield similar (good) performance, and such models may therefore not simulate the physical processes properly.
In ungauged catchments, the water balance can be estimated using different methods, e.g., extrapolation of hydrological model parameters [2], spatial proximity [3], estimation of spatially distributed variables from soil and other geospatial datasets [4], physical similarity [5],

Study Area and Data
The 20 considered catchments are located in the Czech Republic, where the long-term mean annual precipitation for the period 1981–2010 (the climatological reference period for the Czech Republic) is 709.5 mm, the mean annual temperature is 7.9 °C and the mean runoff is 205.5 mm [26]. The selected catchments are shown in Figure 1, with the numbers referring to catchment IDs. The majority of the catchments are located in the northern part of the territory (235000-Ploučnice, 324000-Smědá, 006000-Labe, 306000-Stěnava, 031000-Bělá-Častolovice, 309000-Vidnávka, 313000-Bělá-Mikulovice, 266000-Opava, 354000-Moravská Sázava); the others extend into the central part (047000-Loučná, 361000-Trebůvka, 252000-Odra, 447000-Loučka) and the southern part of the territory (179000-Radbuza, 153000-Skalice, 143000-Volyňka, 138000-Otava, 107000-Teplá Vltava). Only catchments with freely available data (at the time of preparation of this study) and without significant anthropogenic influence were selected. For the selected catchments, the mean annual precipitation is 792.5 mm, the mean temperature is 7.3 °C, the average annual soil water storage is 997.1 mm and the mean annual runoff is 318.8 mm. The catchment areas range from 348 to 932 km², with a mean size of 454 km². Monthly time series of temperature (°C), precipitation (mm) and observed runoff (mm) were provided for each catchment by the Czech Hydrometeorological Institute, and the soil moisture estimates (mm) by the Global Change Research Institute of the Czech Academy of Sciences. The soil moisture estimates are based on simulations with the SoilClim model, a model for assessing the water balance and the hydric and thermic soil regime [27].

Methods
The hydrological model Bilan was used for the assessment of the water balance in 20 catchments (Figure 1) considering four calibration strategies: expert calibration, standard automatic calibration, calibration considering runoff and soil moisture time series in combination with hydrological signatures, and calibration with hydrological signatures only. The resulting parameter sets were then evaluated with respect to (i) the goodness-of-fit between observed and simulated runoff (GOF, hydroGOF [28]), (ii) the uncertainty of the Bilan model parameters (BP), and (iii) selected runoff and soil moisture signatures (RS). This section introduces the model, the calibration strategies and the evaluation metrics.

Bilan Hydrological Model
The hydrological model Bilan [29,30] is a conceptual rainfall-runoff model used for water balance assessment in the Czech Republic. For partly or fully conceptual models, some parameters cannot be considered physically measured (or measurable) quantities and thus have to be estimated on the basis of the available data and information [31]. The structure of the model is formed by a number of storage components and a set of relationships between them, based on basic principles of water balance as well as simple mathematical concepts such as the linear reservoir. This structure is similar to that of the well-known hydrological model HBV (Hydrologiska Byråns Vattenbalansavdelning) [32]. The water balance in the Bilan model is described in three zones: on the land surface, in the aeration zone (including vegetation cover), and in the groundwater zone [33].
The input variables are described in Table 1; in our case, we used precipitation (P, mm), air temperature (T, °C) and, optionally, the soil water time series (mm). The model components are divided into input data, water balance components and resulting parameters. In the monthly version of the model, the algorithms used depend on the conditions of the particular month: based on the mean monthly temperature, the model distinguishes between winter and summer conditions, as it also does in the daily version. In the monthly version, the total runoff (RM, mm) is calculated as the sum of direct runoff (DR), interflow (I) and baseflow (BS) [30], i.e., RM = DR + I + BS.
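To make the bookkeeping concrete, the following is a deliberately simplified monthly water balance step in the spirit of the description above. It is a hypothetical toy, not the Bilan formulation: the 0 °C winter/summer threshold, the melt fraction, the soil capacity and the linear-reservoir coefficients (`soil_capacity`, `melt_fraction`, `k_perc`, `interflow_share`, `k_base`) are all illustrative assumptions. Only the idea of separating winter and summer months by mean temperature and the decomposition RM = DR + I + BS come from the text.

```python
from dataclasses import dataclass


@dataclass
class Storages:
    """Storages of a toy monthly water balance model (all in mm)."""
    snow: float = 0.0
    soil: float = 0.0
    groundwater: float = 0.0


def monthly_step(st, precip, temp, pet, soil_capacity=100.0,
                 melt_fraction=0.5, k_perc=0.2, interflow_share=0.5,
                 k_base=0.1):
    """Advance the toy model by one month and return (RM, DR, I, BS, ET).

    Every equation and parameter here is a hypothetical illustration,
    not the Bilan model equations.
    """
    if temp <= 0.0:
        # Winter month (assumed 0 degC threshold): precipitation is stored as snow.
        st.snow += precip
        water_in = 0.0
    else:
        # Summer month: rain plus melt of a fixed fraction of the snowpack.
        melt = melt_fraction * st.snow
        st.snow -= melt
        water_in = precip + melt

    # Evapotranspiration, limited by the water available in the soil.
    et = min(pet, st.soil + water_in)
    st.soil += water_in - et

    # Direct runoff (DR): water exceeding the soil capacity.
    dr = max(0.0, st.soil - soil_capacity)
    st.soil -= dr

    # Linear reservoir: percolation proportional to soil storage; part of it
    # leaves as interflow (I), the rest recharges the groundwater store.
    perc = k_perc * st.soil
    st.soil -= perc
    i = interflow_share * perc
    st.groundwater += perc - i

    # Linear reservoir: baseflow (BS) proportional to groundwater storage.
    bs = k_base * st.groundwater
    st.groundwater -= bs

    return dr + i + bs, dr, i, bs, et  # RM = DR + I + BS
```

Because every flux either moves water between storages or removes it from the system, precipitation equals the change in total storage plus RM and ET in each step, which is a useful sanity check for any model of this type.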
The model is shown in Figure 2, displaying the input data, simulated storages and fluxes; see [34–37] for further details. The parameters of the model are identified (calibrated) using the shuffled complex evolution (SCE-UA) method [38] in combination with the differential evolution (DE) method [39]. The algorithm is stochastic and therefore allows the uncertainty in the model parameters to be assessed by repeated calibration. The standard calibration minimizes the error between simulated and observed runoff, as measured by the selected objective function (OF). However, the model also allows the OF to be extended to time series of other variables (typically soil moisture or baseflow estimates) or even to individual hydrological signatures such as the mean or variance of runoff, indicators of extremes, etc.
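The idea that a stochastic optimizer yields a distribution of parameter estimates when calibration is repeated can be illustrated with a minimal sketch. It uses only a plain DE/rand/1/bin differential evolution (not the SCE-UA/DE combination implemented in Bilan), a hypothetical one-store linear-reservoir model `toy_model` with parameters `c` (runoff fraction) and `k` (release coefficient), and synthetic data generated from known parameter values; none of these names come from the Bilan package.

```python
import math
import random


def toy_model(params, precip):
    """Hypothetical one-store model (not Bilan): a fraction c of precipitation
    enters a linear reservoir that releases k * storage each month."""
    c, k = params
    storage, runoff = 0.0, []
    for p in precip:
        storage += c * p
        q = k * storage
        storage -= q
        runoff.append(q)
    return runoff


def rmse(sim, obs):
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def differential_evolution(objective, bounds, pop_size=20, generations=100,
                           f=0.8, cr=0.9, seed=None):
    """Minimal DE/rand/1/bin minimiser returning (best_params, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == forced or rng.random() < cr:
                    v = pop[r1][j] + f * (pop[r2][j] - pop[r3][j])
                    trial.append(min(max(v, lo), hi))  # clip to the bounds
                else:
                    trial.append(pop[i][j])
            # Greedy selection: keep the trial vector if it is not worse.
            score = objective(trial)
            if score <= scores[i]:
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=lambda j: scores[j])
    return pop[best], scores[best]


# Synthetic example: recover known parameters (c=0.4, k=0.3), repeating the
# stochastic calibration 15 times with different seeds.
rng = random.Random(1)
precip = [rng.uniform(0.0, 120.0) for _ in range(36)]
observed = toy_model((0.4, 0.3), precip)
bounds = [(0.0, 1.0), (0.01, 1.0)]
runs = [differential_evolution(lambda p: rmse(toy_model(p, precip), observed),
                               bounds, seed=s) for s in range(15)]
k_estimates = [params[1] for params, _ in runs]
```

With noise-free synthetic data, the repeated runs should cluster tightly around the true values; with real observations and a more complex model, the spread of `k_estimates` across seeds is what serves as a measure of parameter uncertainty.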

Calibration Strategies
The purpose of testing the four calibration strategies was to find a calibration setup that minimizes the bias in the simulated water balance and the uncertainty in the estimated model parameters. In addition, while the first three calibration strategies (expert, standard automatic, and time series with hydrological signatures) require time series of observed runoff, the fourth (calibration with hydrological signatures only) can also be applied in ungauged catchments, since hydrological signatures can often be successfully interpolated from available data or estimated from general formulas.
The available time period (1981–2016) was split into a calibration period (1981–1998) and a validation period (1999–2016), the former being used for the identification of model parameters and the latter for the evaluation of model performance. Since the stochastic optimization algorithm implemented in the Bilan model allows for the assessment of parameter uncertainty, we fitted the model 15 times for all calibration strategies except the expert calibration, for which only the "best" parameter set was provided.

Expert Calibration
This strategy builds upon knowledge of the catchments and experience with hydrological modelling and is frequently applied in studies of individual catchments in the Czech Republic. Typically, the expert constrains the ranges of the model parameters used in the optimization and then runs the optimization procedure. In this respect, this calibration strategy does not always result in the best possible match between observed and simulated runoff, but at the same time, it ensures that all water balance components have reasonable values and respect the physical conditions of the catchments. Therefore, throughout the paper, we take these results as the reference. The parameter sets considered here were provided by experts from the T. G. Masaryk Water Research Institute (developer of the Bilan model).

Standard Automatic Calibration
The standard automatic calibration uses differential evolution to minimize the error between the time series of observed and simulated runoff. The advantage of automatic calibration is that it is faster than manual calibration and can be applied to large sets of catchments. It also often results in a better match between observed and simulated runoff than manual calibration. The downside of automatic calibration is that, for some catchments, the simulated water balance (and/or the model parameters) may not be realistic, resulting in, e.g., excessive groundwater or soil water accumulation, unrealistic snow cover, etc.

Calibration with Hydrological Signatures
The last two calibration strategies involve hydrological signatures, either in combination with runoff and/or soil moisture time series or as the only component of the objective function (OF). The calibration was repeated 15 times for each OF. As hydrological signatures, we used the mean, standard deviation and interquartile range of runoff and soil moisture. The agreement between time series in the OFs was represented by the mean percent bias, and the match between hydrological signatures by the relative percent difference. The individual components of the OFs were summed to result in a single value. In this paper, we considered 52 OFs, as given in Table A1. More than half of the OFs use only signatures.
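A composite OF of this kind can be sketched as follows. The sketch assumes absolute values of the percent bias and of the relative percent differences (so that all terms are minimized together), the sample standard deviation, and the default `statistics.quantiles` method for the interquartile range; the dictionary layout and the variable keys `"runoff"` and `"soil"` are hypothetical. Omitting `obs_series` turns it into a signature-only OF.

```python
import statistics


def pbias(sim, obs):
    """Percent bias of a simulated series relative to an observed one."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


def iqr(x):
    """Interquartile range (statistics.quantiles default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    return q3 - q1


# Signatures named in the text: mean, standard deviation, interquartile range.
SIGNATURES = {"mean": statistics.fmean, "sd": statistics.stdev, "iqr": iqr}


def rel_pct_diff(value, target):
    """Absolute relative percent difference of value with respect to target."""
    return 100.0 * abs(value - target) / abs(target)


def objective(sim, obs_series=None, target_signatures=None):
    """Sum of |PBIAS| terms and signature relative percent differences.

    sim:               {"runoff": [...], "soil": [...]} simulated series
    obs_series:        {variable: observed series}, e.g. only runoff
    target_signatures: {variable: {"mean": ..., "sd": ..., "iqr": ...}}
    """
    total = 0.0
    for var, obs in (obs_series or {}).items():
        total += abs(pbias(sim[var], obs))
    for var, targets in (target_signatures or {}).items():
        for name, target in targets.items():
            total += rel_pct_diff(SIGNATURES[name](sim[var]), target)
    return total
```

Summing percentages keeps each term dimensionless, so runoff and soil moisture components can be added without rescaling; any weighting between them would be an additional design choice not described here.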
The OFs can be split into six groups: R and R2 (runoff), SW and SW2 (soil moisture), and RSW and RSW2 (runoff and soil moisture). The specific combinations of variables, time series and signatures are given in Table A1. The aim of introducing hydrological signatures and/or soil moisture time series is to constrain the uncertainty in model parameters experienced with automatic calibration and to test whether a reasonable runoff simulation can be obtained without observed runoff time series.

Model Evaluation
The performance of each calibrated parameter set was evaluated with respect to the results of the expert calibration. That is, we evaluate to what extent results close to those of the expert calibration can be obtained with limited information (and without expert knowledge). The parameter sets were evaluated considering:
(a) Goodness-of-fit (GOF), expressed as the root mean square error (RMSE; [40]) and the Kling-Gupta efficiency (KGE; [41]) of the simulated runoff with respect to the runoff simulated with the parameter set resulting from the expert calibration (further denoted as the expert simulation).
The RMSE is given by
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2}$$
where $y_i$ is the expert simulation for the $i$-th case, $x_i$ is the simulation calibrated with the particular OF for the $i$-th case and $N$ is the total number of simulated values. RMSE is a standard statistical metric that gives relatively high weight to large errors.
The KGE is calculated according to
$$\mathrm{KGE} = 1 - \sqrt{\left[s_1 (r - 1)\right]^2 + \left[s_2 (\alpha - 1)\right]^2 + \left[s_3 (\beta - 1)\right]^2}$$
where $s = (s_1, s_2, s_3)$ is a numeric weight vector of length 3 (here with all elements equal to 1), $r$ is the Pearson product-moment correlation coefficient between $x$ and $y$, $\alpha = \sigma_x / \sigma_y$ is the ratio between the standard deviations and $\beta = \mu_x / \mu_y$ is the ratio between the means of the simulation calibrated with the particular OF and the expert simulation.
(b) The difference in the distribution of the Bilan model parameters (BP) Spa (controlling soil depth) and Grd (controlling baseflow) between the expert-calibrated parameters (see Section 3.2) and those calibrated with a particular OF.
(c) The relative difference in the mean and in the 20th (Q20) and 80th (Q80) percentiles of runoff and soil moisture between the expert simulation and the simulation calibrated with a particular OF.
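The two GOF metrics in (a) translate directly into code. In this sketch, `x` is the simulation calibrated with a particular OF and `y` the expert simulation; the Pearson correlation is computed by hand to avoid version-dependent library calls, and α is taken as the square root of the ratio of sums of squared deviations, since the sample size cancels in the ratio of standard deviations. The function names are illustrative, not the hydroGOF API.

```python
import math


def mean(v):
    return sum(v) / len(v)


def rmse(x, y):
    """RMSE of simulation x with respect to the expert simulation y."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(y))


def kge(x, y, s=(1.0, 1.0, 1.0)):
    """Kling-Gupta efficiency of x with respect to y; s weights (r, alpha, beta)."""
    mx, my = mean(x), mean(y)
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    r = sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)  # Pearson r
    alpha = math.sqrt(sxx / syy)  # ratio of standard deviations (n cancels)
    beta = mx / my                # ratio of means
    return 1.0 - math.sqrt((s[0] * (r - 1.0)) ** 2
                           + (s[1] * (alpha - 1.0)) ** 2
                           + (s[2] * (beta - 1.0)) ** 2)
```

For instance, a simulation that is exactly twice the expert simulation has r = 1, α = 2 and β = 2, so KGE = 1 − √2 ≈ −0.41 despite perfect correlation, which shows how KGE penalizes bias and variability errors that correlation alone ignores.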
Relative differences are preferred here over absolute ones in order to allow comparison between catchments. To assess the performance of the different calibration strategies, we subsequently ranked the results of the individual strategies for each catchment according to the criteria above. The calibration strategies with the overall best performance were further evaluated separately and are hereafter denoted as "selected".
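The signature comparison in (c) and the per-catchment ranking can be sketched as below. The percentile convention (linear interpolation between order statistics, as in R type 7 or the NumPy default), the direction of the difference (simulation relative to the expert simulation) and ranking by absolute relative difference are assumptions; the text does not specify them. `EVAL_SIGNATURES` and the strategy names are hypothetical.

```python
import math


def percentile(data, p):
    """Percentile with linear interpolation between order statistics
    (R type 7 / NumPy default convention; an assumption here)."""
    xs = sorted(data)
    h = (len(xs) - 1) * p / 100.0
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


EVAL_SIGNATURES = {
    "mean": lambda z: sum(z) / len(z),
    "Q20": lambda z: percentile(z, 20),
    "Q80": lambda z: percentile(z, 80),
}


def signature_errors(sim, expert):
    """Relative differences (%) of each signature of sim w.r.t. the expert simulation."""
    return {name: 100.0 * (f(sim) - f(expert)) / f(expert)
            for name, f in EVAL_SIGNATURES.items()}


def rank_strategies(errors, signature):
    """Rank strategies at one catchment by |relative difference| (1 = best)."""
    ordered = sorted(errors, key=lambda name: abs(errors[name][signature]))
    return {name: rank for rank, name in enumerate(ordered, start=1)}
```

Because the differences are relative, a uniform 10% overestimation of runoff gives +10% for every signature regardless of catchment size, which is what makes the ranks comparable across catchments.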

Results and Discussion
This section presents a detailed assessment of the calibration strategies with respect to the flow duration curve, goodness-of-fit (RMSE, KGE), uncertainty of the Bilan model parameters (Spa, Grd) and runoff signatures (Q20, Q80).

Runoff Difference Probability Curve
The runoff difference probability curve (Figure 3), which shows the probability of the relative difference between the modelled and observed runoff, was used to evaluate the calibration strategies. Low flows are underestimated both by the selected calibration strategies (the 95th percentile is −49% of runoff for calibration and −37% for validation) and by the non-selected ones (the 95th percentile is −77% of runoff for calibration and −75% for validation). The expert calibration results in runoff that closely matches the observed data. For validation, the runoff from the expert calibration is positively biased for low flows (the 95th percentile is 8% of runoff). The selected well-performing calibration strategies approximate the expert calibration and the observed data fairly well.
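The text does not spell out how the curve is constructed. One plausible construction, sketched below under that assumption, compares the simulated and observed flow duration curves at matching exceedance probabilities (Weibull plotting positions i/(n+1)), so that the value near probability 0.95 describes the relative error in low flows. Equal-length series of positive flows are assumed, and the function names are hypothetical.

```python
def exceedance_curve(series):
    """Flows sorted in descending order with Weibull exceedance probabilities i/(n+1)."""
    xs = sorted(series, reverse=True)
    n = len(xs)
    return [(i / (n + 1), q) for i, q in enumerate(xs, start=1)]


def relative_difference_curve(sim, obs):
    """Relative difference (%) between simulated and observed flow duration curves
    at matching exceedance probabilities (equal-length, positive flows assumed)."""
    pairs = zip(exceedance_curve(sim), exceedance_curve(obs))
    return [(p, 100.0 * (qs - qo) / qo) for (p, qs), (_, qo) in pairs]


def difference_at(curve, prob):
    """Relative difference at the exceedance probability closest to prob."""
    return min(curve, key=lambda point: abs(point[0] - prob))[1]
```

Sorting both series independently compares flow magnitudes rather than timing, so this view isolates distributional bias (such as systematically underestimated low flows) from phase errors, which are captured by the goodness-of-fit metrics instead.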

Goodness-Of-Fit
The results for the different calibration strategies are compared to those of the expert calibration. The goodness-of-fit (GOF) evaluation shows that, for the calibration period, the automatic calibration provides a slightly lower RMSE and an improved KGE with respect to the expert calibration (on average, RMSE was improved by 0.134 and KGE by −0.097).
For validation, the results are not much different. Introducing runoff and soil moisture signatures has a negative impact on the GOF metrics. This is expected, since the automatic calibration uses a metric similar to RMSE/KGE, whereas the calibration criteria of the strategies including runoff/soil moisture signatures are more complex. The soil moisture signatures alone are clearly not able to provide a reasonable fit. On the other hand, the calibration strategies including runoff signatures, or runoff and soil moisture signatures, performed reasonably for both calibration and validation. Strategies that consider time series data also perform better than those based on signatures only; however, the signature-only strategies are still able to provide reasonable results. A quantitative comparison for all groups of objective functions (OFs) with respect to RMSE and KGE is given in Table 2. We further compared the distributions of the validation differences in RMSE and KGE between the selected (best-performing) OFs and the rest (Figure 4). The selected OFs provide a consistent improvement in both RMSE and KGE, while the rest of the OFs often worsen the results. In addition, taking into account the time-series-based OFs only (without signatures) leads to considerably worse results.

Uncertainty of Bilan Model Parameters
The relative error in the fitted Spa and Grd parameters for the groups of objective functions (OFs) with respect to the expert calibration is given in Table 3. These two parameters are important for characterizing the hydrological balance of a catchment, representing the soil water retention (Spa parameter) and the groundwater response (Grd parameter).
For the Spa parameter, the best results are obtained with the standard automatic calibration, and very similar results are achieved with group R2. The Spa parameter is clearly improved by runoff signatures, as indicated in Table 3, whereas including soil moisture leads to worse results (see groups SW, SW2, RSW, RSW2). The Grd parameter, describing the groundwater dynamics, was reliably estimated in the R2, RSW, RSW2 and A groups. In this case, the soil moisture signatures improved the Grd parameter, but only in combination with runoff. The density of the relative errors in the Bilan Spa and Grd parameters with respect to the expert calibration for all calibration strategies is given in Figure 5. Again, the distribution of the errors is much more consistent for the selected (best-performing) OFs than for the rest of the OFs. Particularly for the Spa parameter, the selected OFs avoid the convergence towards different parameter values that is evident for the rest of the OFs (see the red area on the left of Figure 5).
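The relative errors of the 15 repeated estimates of a parameter such as Spa or Grd with respect to its expert value can be summarized as in the sketch below. The chosen summary (median, interquartile range and the share of runs within a tolerance) and the 10% default tolerance are illustrative assumptions rather than the statistics reported in Table 3.

```python
import statistics


def relative_errors(estimates, expert_value):
    """Relative errors (%) of repeated estimates of one parameter w.r.t. the expert value."""
    return [100.0 * (e - expert_value) / expert_value for e in estimates]


def summarize(errors, tolerance=10.0):
    """Median, interquartile range and share of runs within +/- tolerance (%)."""
    q1, _, q3 = statistics.quantiles(errors, n=4)
    return {
        "median": statistics.median(errors),
        "iqr": q3 - q1,
        "share_within": sum(abs(e) <= tolerance for e in errors) / len(errors),
    }
```

A small interquartile range combined with a median far from zero indicates a consistent but biased OF, whereas a wide range indicates that repeated calibrations converge to different parameter values, as described for the non-selected OFs.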

Runoff Signatures
Lastly, we evaluated the performance of the calibration strategies with respect to low and high flows, represented by the 20th (Q20) and 80th (Q80) percentiles (Table 4). In the case of low flows, the performance of the model is clearly improved when runoff signatures are considered, and even more so when this is done in combination with soil moisture signatures or time series. On the other hand, the standard automatic calibration ranked among the worst OFs. For high flows in general, the differences are much less variable, with the automatic calibration being slightly better than the rest of the OFs. The difference in behavior between Q20 and Q80 is also obvious from the densities presented in Figure 6, where the best-performing OFs lead to a much narrower error distribution than the rest of the OFs for Q20, while the difference is very small for Q80.

Summary of OFs' Performance
The 52 OFs considered for calibration contain runoff (R, R2), soil moisture (SW, SW2), and both runoff and soil moisture (RSW, RSW2), as time series or hydrological signatures. The OFs were assessed with respect to goodness-of-fit (GOF), uncertainty of the Bilan model parameters (BP) and bias in runoff signatures (RS). To summarize the performance of the different OFs, we ranked the OFs at each catchment according to GOF, BP and RS and checked which OFs appear most frequently. The set of those OFs is presented in Table 5. There are only four OFs included in the best-performance set for all three criteria: the standard automatic calibration, R2-mean-optim and R2-iqr-optim. This indicates that time series information is crucial for hydrological simulation. However, the results also indicate that OFs including hydrological signatures may rank among the best with respect to the parameters of the hydrological model and low and high flow statistics. Our results suggest a relatively good agreement between modelled and observed runoff; however, when very low (Q80 and Q95) and very high (Q20) quantiles are used for model diagnosis [4,25], it turns out that low flows (Q95–Q100) are significantly underestimated in most model settings.
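The frequency-based selection can be sketched as follows. The text does not state how "most frequently" was operationalized, so the top-`k` cut-off per catchment, the minimum share of catchments `min_share` and the intersection across criteria are hypothetical choices; lower scores are assumed to be better for every criterion.

```python
from collections import Counter


def top_k(scores, k):
    """Names of the k best OFs at one catchment (lower score = better)."""
    return sorted(scores, key=scores.get)[:k]


def frequent_best(scores_by_catchment, k=5, min_share=0.5):
    """OFs that are among the k best in at least min_share of the catchments."""
    counts = Counter()
    for scores in scores_by_catchment.values():
        counts.update(top_k(scores, k))
    n = len(scores_by_catchment)
    return sorted(name for name, c in counts.items() if c / n >= min_share)


def best_for_all_criteria(per_criterion, k=5, min_share=0.5):
    """OFs that are frequently among the best for every criterion (e.g. GOF, BP, RS)."""
    sets = [set(frequent_best(s, k, min_share)) for s in per_criterion.values()]
    return sorted(set.intersection(*sets))
```

Counting appearances in a top set, rather than averaging ranks, makes the selection robust to a few catchments where an otherwise good OF performs poorly; the price is the sensitivity of the result to the chosen `k` and `min_share`.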
This study was performed on 20 catchments, which means that the estimates of hydrological signatures may not be as robust as in studies that include more catchments with diverse water regimes, allowing for a better description of the behaviour of individual parameters, hydrological signatures or selected variables [42]. Ref. [42] also notes that hydrological signatures are typically more influenced by climatic and topographic indices than by land cover, soil properties and geology. Although we did not consider factors other than hydroclimatic ones, our study confirmed the importance of climatic factors, especially those related to soil moisture, which influence in particular low flows and the groundwater-related parameter (Grd) of the Bilan model. Similar to [17], we have shown that model parameters from unconstrained calibration vary over a wide range of implausible values, and that it is necessary to balance automated model calibration with expert knowledge and local system understanding. In addition, the hydrological signatures considerably narrow the range of parameter values and approach the expert-calibrated parameters well. Therefore, they should be considered in the calibration and diagnostics of the model, in particular when the behavior of extremes is of interest, as already suggested by [25].

Concluding Remarks
In the present paper, we assessed the performance of a conceptual runoff model (Bilan) calibrated using hydrological signatures based on long-term runoff and soil moisture characteristics. The results of these strategies are compared to those of the standard automatic and expert calibration. The performance of the tested combinations of runoff and soil moisture time series and signatures is evaluated with respect to the goodness-of-fit (GOF) between simulated and observed runoff, the uncertainty of the estimated Bilan model parameters (BP) and runoff signatures (RS) representing low and high flows.
The main findings can be summarized as follows:
• The standard automatic calibration performs best for most of the evaluation criteria, except for low flows;
• The objective functions (OFs) utilizing time series always perform better than those based on signatures only;
• However, the good performance of automatically calibrated models can be counterbalanced by a poor representation of hydrological processes and important hydrological signatures, and by an overall increase in the uncertainty of model parameters. Therefore, evaluation metrics accounting for biases in the representation of hydrological processes, and objective functions combining the bias in runoff time series with that of other runoff characteristics, should be considered;
• In cases where runoff time series are not available, a sufficient fit can be obtained even with signatures representing the runoff mean and variability;
• The role of the runoff and soil moisture signatures is significant, in particular for low flows and the parameters of the hydrological model.
The study was performed under the specific conditions of the Czech Republic with a single hydrological model, and further research is needed to confirm the findings under different hydroclimatic and physical conditions and with other hydrological models.

Funding:
This article has been prepared within the research projects "Water for Prague" No. CZ.07.1.02/0.0/0.0/16_023/0000118 and "Analysis of adaptation measures to mitigate the impacts of climate change and urbanization on the water regime in the area of external Prague" No. CZ.07.1.02/0.0/0.0/16_040/0000380, which have been financed from public funds through the EU Operational Programme Prague – Growth Pole of the Czech Republic. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: GOF