E ﬃ cient Calibration of a Conceptual Hydrological Model Based on the Enhanced Gauss–Levenberg–Marquardt Procedure

: Various models were developed in the past to simulate di ﬀ erent hydrological processes. However, discrepancies between simulated and observed values are still signiﬁcant and pose a challenge to many researchers. Models contain many parameters that cannot be directly measured. The values of most of these parameters are determined in the calibration process conditioning the e ﬃ ciency of such models. This paper introduces the use of the enhanced Gauss–Levenberg–Marquardt (GLM) procedure in combination with the singular value decomposition (SVD) and Tikhonov regularization to improve the process of hydrological model calibration. The procedure is tested on a freely available hydrological model using a synthetic dataset. Based on several e ﬃ ciency measures, the GLM procedure, in combination with SVD and Tikhonov regularization, was found to provide e ﬃ cient model history matching and almost perfect parameter calibration. Moreover, by comparing the results of the proposed procedure with the results of global evolutionary calibration procedures, it was found that the only calibration using the combined GLM procedure gave a perfect ﬁt in low ﬂows. Last but not least, the noise in the calculation results with the combined GLM method was practically the same in either the calibration or validation procedure, suggesting that only computational noise remained in the results.


Introduction
Simulation of hydrological processes using computer models has a long and rich history [1,2]. Hydrological models are merely a mathematical approximation of a variety of dynamic processes in highly heterogeneous conditions in nature. A model's variables are spatially heterogeneously distributed, while we have limited datasets available. The same is true for data on how water moves over and below the surface [3][4][5][6]. Different hydrological processes dominate different river catchments due to natural conditions. Therefore, various models were developed to simulate these hydrological processes more or less successfully [1,2,7,8]. More than 60 different hydrological models are available for rainfall-runoff simulations [1]. Due to noise, the discrepancies between simulation results and measurements are still significant and challenging for many researchers [6, [9][10][11][12][13][14][15][16]. One of the reasons for the discrepancy between measurements and results of simulations can also be attributed to a lack of knowledge [17].
Discrepancies between observed data and simulated values, i.e., noise, can be produced by many sources. First, the data we use in the model alone is a source of the so-called primary noise resulting from measurements and observations [6, 10,[18][19][20][21][22][23][24][25]. Second, the source of noise is the structure of the reduce calibration noise and to improve the calibration efficiency, (3) compare the performance of the GLM procedure in combination with SVD and Tikhonov regularization with global evolutionary calibration procedures, and (4) accelerate the calibration process using parallel computing. Moreover, the reader can find a detailed description of the proposed procedure in Supplementary Materials.

Materials and Methods
In the section below, we present the Gauss-Levenberg-Marquardt procedure, singular value decomposition, and Tikhonov regularization, which is used in the inverse solutions of the equations of a deterministic conceptual model. In addition, the hydrological model and data are described. A methodology for model calibration and validation is presented. All procedures, as mentioned earlier, i.e., GLM, SVD, and Tikhonov regularization, are included in the PEST software [53]. Therefore, for this study, we used the PEST software tool, which is an industry-standard software package designed to evaluate parameters and analyze the uncertainties of complex environmental and other computer models [53]. Specifically, we used the PEST_HP calibration tool (HP means very parallelized), which is equal to PEST [56,57] but PEST_HP uses parallel computation, resulting in faster calculation performance.

Gauss-Levenberg-Marquardt Method
The Gauss-Levenberg-Marquardt method (GLM) is a blend of gradient descent and Gauss-Newton methods. In the literature, it is also referred to as the "Levenberg-Marquardt" algorithm (LMA or just LM), or the "damped least-squares" method (DLS) [58]. The parameter estimation is based on Equation (1). When the target function is far from being the smallest, the GLM operates by the steepest gradient method, but around the minimum, the GLM begins to act as a Gauss-Newton method. The Marquardt lambda in Equation (1) is a blending factor, which determines the mix between the gradient descent and the Gauss-Newton methods. λ is the so-called "Marquardt parameter," or synthetic "Marquardt lambda," named after Marquardt [58], who introduced this methodology. However, the use of this parameter was pioneered by Levenberg [59]. If the error is increasing, the function is far from its minimum and lambda should be increased approaching the gradient descent method. On the other hand, if the error is decreasing, the approximation is working well, so lambda is decreased. The parameter upgrade vector (u) is calculated according to the following equation [58,60]: where J-a Jacobean matrix, each element is a derivative of one particular observation for each parameter; W-a square diagonal matrix with squared weights attached to each observation; λ-Marquardt lambda or damping factor, which is adjusted during each iteration; r-vector of residuals which is minimized during calibration; I-an identity matrix; and T-the matrix transpose operator.
As it is apparent from Equation (1), the GLM algorithm requires the computation of the Jacobian matrix and matrix inversion at each iteration step as many times as there are adjustable parameters. Matrix inversion is a significant disadvantage. When the matrix is singular, GLM still performs due to the Marquardt lambda. A matrix often becomes singular when columns (or rows) are linearly dependent [61]. An enhanced version of GLM is implemented in PEST_HP to improve computing performance. Each lambda defines a direction in parameter space; lambdas are selected using an enhanced version of the algorithm employed by the traditional PEST [62].
The GLM algorithm is a standard method for solving nonlinear least-squares problems. The method needs to adjust the parameterized functions to the set of measured data by reducing the sum of the Appl. Sci. 2020, 10, 3841 4 of 20 squares of the differences between the measured and computational values [44,60,63]. The result of this procedure is not only an optimal set of parameters but also a detailed analysis of the sensitivity of the parameters [11], which tells us what the impact and importance of each parameter are on the modeling itself [64]. Thus, several sets of parameter values that meet the criteria of the objective function with the same degree of adaptation can be obtained [30].

Singular Value Decomposition
When solving systems of linear equations, in practice, we may encounter very sensitive systems. The singular value decomposition (SVD) is a tool that helps us understand why a particular linear equation system is very sensitive. SVD is a very powerful tool in numerical linear algebra. With the help of SVD, we can get practical solutions to some very sensitive systems [45]. When a large number of parameters are added to a model, some parameters can be expected to be insensitive and others to be highly correlated with other parameters. As a result, even though a parameter may be estimable, it does not mean that it is estimable. What is needed is an intelligent calibration tool; one that detects what can and cannot be inferred from the calibration dataset. Then we can estimate what can be estimated and leave out what cannot be estimated. This is done automatically, without users' intervention. More about the theory of singular value decomposition (SVD) and the tool is described [56,65].

Tikhonov Regularization
The process by which a useful solution is obtained for a very sensitive system is called regularization [45]. Tikhonov regularization was first presented by Tikhonov and Arsenin as a solution finding procedure for ill-posed problems, which is often the case in models with a large number of parameters [43,66]. Using more parameters than can be constrained uniquely by observations results in the formulation of an ill-posed inverse problem. The numerical solution of that problem must include the use of one or more regularization mechanisms to stabilize the numerical solution process and identify a unique solution. This topic is covered in-depth in articles [67,68]. Boundary conditions constrain the inverse process calculations, and the experience of the modeler is required for their determination.

CMA-ES
The covariance matrix adaptation evolution strategy (CMA-ES) is a global optimization scheme. The CMA-ES is a stochastic or randomized method for the parameter calibration of nonlinear, nonconvex functions. CMA-ES is a powerful and robust global optimization method [38][39][40]. CMA-ES does not require derivatives of model outputs to adjustable parameters for the implementation of its optimization algorithm. Thus, it can be employed where model outputs show "numerical granularity" due to model numerical solution instability, or where the model is highly nonlinear and/or the objective function surface shows local minima at various scales [69].

GAP
Genetic algorithm and Powell (GAP) optimization is another approach for model calibration. The GAP is a stochastic design method and works evolutionarily by selecting and recombining high-performing parameter sets with each other. The GAP algorithm consists of two steps [18,54]. First, optimized parameter sets are generated by an evolutionary mechanism of selection and recombination of a set of initial, randomly selected parameter sets (again within user-defined parameter boundaries). During the second step, parameter sets are fine-tuned using Powell's quadratically convergent method, as described in [70].

Hydrological Model and Data
The hydrological model HBV in the version of HBV-light software [71] was selected for the study because of the possibility of its direct connection with external calibration programs. The HBV model is a widely used conceptual precipitation-runoff model that simulates catchment runoff based on precipitation, temperature, and long-term potential evaporation at a daily time step [72]. The HBV model was developed in the early 1970s to assist hydropower operations by the Hydrology Department of the Swedish Meteorological and Hydrological Institute [31]. The model is partially distributed since it allows the river catchment to be divided into smaller subcatchment units. Each subcatchment can be further subdivided into smaller areas based on land use and altitude. The model includes computational procedures that describe hydrological processes: snow accumulation and melting, evapotranspiration assessment and soil moisture calculation, subsoil runoff, and water flow transformation in a riverbed [73].
A version of the HBV-light model developed at Uppsala University was used and further refined at various universities around the world. The latest version of HBV-light was coded in the VB6 programming language in VB.NET [54]. In principle, the HBV-light routines correspond to the HBV model. However, some simplifications were made. Additional information on the HBV model is described in the articles [31,54,73].
An analysis based on a synthetic model was selected for the study, where the calibration procedure is performed based on the calculated catchment runoffs and taking into account known parameters. The process for creating the ideal model is relatively simple. The calibrated model is calculated once with known parameters. Then, the observed flow measurement (Q obs ) is replaced by the simulated flow (Q sim ) results. The model knows the simulated flow at known parameters, which is equal to the observed flow. The synthetic model eliminates measurement noise, conceptual noise, and model design noise from the calibration process. Therefore, only the calibration noise can be analyzed in this way. When calibrating such a model, a perfect match of the measured discharges with the simulated discharges can be expected [18,47]. However, when calibrated under ideal conditions, it is often not possible to re-establish (history match or to restore) the unique values of all parameters. In the synthetic model, deviations from perfect parameter matching are due to the noise generated by a particular calibration procedure and some nonunique parameters. This problem can be at least partially solved by appropriately parameterizing the relevant model equations [74].
For the calibration of parameters, the GLM calibration method, together with SVD and Tikhonov regularization, was used. Also, other calibration procedures, namely CMA-ES and GAP, were applied, expecting to result in a perfect goodness-of-fit measure as well. We selected these two additional procedures because they are already included in the software used for this study. CMA-ES is included in the PEST tool package, whereas GAP is integrated into the HBV-light program for calibration purposes. The GAP algorithm produces reasonably good results in finding a global solution to the observed function [18].
The existing hydrological test case included in the HBV-light model was selected for the analysis. For the test, Exercise 1 "HBV-Land" catchment case [54,55] was used. This hydrological test model needs 16 parameters to calibrate ( Table 1). The model has more than enough parameters to test. The selected time step is one day in the period for ten years. The measured data on precipitation, temperature, and discharges are available for the period from 1 January 1981 to 31 December 1991. More details about HBV-Land catchment and the overall scenario of how to prepare and run a model are described in the article [54,55] in section A1 Exercise 1. Since one of the main aims of this study is to analyze the calibration noise, we chose a synthetic model of a test example of the freely available HBV-Land included in the HBV-light. Moreover, with 16 parameters, we can analyze in more detail the impact of all the calibration procedures presented in the paper (i.e., GLM, CMA-ES, and GAP) to the known parameter values. If the analyses had been made on a model that had been calibrated with real measured output data, it would have been difficult to distinguish between the different types of noise that impede the result [26,30,42,75]. Moreover, another objective in this article is to test how to set up calibration parameters in a PEST control file to get satisfying results. A calibration failure in a primary case would put to question the continuation with a more complex model.

Model Calibration and Validation
Firstly, the GLM calibration method was performed. The procedure is schematically presented in Figure 1. The hydrological model with input data for the period from 1 Januar 1981 to 31 December 1991 and with known model parameters shown in the second column in Table 2 was calculated. For initial conditions, the model is expected to "warm-up" from 1 Januar 1981 to 30 October 1981. For the model, a computed value for the period from 1 October 1981 to 31 December 1991 was assumed. The calibration procedure was performed by using PEST [57] and PEST_HP programs [76] running parallel on twelve logical processors on a single laptop PC. The procedure is step-by-step described in the Supplementary Materials. The calibration was repeated using the CMA-ES procedure and with the same control files settings as GLM ( Figure 1). calculated. For initial conditions, the model is expected to "warm-up" from 1 Januar 1981 to 30 October 1981. For the model, a computed value for the period from 1 October 1981 to 31 December 1991 was assumed. The calibration procedure was performed by using PEST [57] and PEST_HP programs [76] running parallel on twelve logical processors on a single laptop PC. The procedure is step-by-step described in the Supplementary Materials. The calibration was repeated using the CMA-ES procedure and with the same control files settings as GLM ( Figure 1).   PEST is a suite of different programs for preparation, analysis, and control of calibration procedures and requires some knowledge and experience of this system. PEST programs require three types of support files ( Figure 1): (1) parameter template files-one for each model input file that contains the parameters of variables that control the calibration; (2) instruction files-one file for each model output file from which the PEST must read the observed result numbers; and (3) one control file that provides the PEST with control parameters. The PEST control file includes the names of the relevant model input and output files, the operational mode like estimation or regularization, values of control variables, values of initial parameters, boundary ranges for each parameter, observed measurement values and weights. PEST runs Model, which executes the hydrological model and the necessary preand post-programs. The process is repeated until the criteria specified in the PEST control file are met. For more detailed procedures, see PEST instructions [62,69].
The same HBV-Land model, as for the GLM calibration method, was used to calibrate the HBV-light model by the GAP method. With the GAP evolutionary method, the procedure was repeated 100 times to run the model 5000 times, and 1000 times additionally with Powell local optimization [18,70]. When running the GAP procedure 100 times, 100 different parameter data sets are computed, and then the appropriate set regarding the selected goodness-of-fit criteria in Table 3 is used. Moreover, the GAP procedure was repeated to run 100 times, but with only 13 parameters (GAP 13 ). In this way, it was tested if the parameter values are closer to the initial ones than by using all 16 parameters (GAP 16 ), where SP, CFR, and CWH values listed in Table 1, were fixed to the perfect-fit values ( Table 2). A similar fixing of some of the parameters to avoid over-parameterization was done with CFR and CWH described in the article [18], where K0 and UZL parameters were not used in the calibration process with GAP.
For the initial values of the parameters, the values listed in the third column in Table 2 are taken into account. All calibration procedures, i.e., GLM, CMA-ES, and GAP, are run with the same initial and boundary conditions shown in Table 2.
During the calibration process, composite sensitivities were calculated and normalized for all observations. Normalized sensitivity shows how the uncertainty in the output of a model can be apportioned to different input parameters.
Model validation was performed using measurements of precipitation and temperature for the period from 1 January 2001 to 31 December 2011. Similar to the calibration procedure [54,55], the first nine months of the calculation are used to warm-up the model. Discharge data are simulated with known parameters from the calibration for the period from 1 October 2001 to 31 December 2011. This validation model is the same as in the calibration process, but with different input of precipitation and temperature data. Evapotranspiration data are the same as in the calibration period.
The efficiency of a model is commonly evaluated by the Nash-Sutcliffe efficiency (NSE) coefficient, which has been used as a standard for decades in testing calibration and model validation [29]. The quality of the calibrated model is shown in the validation process. The results of the computations are compared with sample data that were not used in the calibration. A successful calibration model is expected to achieve a similar NSE coefficient to that in the calibration when validating.
In addition to the Nash-Sutcliffe efficiency, other measures of the model calibration performance were taken into account in the analysis, namely the coefficient of determination, Kling-Gupta efficiency [77], efficiency for log (Q), flow-weighted efficiency, mean difference, efficiency for peak flows, efficiency at low flows, low-flow difference, and volume error ( Table 3). Equations of efficiency for peak flow, model efficiency (R eff ), and efficiency for low flow are the same as NSE coefficients, differing just in different data series periods. For a peak flow, a data point with the highest observed discharge (Q obs ) value that is at least three times the average Q obs within a time window of 15 days is used. Nevertheless, the most fundamental hydrological characteristic is the low flow, which is a seasonal phenomenon and an integral part of every stream [78]. Low flows were identified as challenging to calibrate in several studies [79,80]. For low flows, the arbitrary upper bound is given by the mean annual runoff, which is a mean value of the available flow time series of the annual flow. Q 70 flows within the range of 70%-time exceedance threshold were used as low flows in this study. The same threshold was selected in the low-flow study in a heterogeneous catchment in Slovenia [81]. For the HBV-Land catchment model, based on the calibration period data from 1 October 1981 to 31 December 1991, the calculated low-flow threshold Q 70 is 0.39 mm. Table 3. Coefficients for calibration quality assessments, goodness-of-fit measures according to [82].

Benchmark Description Definition 1 Value for Perfect-Fit Range
Coefficient of determination (R 2 ) The efficiency of low flows 1 − Usually, the coefficient of efficiency (R eff ) is used in the HBV model. R eff is the Nash-Sutcliffe efficiency coefficient (NSE) of a whole calculated period and can range from −∞ to 1. When NSE = 1, then we get a "perfect-fit value" and it means that Q sim (t) = Q obs (t). When NSE = 0, the simulation is as good or as poor as the constant-value prediction. On the other hand, NSE < 0 means an inferior fit. As can be seen from the equations in Table 3, all discharges are time-dependent. The NSE coefficient is sensitive to extreme values. Model relevance can be objectively accepted or rejected based on the NSE greater than the threshold selected by the user. The acceptable values depend on the use of the model and the practice of modeling. Results of calculations presented in this paper showed a good approximation of the perfect-fit values, i.e., NSE = 1.
Moreover, the parameter's relative deviation (%) was calculated by comparing the simulated parameter (P sim ) to perfect-fit parameter (P PF ) to standardize the various metrics and allow for a comparison of the parameters (Equation (2)).

Results and Discussion
Calibration results with the GLM calibration procedure are graphically presented in Figure 2. Results of efficiency measures in the synthetic model with perfect-fit parameter values are given in the second column of Table 4 with the label "Perfect-fit value." The third column in Table 4 shows the results of parameters from the calibration using the GLM algorithm with the use of SVD and Tikhonov regularization. For the GLM procedure, the run time was less than 1 min. Despite the same calculation data, slightly different results were obtained, which can be attributed to the computation noise. This noise was most likely caused by rounding the Q sim results in the calculation. The NSE of the computation is quite high, as shown in the third column in Table 4. Deviations merely appeared after the tenth decimal place in the NSE (NSE perfect-fit value = 0.9999998607; NSE GLM = 0.9999998605) and after the fifth decimal place in the mean differences benchmark, which can be almost equated with the perfect-fit value. The performance of calibration with CMA-ES, GAP13, and GAP16 is shown in columns 4-6 of Table 4. The overall comparison of NSE of all four procedures applied in this study is shown in Figure 3. Based on NSE, one can notice an excellent performance also of CMA-ES and GAP13, while GAP16 reached the lowest NSE.   The NSE of the computation is quite high, as shown in the third column in Table 4. Deviations merely appeared after the tenth decimal place in the NSE (NSE perfect-fit value = 0.9999998607; NSE GLM = 0.9999998605) and after the fifth decimal place in the mean differences benchmark, which can be almost equated with the perfect-fit value. The performance of calibration with CMA-ES, GAP 13 , and GAP 16 is shown in columns 4-6 of Table 4. The overall comparison of NSE of all four procedures applied in this study is shown in Figure 3. Based on NSE, one can notice an excellent performance also of CMA-ES and GAP 13 , while GAP 16 reached the lowest NSE. PEST_HP calculates parameter sensitivity during the calibration process itself. The normalized sensitivity in the second column of Table 5 shows how the uncertainty in the output of a model can be distributed to different parameters. The TT (temperature threshold) parameter is the most sensitive parameter with normalized sensitivity 1. In contrast, the least sensitive is CFR (refreezing coefficient) and CWH (water-holding capacity) parameters with normalized sensitivities of 0. The results of the calculated parameters with GLM, CMA-ES, and GAP are shown in columns 3-6 of Table 5. The results calibrated by the GLM procedure in the third column of Table 5 give a value close to the perfect-fit parameter value of Table 2. The differences occur only from the fourth to the fifth decimal place as a result of the calibration noise shown in column 3 of Table 5. The parameter values calibrated by the GAP procedure give relatively good results (GAP16, NSE = 0.9996) compared to the existing hydrological practice. However, they are significantly worse compared to GLM, as shown in Tables 5 and 6. The advantage of using the GLM with SVD and Tikhonov regularization in the calibration process is evident by comparing the values of individual relative deviations concerning the "perfect-fit" of the parameters in Table 6. PEST_HP calculates parameter sensitivity during the calibration process itself. The normalized sensitivity in the second column of Table 5 shows how the uncertainty in the output of a model can be distributed to different parameters. The TT (temperature threshold) parameter is the most sensitive parameter with normalized sensitivity 1. In contrast, the least sensitive is CFR (refreezing coefficient) and CWH (water-holding capacity) parameters with normalized sensitivities of 0. The results of the calculated parameters with GLM, CMA-ES, and GAP are shown in columns 3-6 of Table 5. The results calibrated by the GLM procedure in the third column of Table 5 give a value close to the perfect-fit parameter value of Table 2. The differences occur only from the fourth to the fifth decimal place as a result of the calibration noise shown in column 3 of Table 5. The parameter values calibrated by the GAP procedure give relatively good results (GAP 16 , NSE = 0.9996) compared to the existing hydrological practice. However, they are significantly worse compared to GLM, as shown in Tables 5 and 6. The advantage of using the GLM with SVD and Tikhonov regularization in the calibration process is evident by comparing the values of individual relative deviations concerning the "perfect-fit" of the parameters in Table 6.
In the regularization process (GLM), almost a perfect match between the true and the calibrated parameters is achieved, deviating by only 0.02% of CFR (refreezing coefficient) shown in the second column in Table 6. This small deviation (noise) is probably a consequence of the computation. Differences occur after the third decimal place. They can be explained by the equation for calculating refreezing meltwater (RM) defined as RM = CFR CFMAX (TT-T (t)) and containing three parameters and one variable [18,54,72,83]. If a parameter such as CFR has independent relationships with other parameters, it cannot be adequately calculated due to over-parameterisation. It is therefore advisable to determine the value of such a parameter based on the expertise of the modeler. As the catchment has no altitude zones, the occurrence of refreezing meltwater was also relatively rare in the observed period. The results of the calibration procedure with GLM are graphically presented in Figures 4 and 5. Appl. Sci. 2020, 10, x FOR PEER REVIEW 12 of 20 In the regularization process (GLM), almost a perfect match between the true and the calibrated parameters is achieved, deviating by only 0.02% of CFR (refreezing coefficient) shown in the second column in Table 6. This small deviation (noise) is probably a consequence of the computation. Differences occur after the third decimal place. They can be explained by the equation for calculating refreezing meltwater (RM) defined as RM = CFR CFMAX (TT-T (t)) and containing three parameters and one variable [18,54,72,83]. If a parameter such as CFR has independent relationships with other parameters, it cannot be adequately calculated due to over-parameterisation. It is therefore advisable to determine the value of such a parameter based on the expertise of the modeler. As the catchment has no altitude zones, the occurrence of refreezing meltwater was also relatively rare in the observed period. The results of the calibration procedure with GLM are graphically presented in Figures 4 and  5. On the other hand, parameter values obtained by calibration using the GAP algorithm differ significantly from the true values used for the calculation of the HBV-Land model (Figure 4, Figure  5, and Table 6). In the case of GAP16, 10 of 16 parameters deviate by more than 1%; SP, CFR, and CWH parameters deviate even by 12%, 10%, and 35%, respectively. Moreover, the total parameter deviation obtained by the GAP16 procedure is 82%. CMA-ES performed better than GAP but worse than GLM. The largest deviation (2.6%) in CMA-ES is again observed for the CFR parameter. The Appl. Sci. 2020, 10, x FOR PEER REVIEW 13 of 20 second-largest deviation (0.7%) is found for the CFMAX parameter. The total deviation of parameters in case of CMA-ES is 6%. When the number of parameters in a calibration set was reduced from 16 to 13, a more efficient GAP calibration with the total deviation of parameters of 6% resulted. One hundred optimized parameter sets resulted in a maximum, minimum, and median NSE equal to 0.999989, 0.980133, and 0.994431, respectively. For comparing GAP16 with other calibration methods, the parameter set obtained with maximum NSE was selected.
Many researchers found that low-flow calibration is quite problematic [79,80]. In this study, the only calibration using the GLM procedure gave a perfect fit in low flows (Table 4) without using the additional criteria to calibrate low-flow data series.
CMA-ES parameter relative deviation to history matching is close to GLM, as shown in the second column of Table 6. The calibration time can sometimes be significant. The run time depends on computer power, the operating system, and calibration process termination criteria. We need to look at this metric relatively and see how much one method is faster or slower than the other. For the selected case of the hydrological model calibration, the calculation using the parallel computing according to the GLM procedure takes 50 s (Figure 6), the CMA-ES parallel calculation takes one hour, and one GAP calculation takes two minutes with 6000 model runs. To achieve the results shown in Table 6, one hundred repetitions of GAP calculations take about three hours and ten minutes. On the other hand, parameter values obtained by calibration using the GAP algorithm differ significantly from the true values used for the calculation of the HBV-Land model (Figures 4 and 5, and Table 6). In the case of GAP 16 , 10 of 16 parameters deviate by more than 1%; SP, CFR, and CWH parameters deviate even by 12%, 10%, and 35%, respectively. Moreover, the total parameter deviation obtained by the GAP 16 procedure is 82%. CMA-ES performed better than GAP but worse than GLM. The largest deviation (2.6%) in CMA-ES is again observed for the CFR parameter. The second-largest deviation (0.7%) is found for the CFMAX parameter. The total deviation of parameters in case of CMA-ES is 6%.
When the number of parameters in a calibration set was reduced from 16 to 13, a more efficient GAP calibration with the total deviation of parameters of 6% resulted. One hundred optimized parameter sets resulted in a maximum, minimum, and median NSE equal to 0.999989, 0.980133, and 0.994431, respectively. For comparing GAP 16 with other calibration methods, the parameter set obtained with maximum NSE was selected.
Many researchers found that low-flow calibration is quite problematic [79,80]. In this study, the only calibration using the GLM procedure gave a perfect fit in low flows (Table 4) without using the additional criteria to calibrate low-flow data series.
CMA-ES parameter relative deviation to history matching is close to GLM, as shown in the second column of Table 6. The calibration time can sometimes be significant. The run time depends on computer power, the operating system, and calibration process termination criteria. We need to look at this metric relatively and see how much one method is faster or slower than the other. For the selected case of the hydrological model calibration, the calculation using the parallel computing according to the GLM procedure takes 50 s (Figure 6), the CMA-ES parallel calculation takes one hour, and one GAP calculation takes two minutes with 6000 model runs. To achieve the results shown in Table 6, one hundred repetitions of GAP calculations take about three hours and ten minutes.  [84], in contrast to stochastic evolutionary procedures (CMA-ES and GAP in this study). The minimum error variance in the objective function was obtained with the GLM procedure in all cases.
The validation results were similar to those of the calibration but slightly worse for all procedures. However, especially for models calibrated using GAP, the differences are relatively more considerable. Validation results with the GLM calibration procedure are graphically presented in Figure 7. For the model based on the GLM calibration, the NSE for the calibration and validation period has the same value NSE = 1 ( Table 7). For the model based on the GAP calibration parameters, NSE decreases from 0.9996 for the calibration period to 0.9983 for the validation period (Table 7).  The GLM procedure calculated the hydrological model 533 times with 14 iterations, and the CMA-ES ran the model 12,937 times with 925 iterations. For one hundred repetitions with the GAP procedure, 603,150 model runs of the model are required. Better efficiency of the GLM procedure regarding the number of required model calculations in comparison with the methods of evolutionary CMA-ES and GAP is hence more than evident. Moreover, the GLM calibration process in all cases calculated the same NSE value and values of calibrated parameters remained the same, regardless of the number of repetitions of the calibration process. The same repeating results can be attributed to the nonstochastic behavior in GLM calibration procedures [84], in contrast to stochastic evolutionary procedures (CMA-ES and GAP in this study). The minimum error variance in the objective function was obtained with the GLM procedure in all cases.
The validation results were similar to those of the calibration but slightly worse for all procedures. However, especially for models calibrated using GAP, the differences are relatively more considerable. Validation results with the GLM calibration procedure are graphically presented in Figure 7. For the model based on the GLM calibration, the NSE for the calibration and validation period has the same value NSE = 1 (Table 7). For the model based on the GAP calibration parameters, NSE decreases from 0.9996 for the calibration period to 0.9983 for the validation period (Table 7).
Brilly et al. [85], as part of the broader study, reported about using the proposed combined procedure, implemented in the PEST software and on a real catchment, i.e., the Savinja River catchment in Slovenia. The average NSE value for the model calibration of 21 subcatchments was 0.85, suggesting that the proposed methodology has great potential also for modeling (multiple) real river catchments. However, this is the first paper testing the use of the combined GLM procedure on a synthetic rainfall-runoff model, where all other types of noise, except calibration noise, are excluded from the model. A comparison of results of all four calibration approaches used in this study shows that calibration noise was excluded only with the combined GLM method.  Brilly et al. [85], as part of the broader study, reported about using the proposed combined procedure, implemented in the PEST software and on a real catchment, i.e., the Savinja River catchment in Slovenia. The average NSE value for the model calibration of 21 subcatchments was 0.85, suggesting that the proposed methodology has great potential also for modeling (multiple) real river catchments. However, this is the first paper testing the use of the combined GLM procedure on a synthetic rainfall-runoff model, where all other types of noise, except calibration noise, are excluded from the model. A comparison of results of all four calibration approaches used in this study shows that calibration noise was excluded only with the combined GLM method.

Conclusions
In this paper, the enhanced Gauss-Levenberg-Marquardt (GLM) procedure in combination with the singular value decomposition (SVD) and Tikhonov regularization is presented and applied to calibrate the conceptual hydrological model. The procedure was evaluated and compared with two other global calibration techniques (GAP and CMA-ES). Based on the results of this study, the following conclusions can be drawn regarding the presented combined used of GLM, SVD, and Tikhonov regularization: • The combined GLM procedure enabled an almost perfect match between true and calculated parameters. Only one parameter out of 16 showed a deviation of only 0.02%.
• By knowing the true model parameters, various calibration parameter settings can be tested. With the appropriate parameter settings of calibration, the calibration noise was reduced, and the calibration efficiency was improved.
• Only with the GLM procedure, the precise values of parameters on the HBV-Land model for both peak and low flows were obtained. In other words, only the combined GLM process was able to eliminate the calibration noise in the case of simulations based on the synthetic model runoff.

Conclusions
In this paper, the enhanced Gauss-Levenberg-Marquardt (GLM) procedure in combination with the singular value decomposition (SVD) and Tikhonov regularization is presented and applied to calibrate the conceptual hydrological model. The procedure was evaluated and compared with two other global calibration techniques (GAP and CMA-ES). Based on the results of this study, the following conclusions can be drawn regarding the presented combined used of GLM, SVD, and Tikhonov regularization:

•
The combined GLM procedure enabled an almost perfect match between true and calculated parameters. Only one parameter out of 16 showed a deviation of only 0.02%.

•
By knowing the true model parameters, various calibration parameter settings can be tested. With the appropriate parameter settings of calibration, the calibration noise was reduced, and the calibration efficiency was improved.

•
Only with the GLM procedure, the precise values of parameters on the HBV-Land model for both peak and low flows were obtained. In other words, only the combined GLM process was able to eliminate the calibration noise in the case of simulations based on the synthetic model runoff.

•
The run time of the combined GLM procedure was significantly shorter than those of the compared procedures. For the same case study, the combined GLM calibration procedure was approximately 60 and 200-times shorter than of CMA-ES and GAP, respectively.

•
The GLM calibration process in all cases calculated the same NSE value, and values of calibrated parameters remained the same, regardless of the number of repetitions of the calibration process. The same repeating results can be attributed to the nonstochastic behavior in GLM calibration procedures [84], in contrast to stochastic evolutionary procedures (CMA-ES and GAP in this study). The minimum error variance in the objective function was obtained with the GLM procedure in all cases.

•
The noise in the calculation results with the GLM method was practically the same in either calibrating or validating the procedure. In the modeling, therefore, only computational noise remained in the results.

•
The result of calibration with the GLM procedure, which would be stuck at the local minimum, was not detected in this study. The GLM procedure has proven to be very useful in solving linear inverse problems. By introducing Tikhonov regularization into an inverse solution, we successfully calculated unique parameters. With successive iterations, the GLM method achieved model calibration, regardless of the nonlinear relationship between model outputs and model parameters.

•
The calibration of the synthetic model gave an insight into the noise generated and the deficiencies in the design of the calibration process. However, the GLM calibration process itself requires expertise in hydrological modeling as well as in the calibration process. The lack of expertise is also the reason that the GLM process has not yet been widely implemented in practice to calibrate distributed hydrological models. For this reason, a Supplementary Materials section was prepared to bring the procedure closer to potential users.
Results of this study suggest that the proposed procedure (detailed explanation in the Supplementary Materials-A Quickstart Guide to HBV-Light Hydrological Model Calibration Using PEST) has a great potential to address many open challenges, such as calibration of multiple subcatchments at once instead of one to the other and from top to bottom.