Combining an R-Based Evolutionary Algorithm and Hydrological Model for Effective Parameter Calibration

The hydrological model assessment and development (hydromad) modeling package is an R-based package that can be applied to simulate hydrological models and optimize parameters. As the hydromad package is incompatible with hydrological models outside the package, the parameters of such models cannot be directly optimized. Hence, we proposed a method of optimizing the hydrological-model parameters by combining the executable (EXE) file of the hydrological model with the shuffled complex evolution (SCE) algorithm provided by the hydromad package. A physically based, spatially distributed, grid-based rainfall–runoff model (GRM) was employed. By calibrating the parameters of the GRM, the performance of the model was found to be reasonable. Thus, the hydromad can be used to optimize the hydrological-model parameters outside the package if the EXE file of the hydrological model is available. The time required to optimize the parameters depends on the type of event, even for the same catchment area.


Introduction
Theoretically, a physically based, distributed hydrological model can be applied to simulate runoffs by extracting the required values of the parameters from the characteristics of a catchment, without requiring parameter calibration. However, the parameters are generally optimized using optimization algorithms because of factors such as spatial scale problems due to the difference between the scale of the measurement data and the model grid scale, error in the observed data, and lack of data for estimating parameters that are difficult to measure [1][2][3][4][5]. Moreover, it is difficult to incorporate various optimization algorithms into the modeling package when the number of parameters in the distributed hydrological model to be optimized is considerable. Therefore, previous studies have optimized the parameters of a distributed hydrological model using the trial-and-error method in cases where the required optimization algorithms were not included in the modeling package [6][7][8][9][10]. However, parameter calibration using the trial-and-error method can be time consuming; moreover, the reliability of the results is affected [3].
The hydrological model assessment and development (hydromad) [11] is a widely used R-based hydrological modeling package [12][13][14][15][16][17][18][19][20][21]. The hydromad package can be used to simulate conceptual hydrological models, optimize parameters, and analyze uncertainties. It provides various parameter optimization techniques. However, because the existing hydromad package is incompatible with hydrological models outside the package, the parameters of such models cannot be automatically optimized. To address this issue, several studies have optimized the parameters of the hydrological models using only the source code of the optimization algorithms provided in the hydromad package [22][23][24][25][26]. Hydromad is an open-source software package, which can be downloaded from http://hydromad.catchment.org.
The aim of this paper is to demonstrate that hydromad package can be used as a general auto calibration tool for hydrologic models. In this study, we used the R-based shuffled complex evolution (SCE) [27,28] algorithm code (SCEoptim) and a connection code (fitBySCE) included in the hydromad package. We demonstrated that the parameters of the distributed hydrological model could be estimated in a relatively easy manner by modifying the fitBySCE function and linking the executable (EXE) file of a distributed hydrological model outside the package with the SCEoptim. We employed the grid-based rainfall-runoff model (GRM) [29] for illustrative purposes. If a hydrological model other than the GRM has an EXE file (e.g., HEC-HMS, VIC, and HBV model) that can be executed in console window mode without graphical user interface (GUI), the parameters of the model can be calibrated using the R language-based SCE algorithm shown in this study. The GRM is available for free download at https://github.com/floodmodel/GRM. The rest of this paper is organized as follows. Section 2 explains the SCE algorithm, the GRM used in this study, and the R-based interworking code. Section 3 introduces the catchments and the data inputted to the GRM. Section 4 presents the results and discussion. Finally, the conclusions of this study are given in Section 5.

Shuffled Complex Evolution Algorithm
The SCE algorithm is one of the most widely used algorithms for parameter optimization [30] and has been used in various studies [31][32][33][34]. The SCE algorithm was developed using four different methods: Simplex procedure [35], controlled random search [36], competitive evolution [37], and complex shuffling [27]. The parameter optimization process of the algorithm is briefly described as follows. First, the initial population is randomly selected within a given parameter range. In the second step, the selected population is divided into complexes. Here, the complexes represent communities having parents who can reproduce an offspring. In the third step, each complex is independently evolved using a competitive complex evolution strategy. In the fourth step, all evolved complexes are mixed into a single population. This mixing process signifies the exchange of information. Finally, steps 2-4 are repeated until the pre-selected convergence condition is satisfied.
In this study, the convergence condition is met either if the difference between the objective functions of the previous and current iterations is lower than 0.00001 or if the pre-selected maximum number of iterations (herein five) is reached. The objective function used for the parameter optimization is the Nash-Sutcliffe efficiency (NSE) coefficient [38].
where Q obs,i and Q sim,i are the observed and simulated runoffs at the ith time, respectively, and Q obs denotes the mean value of the observed runoffs. The NSE values range from −∞ to 1, where 1 implies that the simulated and observed runoffs are equal. The objective function is obtained as the square of the difference between the observed and simulated values; therefore, it focuses on fitting larger flows in a hydrograph [39]. Thus, it is appropriate for simulating storm events, such as the ones presented in this study. Note that the SCE algorithm can automatically optimize the parameters of the hydrological model using various objective functions such as root mean squared error, relative bias, NSE using square-root transformed data, and NSE using log transformed data (see "Calculate objective function value" in Step 4 of Box A1 in Appendix A for usage of other objective functions). However, the purpose of this paper is to link the R-based SCE algorithm with the EXE file of the hydrological model. Thus, although additional objective functions can be used, this study is based on the use of one objective function for illustrative purposes. In addition, users can refer to the R code presented in this study and use additional objective functions for their research purposes. In addition, we used correlation coefficient (CC) and normalized root mean square error (nRMSE) to investigate the fitness of simulated runoffs using the R-based SCE algorithm. Higher values of CC indicate more correlation between the observed and simulated runoffs, and lower values of nRMSE indicate less residual variance.
where Cov(Q obs , Q sim ) is the covariance of the observed and simulated runoffs, σ Q obs and σ Q sim are the standard deviation of the observed and simulated runoff, respectively, and Q obs,max and Q obs,min are maximum and minimum value of the observed flow, respectively.

Grid-Based Rainfall-Runoff Model
The GRM is a grid-based, physically distributed, hydrological model used for simulating rainfall-runoff events. A kinematic wave model is used for the runoff analysis, and the Green-Ampt model [40] is used to simulate infiltration, sub-surface runoff, and baseflow. In addition, the effects of river-flow control facilities, such as dams and reservoirs, can be analyzed. The detailed theory and functions of the GRM are given in the manual provided by Choi and Kim [41].
The GRM is provided in the form of an EXE file, which runs in the console window mode. To run the model, the GRM project file (.gmp) is used as the switch argument. The GRM project file contains the path and the name of the input file used for the runoff simulation, modeling settings, and model parameters. Therefore, to execute the GRM using multiple parameter sets, it is necessary to either (1) prepare GRM project files corresponding to each parameter set in advance, or (2) perform the runoff analysis while dynamically modifying the parameters contained in a GRM project file. In this study, method (2) was applied. Table 1 lists the parameters to be calibrated and the physically applicable range of each parameter in this study.  Figure 1 shows the schematic of the combination of the GRM and the SCE algorithm provided in the hydromad package and a brief description of the procedure for modifying the link code in R. The fitBySCE function in R in the hydromad package is used to link the SCEoptim function in R, i.e., the main code of the SCE algorithm, with the various hydrological models in the hydromad package. We modified the R code in the fitBySCE function to match the GRM with the SCEoptim function, as shown in Figure 1b and Box A1 in Appendix A. Steps 1-5 in Box A1 are identical to those shown in Figure 1b. The original codes for the fitBySCE and SCEoptim functions can be found on the hydromad website (http://hydromad.catchment.org).  Figure 1 shows the schematic of the combination of the GRM and the SCE algorithm provided in the hydromad package and a brief description of the procedure for modifying the link code in R. The fitBySCE function in R in the hydromad package is used to link the SCEoptim function in R, i.e., the main code of the SCE algorithm, with the various hydrological models in the hydromad package. We modified the R code in the fitBySCE function to match the GRM with the SCEoptim function, as shown in Figure 1b and Box A1 in Appendix A. Steps 1-5 in Box A1 are identical to those shown in Figure 1b. The original codes for the fitBySCE and SCEoptim functions can be found on the hydromad website (http://hydromad.catchment.org). The explanation of Box A1 in Appendix A, for the connection of GRM and SCEoptim function is as follows.

•
First, users download the hydromad package from the website (http://hydromad.catchment.org) and install it. Users then use the "library" function to load the hydromad package. After that, users read the observed flow time series data for parameter calibration using R. The SCE algorithm adjusts the parameter values so that the simulated flow time series is as close as possible to this observed flow time series. • Second, users set a range of parameters for the hydrological model using the R script in the second step. Since the SCE algorithm finds optimal parameter values within selected ranges of parameters, careful consideration should be given to whether the ranges of parameters are physically valid. Users can create a parameter list using the R script in the second step (see "parlist") and set the upper, lower and initial values of the parameters for parameter calibration. • Third, users can control the options of the SCE algorithm, such as the number of complexes and the maximum number of iterations. In this study, we used 20 number of complexes and 5 maximum number of iterations by considering the complexity of GRM. Users must select a function scale of −1 (see "control") to ensure that the parameters have the optimal values when the value of the objective function is maximum. Users can also use the trace option to check the parameter tracking process in real time (see "control$trace").

•
The fourth step is the step of linking the EXE file of the GRM with the R-based SCE algorithm. Users can perform this process by modifying the source code of the R-based SCE algorithm in the hydromad package. For a detailed description of this step, first set the initial best model ("bestModel" that is a model simulation) and best objective function values ("bestFunVal") The explanation of Box A1 in Appendix A, for the connection of GRM and SCEoptim function is as follows.

•
First, users download the hydromad package from the website (http://hydromad.catchment.org) and install it. Users then use the "library" function to load the hydromad package. After that, users read the observed flow time series data for parameter calibration using R. The SCE algorithm adjusts the parameter values so that the simulated flow time series is as close as possible to this observed flow time series. • Second, users set a range of parameters for the hydrological model using the R script in the second step. Since the SCE algorithm finds optimal parameter values within selected ranges of parameters, careful consideration should be given to whether the ranges of parameters are physically valid. Users can create a parameter list using the R script in the second step (see "parlist") and set the upper, lower and initial values of the parameters for parameter calibration. • Third, users can control the options of the SCE algorithm, such as the number of complexes and the maximum number of iterations. In this study, we used 20 number of complexes and 5 maximum number of iterations by considering the complexity of GRM. Users must select a function scale of −1 (see "control") to ensure that the parameters have the optimal values when the value of the objective function is maximum. Users can also use the trace option to check the parameter tracking process in real time (see "control$trace").

•
The fourth step is the step of linking the EXE file of the GRM with the R-based SCE algorithm. Users can perform this process by modifying the source code of the R-based SCE algorithm in the hydromad package. For a detailed description of this step, first set the initial best model ("bestModel" that is a model simulation) and best objective function values ("bestFunVal") using the R script (see "Set best model and best objective function values"). Then we modified the "do_sce" function in the hydromad package to update the GRM's 9 initial parameter values to the GRM project file (.gmp) (see "Update GRM parameters" and "Read '.gmp' file that contains information including parameter values and location of geophysical input data"). After that, we used the updated GRM parameters (ds500_E201107.gmp) and the EXE file of the GRM (GRM.exe) for GRM simulation by "system" function (see "Run GRM"). The result of GRM simulation is stored in "thisMod". Then, the observed flow time series (Q) and simulated flow time series by GRM (X) are used to calculate the objective function value ("r.squared" that is the NSE of this study) by "hmadstat" function in the hydromad package. The calculated objective function value is stored in "thisVal". If "thisMod" and "thisVal" are best values for each iteration, they are updated as "bestModel" and "bestFunVal", respectively. • Finally, the modified do_sce function, upper limit, lower limit, initial parameter values, and control option are put into the SCEoptim function and the optimal parameter values are estimated through the evolution process.
Note that the original function fitBySCE provided in the hydromad package is not used in this study, because we directly executed the script in terms of the modified fitBySCE function in R. The script in terms of the modified fitBySCE function (Box A1 in Appendix A) is represented as an example of an event.

Catchments
In this study, the Danseong and Seonsan catchments, which are major tributary catchments of the Nakdong River in South Korea, were selected for the calibration of the GRM, as shown in Figure 2. The area of the Danseong catchment is approximately 1710 km 2 . Most regions in the Danseong catchment have steep mountains (approximately 73%), and agricultural land is distributed near the mountain stream (approximately 21%). The area of the Seonsan catchment is approximately 980 km 2 , constituting approximately 65% of low mountainous area. Moreover, the farmland is relatively flat (approximately 28%) throughout the catchment, except in the mountainous regions. The Seonsan catchment is flatter than the Danseong catchment but is relatively closer to natural catchments and includes an urban area of approximately 4% of the catchment area.
Water 2018, 10, x FOR PEER REVIEW 5 of 14 using the R script (see "Set best model and best objective function values"). Then we modified the "do_sce" function in the hydromad package to update the GRM's 9 initial parameter values to the GRM project file (.gmp) (see "Update GRM parameters" and "Read '.gmp' file that contains information including parameter values and location of geophysical input data"). After that, we used the updated GRM parameters (ds500_E201107.gmp) and the EXE file of the GRM (GRM.exe) for GRM simulation by "system" function (see "Run GRM"). The result of GRM simulation is stored in "thisMod". Then, the observed flow time series (Q) and simulated flow time series by GRM (X) are used to calculate the objective function value ("r.squared" that is the NSE of this study) by "hmadstat" function in the hydromad package. The calculated objective function value is stored in "thisVal". If "thisMod" and "thisVal" are best values for each iteration, they are updated as "bestModel" and "bestFunVal", respectively. • Finally, the modified do_sce function, upper limit, lower limit, initial parameter values, and control option are put into the SCEoptim function and the optimal parameter values are estimated through the evolution process.
Note that the original function fitBySCE provided in the hydromad package is not used in this study, because we directly executed the script in terms of the modified fitBySCE function in R. The script in terms of the modified fitBySCE function (Box A1 in Appendix A) is represented as an example of an event.

Catchments
In this study, the Danseong and Seonsan catchments, which are major tributary catchments of the Nakdong River in South Korea, were selected for the calibration of the GRM, as shown in Figure 2. The area of the Danseong catchment is approximately 1710 km 2 . Most regions in the Danseong catchment have steep mountains (approximately 73%), and agricultural land is distributed near the mountain stream (approximately 21%). The area of the Seonsan catchment is approximately 980 km 2 , constituting approximately 65% of low mountainous area. Moreover, the farmland is relatively flat (approximately 28%) throughout the catchment, except in the mountainous regions. The Seonsan catchment is flatter than the Danseong catchment but is relatively closer to natural catchments and includes an urban area of approximately 4% of the catchment area.  Table 2 lists the input data of the GRM used in this study. The GRM uses spatial information in raster format. The hydrological terrain factors, such as the catchment, flow direction, flow accumulation, stream, and slope layers were extracted from a digital elevation model (DEM), which  Table 2 lists the input data of the GRM used in this study. The GRM uses spatial information in raster format. The hydrological terrain factors, such as the catchment, flow direction, flow accumulation, stream, and slope layers were extracted from a digital elevation model (DEM), which is provided by the National Geographic Information Institute (http://www.ngii.go.kr). The land cover and soil information (soil texture and soil depth) were obtained with the help of the major land cover map (Ministry of Environment, http://www.me.go.kr) and a detailed soil map (National Academy of Agricultural Science, http://www.naas.go.kr) of South Korea, respectively. In this study, the spatial data were constructed in ASCII raster format with a resolution of 500 m × 500 m, for the two studied catchments. Table 2. Geographical and hydrological input data for the GRM.

Data (ASCII Raster) Usage
DEM Used in drainage analysis and to obtain the raster data (catchment, flow direction, flow accumulation, stream, and slope) for the GRM Land cover map Used to obtain the land cover data for the GRM Detailed soil map Used to obtain the soil texture and soil depth data for the GRM Two storm events were selected for each study area ( Table 3). The average rainfall in each catchment was generated by the Thiessen Polygon method using the observed rainfall of the rainfall stations shown in Figure 2. These rainfall stations are operated by the Ministry of Land, Infrastructure and Transport (www.molit.go.kr). The time interval for the rainfall and flow data is 10 min.   Figure 3 shows the observed and simulated flows of the four storm events for parameter calibration in the Danseong and Seonsan catchments. Overall, the simulated flow is in good agreement with the observed flow. In the cases of the two events in Danseong, as shown in Figure 3a,b, the simulated flow is in very good agreement with the observed flow; however, the simulated value is slightly lower at the peak flow. In the cases of the two events in Seonsan, as shown in Figure 3c,d, the simulated flow peak showed an increased time delay relative to the observed flow. However, the peak flow values were similar, and the simulation performed using the GRM was relatively good, even for the rising and falling limbs of the hydrograph. Table 4 lists the simulation results for parameter calibration and validation periods. The split-sample test [42] was used to calibrate and validate the parameters. For example, the parameter calibration results for Event 1 of Danseong catchment are shown in Table 4 as the result of "Event 1" period and "Par. Event 1" parameter (0.994 NSE, 0.997 CC, and 0.021 nRMSE). The results of the parameter validation are the result of "Event 2" period and "Par. Event 1" parameter (0.831 NSE, 0.922 CC, and 0.124 nRMSE).

Results and Discussion
For the calibration period, the NSE values corresponding to the four events were quite high, approximately 0.96. In addition, the CCs were high, approximately 0.98, and the nRMSEs were low, approximately 0.05. For the validation period, all of the NSE values were satisfactory except for one event (0.688 NSE for "Event 1" period and "Par. Event 2" parameter). However, this event had a satisfactory CC value (0.841) and nRMSE value (0.158).   Table 5 lists the optimum parameter values for the case wherein the model is calibrated individually for each event. Thus, the parameter values for the four events are different. This implies that the parameters can be individually calibrated for each storm event. However, to estimate the optimum physical parameter values that can be used universally for multiple storm events in a single catchment, it is necessary to analyze the rainfall-runoff relationship by employing additional storm events and determining the general optimum parameter values using an objective judgment.    Table 5 lists the optimum parameter values for the case wherein the model is calibrated individually for each event. Thus, the parameter values for the four events are different. This implies that the parameters can be individually calibrated for each storm event. However, to estimate the optimum physical parameter values that can be used universally for multiple storm events in a single catchment, it is necessary to analyze the rainfall-runoff relationship by employing additional storm events and determining the general optimum parameter values using an objective judgment.  Table 6 lists the time and computational resources required for the parameter optimization in this study. Here, the time required (2 in Table 6) represents the computation time required for the optimization when the event period is converted to a 24 h format to compare the computation times of other events with the same reference. For example, in event 1 of Danseong, the event period is 40 h, and as the time required to optimize the parameters for this event is 8.4 h, the converted time is 5.1 h with the assumption that the event period is in the 24 h format. The time required to optimize the parameters for the Danseong catchment, which has a larger number of runoff analysis grids, was greater than that required to optimize the parameters for the Seonsan catchment. For the Seonsan catchment, the optimization times required for both the events were similar. However, for the Danseong catchment, the optimization time required for event 1 was greater than that required for event 2. This implies that it is more difficult to optimize the parameters for a relatively small flow event when using the SCE algorithm for the parameter optimization of the GRM. In summary, the time required to optimize the parameters depends on the type of event, even for the same catchment area.

Conclusions
In this paper, we propose a method of automatically estimating the parameters of a hydrological model using the EXE file of the hydrological model that can be executed in console window mode without GUI and the SCE algorithm based on the R language. The model parameters of the GRM were estimated using the SCE algorithm for illustrative purposes. Moreover, the performance of the model was found to be appropriate. Therefore, the proposed automatic parameter optimization method can be applied to estimate the optimal parameter values of hydrological models. The optimal parameter values of the other hydrological models can be estimated using the SCE method by modifying the R code to fit the selected hydrological model by referring to the method, shown in Figure 1 and Box A1 in Appendix A. In summary, the time required to optimize the parameters depends on the type of event, even for the same catchment area.

Optimizing the parameters of the GRM
## Optimize parameters by SCEoptim in Hydromad package, which is the main code of SCE algorithm ans <-SCEoptim(do_sce, initpars, lower = lower, upper = upper, control = control)