Does the Complexity of Evapotranspiration and Hydrological Models Enhance Robustness?

: In this study, ﬁve hydrological models of increasing complexity and 12 Potential Evapotranspiration (PET) estimation methods of different data requirements were applied in order to assess their effect on model performance, optimized parameters, and robustness. The models were applied over a set of 10 catchments that are located in South Korea. The Shufﬂed Complex Evolution-University of Arizona (SCE-UA) algorithm was implemented to calibrate the hydrological models for each PET input while considering similar objective functions. The hydrological models’ performance was satisfactory for each PET input in the calibration and validation periods for all of the tested catchments. The ﬁve hydrological models’ performance were found to be insensitive to the 12 PET inputs because of the SCE-UA algorithm’s efﬁciency in optimizing model parameters. However, the ﬁve hydrological models’ parameters in charge of transforming the PET to actual evapotranspiration were sensitive and signiﬁcantly affected by the PET complexity. The values of the three statistical indicators also agreed with the computed model evaluation index values. Similarly, identical behavioral similarities and Dimensionless Bias were observed in all of the tested catchments. For the ﬁve hydrological models, lack of robustness and higher Dimensionless Bias were seen for high and low ﬂow as well as for the Hamon PET input. The results indicated that the complexity of the hydrological models’ structure and the PET estimation methods did not necessarily enhance model performance and robustness. The model performance and robustness were found to be mainly dependent on extreme hydrological conditions, including high and low ﬂow, rather than complexity; the simplest hydrological model and PET estimation method could perform better if reliable hydro-meteorological datasets are applied.


Introduction
Hydrological models are used to predict low flows, floods, and groundwater recharge, and to develop safe and sustainable water resource strategies [1]. As a result, understanding the water resources systems and reliable information play a vital role to achieve environmentally sound and sustainable water management [2][3][4]. Moreover, Horne [5] addressed the importance of upgrading water information systems (data series) to enhance sustainable water management. Similar findings were made by Chung et al. [6] in a study of robust water resource vulnerability assessment in a Korean watershed. However, the model output should be reliable and robust for applications in an operational context [7]. Perrin et al. [7] defined the term robustness of a model as "the ability to perform under as varied a set of hydrological conditions as possible", and the same definition is used hereafter. The benefits and limitations of using a parsimonious model over an inadequate (u), Relative Humidity (RH), and Solar Radiation (R S ) data depending on the level of complexity. The available hydro-meteorological data obtained from 10 catchments were used to set up and calibrate five hydrological models for 12 PET inputs. The Thiessen polygon method was used for estimating the mean areal precipitation. There exist several kinds of optimization methods for calibrating hydrological models and the Shuffled Complex Evolution-University of Arizona (SCE-UA) algorithm is often adopted by researchers due to its broad applications, efficiency, and robustness [49][50][51][52]. In this study, the SCE-UA algorithm was used to optimize the hydrological model parameters for each PET input by setting a similar objective function: NSE, LogNSE, KGE, and RSR. The detailed description of the 10 catchments hydrological characteristics, the dataset used, the methodology adopted, the result, and discussions are presented in the next sections.

Study Area
In this study, 10 catchments were selected, which are based on the availability of continuous streamflow data from different climate zones in South Korea. The selected catchments represent the major hydrological characteristics of the Korean Peninsula with fewer anthropogenic effects on flow over the modelling period. The catchments are located in unregulated and mountainous regions and the catchment areas vary from 8.5 to 6648 km 2 . The main hydrological characteristics of the catchments are concentrated in summer (June to September), contributing to 70% of the precipitation that leads to peak flow. However, the snowmelt effect is not significant in these regions due to the limited amount of winter precipitation (December to February). The location of the 10 catchments, major rivers, and streamflow gauging stations are shown in Figure 1. obtained from 10 catchments were used to set up and calibrate five hydrological models for 12 PET inputs. The Thiessen polygon method was used for estimating the mean areal precipitation. There exist several kinds of optimization methods for calibrating hydrological models and the Shuffled Complex Evolution-University of Arizona (SCE-UA) algorithm is often adopted by researchers due to its broad applications, efficiency, and robustness [49][50][51][52]. In this study, the SCE-UA algorithm was used to optimize the hydrological model parameters for each PET input by setting a similar objective function: NSE, LogNSE, KGE, and RSR. The detailed description of the 10 catchments hydrological characteristics, the dataset used, the methodology adopted, the result, and discussions are presented in the next sections.

Study Area
In this study, 10 catchments were selected, which are based on the availability of continuous streamflow data from different climate zones in South Korea. The selected catchments represent the major hydrological characteristics of the Korean Peninsula with fewer anthropogenic effects on flow over the modelling period. The catchments are located in unregulated and mountainous regions and the catchment areas vary from 8.5 to 6648 km 2 . The main hydrological characteristics of the catchments are concentrated in summer (June to September), contributing to 70% of the precipitation that leads to peak flow. However, the snowmelt effect is not significant in these regions due to the limited amount of winter precipitation (December to February). The location of the 10 catchments, major rivers, and streamflow gauging stations are shown in Figure 1.

Data
The dataset that is used in this study consisted of daily hydro-meteorological measurements with no missing records over the application period (2001-2012) for nine gauged catchments. However, for the Kyeongan catchment, the available observed streamflow data (1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008) were used. The observed streamflow data at the outlet of the catchment and the areal precipitation data generated using the Thiessen polygon method were acquired from the Water Resources Management Information System operated by the Ministry of Land, Infrastructure, and Transport of the Korean Government. The climate data used for estimating PET were provided by the Korea Meteorological Administration. The hydro-meteorological data used for the Seolmacheon experimental catchment are managed by the Korea Institute of Civil Engineering and Building Technology (KICT). The climate data obtained from each station to estimate PET include daily T ( • C), RH (%), u (m/s), and R S (MJ/m 2 d) estimated from the daily sunshine hours (n) while using the Ångström-Prescott equation [53]. The major hydrological characteristics and hydro-climate conditions of the catchments are summarized in Table 1. The mean annual PET was estimated using the Penman-Monteith method. The input data for the physically based hydrological model (CAT) were extracted from land use and soil data provided by the Korea National Geography Institute, Korea Rural Development Administration, and the Ministry of Environment. More than 60% of the ten tested catchments are covered by forest and the Kyeongan catchment is the only urbanized area (15%) being covered with minimum forest (60%) and agriculture (18%) areas. The dominant soil texture of the ten catchments are sandy loam, clay loam, silty clay, and loamy. The remaining areas are covered by different kinds of soil types. The dominant land use and the soil texture of the 10 catchments are shown in Figure 2.

PET Estimation Methods
Long-term estimation of Actual Evapotranspiration (AET) requires sophisticated techniques or eddy flux instrumentation, which is expensive, and it is more challenging than quantifying the magnitude of PET and other hydro-meteorological parameters [23,53]. Hence, hydrologists interested in applying conceptual rainfall-runoff models often give more emphasis to PET estimation. There are several PET estimation methods with various development philosophies and climate data needs. Xu and Singh [54] grouped the PET estimation methods into five categories: (i) combination, (ii) water budget, (iii) mass-transfer, (iv) radiation, and (v) temperature-based methods. In this study, 12 PET estimation methods with different levels of complexity were applied as inputs to five hydrological models. Guo

PET Estimation Methods
Long-term estimation of Actual Evapotranspiration (AET) requires sophisticated techniques or eddy flux instrumentation, which is expensive, and it is more challenging than quantifying the magnitude of PET and other hydro-meteorological parameters [23,53]. Hence, hydrologists interested in applying conceptual rainfall-runoff models often give more emphasis to PET estimation. There are several PET estimation methods with various development philosophies and climate data needs. Xu and Singh [54] grouped the PET estimation methods into five categories: (i) combination, (ii) water budget, (iii) mass-transfer, (iv) radiation, and (v) temperature-based methods. In this study, 12 PET estimation methods with different levels of complexity were applied as inputs to five hydrological models. Guo   The PET methods selected in this study were only intended to meet the research objectives and the detailed evaporation process, theory, history, and applications of the PET estimation methods have been intensively discussed in several hydrological studies [53][54][55][56]. The PET estimation methods used, including the equation and climate data required, are summarized in and the values were less than 10 mm/day, except for a few outlier values. The estimated daily PET input values for the 10 catchments are presented in Figure 3.
have been intensively discussed in several hydrological studies [53][54][55][56]. The PET estimation methods used, including the equation and climate data required, are summarized in Table A1 (Appendix A). The daily PET values of the nine catchments (Seolmacheon, Boryeong, Seomjingang, Yongdam, Juam, Imha, Andong, Soyanggang, and Chungju) were estimated using 12 year climate data (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012) and for the Kyeongan Catchment, 11 year climate data (1998-2008) were used. The Hamon PET estimation method underestimated the daily PET values in all catchments. However, the estimated mean annual PET was identical for the remaining methods. The daily PET values also showed identical patterns and the values were less than 10 mm/day, except for a few outlier values. The estimated daily PET input values for the 10 catchments are presented in Figure 3.

Overview of Hydrological Model Structure
In this study, the GR4J, SIMHYD, CAT, TANK, and SAC-SMA models of increasing complexity with identical input data, including daily precipitation (P) and 12 PET inputs were applied over 10 Korean catchments to assess their effect on model performance, robustness, and optimized parameters. The detailed description of the five hydrological models and the parameters used for calibration are presented in Figures A1-A5 (Appendix B). The RS MINERVE tool that was developed at the Research Center On Alpine Environment (CREALP) was used to set up the SAC-SMA model; currently, the RS MINERVE platform is providing flood predictions for the Rhone River Basin located in Switzerland [57]. The Catchment Hydrological Cycle Assessment Tool (CAT) Version 3 software, as developed at KICT, was used to set up the GR4J, SIMHYD, CAT, and TANK models [51][52][53].

GR4J
The GR4J model is a conceptual rainfall-runoff model with a parsimonious structure with only four parameters [15]. The rainfall-runoff process starts by separating the rainfall depth and PET to net rainfall or net evapotranspiration capacity depending on the interception storage capacity level. After the separation, the production store capacity, the routing reservoir, lag time, and the groundwater exchange amounts are computed based on the governing equations. Evapotranspiration in the GR4J model occurs only from the production store. The percolated water from the production store is transformed into two unit hydrographs. The routing store uses the input discharge from one of the unit hydrographs, after that, the routing reservoir is emptied by a routing

Overview of Hydrological Model Structure
In this study, the GR4J, SIMHYD, CAT, TANK, and SAC-SMA models of increasing complexity with identical input data, including daily precipitation (P) and 12 PET inputs were applied over 10 Korean catchments to assess their effect on model performance, robustness, and optimized parameters. The detailed description of the five hydrological models and the parameters used for calibration are presented in Figures A1-A5 (Appendix B). The RS MINERVE tool that was developed at the Research Center On Alpine Environment (CREALP) was used to set up the SAC-SMA model; currently, the RS MINERVE platform is providing flood predictions for the Rhone River Basin located in Switzerland [57]. The Catchment Hydrological Cycle Assessment Tool (CAT) Version 3 software, as developed at KICT, was used to set up the GR4J, SIMHYD, CAT, and TANK models [51][52][53].

GR4J
The GR4J model is a conceptual rainfall-runoff model with a parsimonious structure with only four parameters [15]. The rainfall-runoff process starts by separating the rainfall depth and PET to net rainfall or net evapotranspiration capacity depending on the interception storage capacity level. After the separation, the production store capacity, the routing reservoir, lag time, and the groundwater exchange amounts are computed based on the governing equations. Evapotranspiration in the GR4J model occurs only from the production store. The percolated water from the production store is transformed into two unit hydrographs. The routing store uses the input discharge from one of the unit hydrographs, after that, the routing reservoir is emptied by a routing discharge. The groundwater exchange only occurs from the routing reservoir. The model has been tested intensively in several basins and climatic conditions, moreover, there exists several hydrological studies [7,[58][59][60][61][62]. The feasible ranges of the four parameters used to calibrate the model for each PET input in the tested catchments were adopted from previous works [15,58,63]. The detailed description of the GR4J model and the parameters that were used for calibration is presented in Figure A1 (Appendix B).

SIMHYD
The SIMHYD model is a daily conceptual rainfall-runoff model adopted from a similar model developed by Porter and McMahon [64]. The SIMHYD model used in this study had seven parameters to simulate the rainfall-runoff response for each PET input [65,66]. The rainfall-runoff conceptualization considers the runoff from impervious areas and infiltration excess modelled as a nonlinear function of soil moisture. The remaining two-process, interflow and base flow from the groundwater store, are modelled when considering linear relationships. In the SIMHYD model, daily evaporation starts from the interception store depending on the availability of water. The model was tested in Australia, the United States, and France for different applications [66][67][68][69][70]. Furthermore, the Australia National Land and Water Resources Audit regularly uses the model to estimate runoff [71]. Chiew et al. [65] provided a detailed technical description of the rainfall-runoff process. The lower and upper limits of all the parameters used to calibrate the model for 12 different PET inputs were selected from previous work [65,67]. The detailed description of the SIMHYD model and the parameters used for calibration is shown in Figure A2 (Appendix B).

CAT
The CAT model is a physically based semi-distributed hydrological model developed to assess the water cycle and the impacts of impervious cover change on storm runoff [72][73][74]. The model is capable of simulating the hydrological components, including AET, soil moisture content, base flow, and runoff from a daily to a sub-daily time step. The main physical parameters that are required for initial simulation are slope, soil types, land use, aquifer, and river information. The conceptualization and development of the basic model were based on the unconfined aquifer and single soil layer assumptions. The model first separates the incoming precipitation that falls on impervious, pervious, and paddy areas. Subsequently, the three areas transform the precipitation to surface flow, infiltration, and evapotranspiration based on the governing equations. The infiltration process can be simulated using Rainfall Excess, Horton, and Green-Ampt models. The groundwater movement computation is based on the Darcy law. Furthermore, the Muskingum, Muskingum-Cunge, and Kinematic wave methods are also included for channel routing. The evapotranspiration in the CAT model depends on the land use and it is composed of plant transpiration as well as evaporation from the surface and soil layer. In this study, the infiltration process is modeled using the Green-Ampt method [75] and lumped hydrological models are considered to represent the rainfall-runoff process. The model has been applied to assess the impacts of urbanization on storm runoff in the north of Swindon town, UK [73]. Recently, Jang et al. [76] applied the CAT model to assess the impact of climate change impact on the hydrological cycle and water management practices while considering A1B and B1 scenarios. The model has also been applied to predict reservoir water levels, to analyze the water cycle, and for flood event simulations [77][78][79]. The main hydrological processes of the CAT model and the parameters that are used for calibration are presented in Figure A3 (Appendix B).

TANK
The TANK model is a conceptual rainfall-runoff model that was originally developed while considering the hydrological conditions of Japan. The TANK model has a series of four tanks arranged in order with 12 parameters [80]. The soil moisture storage tanks transform the catchment rainfall to streamflow by collecting water from each storage tank [81]. The rainfall enters the first tank, which drains free water at two outlets on the side (surface discharge) and the infiltrated water from soil moisture drains downward to the bottom outlet. The infiltrated water from the first tank enters the second tank; the second tank drains the free water to the side outlet (intermediate discharge) and the percolated water drains downward to the bottom outlet to enter the third tank. Similarly, the third tank drains the sub-base discharge to the side outlet and the deep percolated water drains the remaining water to the fourth tank. The lowermost tank drains the water at one side of the tank outlet (base discharge) and it often contributes a small portion of the total outflow. However, the water that drains from the three side tanks contributes a considerable amount of total outflow. Evapotranspiration is able to occur from the four tanks. Sugawara [82] discussed the rainfall-runoff and calibration process in detail. The model has been widely applied in many Asian countries and other basins for various applications [83][84][85][86]. The twelve feasible parameter ranges used to calibrate the TANK model were adopted from earlier works [82,87]. The TANK model description and the parameters that were used for calibration are presented in Figure A4 (Appendix B).

SAC-SMA
The SAC-SMA model is a soil moisture accounting model with 16 parameters [88] and the most complex model when compared to the other four models that are applied in this study. The SAC-SMA model was originally developed at the California-Nevada River Forecast Center and the U.S. National Weather Service River Forecast System uses it as the main model for forecasting flood events [89]. The model transforms the precipitation that falls on impervious and pervious areas to direct runoff. The remaining water infiltrated from the previous areas to the first soil layer will be changed to tension and free water. The free water drains water to the river through surface runoff or interflow and the remaining water will percolate to the second soil layer. The second soil layer contains three zones to store water: tension water, primary, and supplementary free water. The primary as well as supplementary free water transforms the percolated water to the total river flow via the base flow and subsurface discharge. The surface water and first and second zone tension water stores are able to contribute to total evapotranspiration. Burnash [88] presented a more detailed description of the SAC-SMA model. The SAC-SMA model has also been tested in a wide range of basins and climatic conditions [14,[90][91][92][93]. In this study, 13 SAC-SMA model parameters were selected for calibration [89,94,95] and the default values were used for the remaining three parameters across all of the catchments. The feasible parameter ranges for calibration were adopted from a previous study [96]. The detailed description of the SAC-SMA model and the feasible parameters ranges used for calibration are presented in Figure A5 (Appendix B).

Model Calibration and Validation
Hydrological model calibration can be done manually or automatically while using an optimization algorithm to enhance streamflow simulation and model robustness. There exist different kinds of global optimization algorithms, including the Monte Carlo, Genetic Algorithm, Generalized Likelihood Uncertainty Estimation, Artificial Neural Network, and SCE-UA, which have been used to calibrate hydrological models [97,98]. Similarly, the Latin Hypercube and Rosenbrock algorithms have also been used to calibrate hydrological models [99,100]. Each optimization algorithm has their own benefits and limitations. The SCE-UA global optimization algorithm considers the deterministic and probabilistic approaches. Moreover, the complex shuffling, the systematic evolution of complex points, and competitive evolution makes the algorithm efficient, effective, and robust [49].
In this study, the SCE-UA optimization algorithm of Duan et al. [49] was applied while considering its broad applications, flexibility, robustness, and efficiency [50][51][52]101]. The SCE-UA algorithm was applied to calibrate five hydrological models for 12 different PET inputs when considering four objective functions: NSE, LogNSE, KGE, and RSR. The NSE, LogNSE, KGE, and RSR objective functions have been recommended for evaluating hydrological model performance for streamflow simulation in recent studies [40,41]. The five hydrological models were calibrated while using seven years of observed streamflow data (2002-2008) for nine catchments and the remaining four year data were used for validation (2009)(2010)(2011)(2012). However, the Kyeongan catchment calibration was carried out using the available observed streamflow data (2000)(2001)(2002)(2003)(2004)(2005) and the remaining data (2006-2008) were used for validation. A one-year warm-up period was considered in all of the tested catchments to minimize the errors at the beginning of the calibration period for all models and their corresponding PET input values. In addition, the five hydrological model performance percentage variations were compared with other PET inputs considering the Penman-Monteith PET input NSE index values as a benchmark. The four objective functions, NSE, LogNSE, KGE, and RSR, were calculated, as follows: where Q obs , Q sim , and Q mean are the observed, simulated, and mean observed streamflow, respectively. R 2 , SD, and M are the Pearson Correlation Coefficient, standard deviation, and mean, respectively.

Model Complexity Comparison Criteria
The five hydrological models were further evaluated by comparing three statistical indicators: AIC, BIC, and Ra 2 for each PET input. The AIC, BIC, and Ra 2 are the statistical indicators commonly used to compare a model structure of different complexity in various subject areas. The three statistical indicators' model comparison criteria are able to penalize the number of estimated parameters depending on the model complexity. The model comparison criteria have been applied for frequency analysis of hydrological extremes [46,102] and for predictions of groundwater models [103]. Other studies have also applied the criteria to compare conceptual rainfall-runoff models with artificial neural network models [45,104,105]. Akaike [43] proposed the AIC for the first time based on Kullback-Leibler's information as the discrepancies between the observed and simulated models. Similarly, Schwarz [44] proposed a more parsimonious method, called the BIC, which considers the Bayesian approaches. The Adjusted R-square Ra 2 has also been used as an alternative method to compare model structures of different complexity [45]. The AIC, BIC, and Ra 2 statistical indicators were calculated for the five hydrological models of increasing complexity for each PET input. The model with a minimum AIC, BIC, and maximum Ra 2 was considered to be the better performing and with greater parsimony. In this study, similar model comparison criteria were adopted to evaluate the five hydrological models for each PET input. The three statistical indicators, AIC, BIC, and Ra 2 were calculated, as follows: where n, RMSE, p, and R 2 are the number of data points, root mean squared error of the model output, the number of model parameters, and Pearson Correlation Coefficient, respectively.

Model Robustness
Model robustness plays an important role in practical applications to provide reliable results. [106]. Although robustness is an essential quality of any model, hydrological models often tend to perform differently between the calibration and validation periods [7]. Moreover, the parameters of the conceptual rainfall-runoff model often face difficulties in appropriately reproducing the water balances [48]. The importance of minimizing the water balance errors in the model calibration process to enhance hydrological model robustness has been discussed in recent studies [47,48]. The KGE model evaluation criteria consider the various aspects of model behavior, including the water balance error, accuracy, and robustness. Furthermore, Coron et al. [48] investigated the robustness of three rainfall-runoff models regarding water balance simulation and suggested comparing the Dimensionless Bias (the ratio of 10 year mean simulated and observed flows) to evaluate the robustness of three hydrological models of increasing complexity over 20 French catchments. In this study, a similar approach that was suggested by Coron et al. [48] was adopted to evaluate the five hydrological models' robustness and behavioral similarities for each PET input. The Dimensionless Bias for a robust model is expected to be one. The Dimensionless Bias is calculated, as follows: where → Q sim and → Q obs are the 10 year mean simulated and observed flows, respectively.

Calibration and Validation
The model evaluation guidelines of Moriasi et al. [40] recommended a NSE > 0.50 and RSR < 0.70 as satisfactory performance for streamflow simulation. The performance of the GR4J, SIMHYD, CAT, TANK, and SAC-SMA models for the 12 PET inputs were found to be satisfactory for all of the tested catchments in the calibration periods, as shown in Figure 4. Similarly, the performances of the five hydrological models for each PET input fell within satisfactory criteria for all the tested catchments in the validation periods (Table A2 in   The five hydrological model performance percentage variations, calculated while considering the Penman-Monteith PET input NSE index values as benchmark, showed a maximum drop of 17% in the Soyanggang catchment when the SIMHYD model was applied using the Penman PET input. However, a maximum rise of 11% was observed in the Boryang catchment when the SAC-SMA model was applied using the Makkink PET input in the calibration period ( Figure A6 in Appendix B). Similarly, the five hydrological model performance percentage variations computed when considering identical assumption showed a maximum drop of 19% in the Soyanggang catchment when the SIMHYD model was applied while using the Penman PET input and a maximum rise of 6.5% was observed in the Chungju catchment when the SAC-SMA model was applied using the Granger Gray PET input in the validation period, as shown in Figure 6. The five hydrological model performance percentage variations calculated when considering the Penman-Monteith PET input NSE index values as a benchmark for four selected catchments in the validation period are presented in Figure 6. The five hydrological model performance percentage variations, calculated while considering the Penman-Monteith PET input NSE index values as benchmark, showed a maximum drop of 17% in the Soyanggang catchment when the SIMHYD model was applied using the Penman PET input. However, a maximum rise of 11% was observed in the Boryang catchment when the SAC-SMA model was applied using the Makkink PET input in the calibration period ( Figure A6 in Appendix B). Similarly, the five hydrological model performance percentage variations computed when considering identical assumption showed a maximum drop of 19% in the Soyanggang catchment when the SIMHYD model was applied while using the Penman PET input and a maximum rise of 6.5% was observed in the Chungju catchment when the SAC-SMA model was applied using the Granger Gray PET input in the validation period, as shown in Figure 6. The five hydrological model performance percentage variations calculated when considering the Penman-Monteith PET input NSE index values as a benchmark for four selected catchments in the validation period are presented in Figure 6. The GR4J, CAT, and SAC-SMA hydrological models' NSE, RSR, KGE, and R 2 assessment criterion showed relatively higher values when compared to the SIMHYD and TANK models in most of the tested catchments in the calibration as well as the validation periods. However, it would be naive to identify one best hydrological model because of the minor differences in the index values (NSE < 0.2, RSR < 0.15, KGE < 0.2, and R 2 < 0. 15) in most of the tested catchments, except for the Imha and Andong catchments. The five hydrological models' performance was observed to be higher and more insensitive for the Seolmacheon experimental catchment for the 12 PET inputs in the calibration and validation periods (Figures 4 and 5). The scatterplot of the five hydrological models' performance simulated over the application period (2002-2012) for four selected catchments when considering the Penman-Monteith PET input also showed a small variation (R 2 < 0.15), as presented in Figure 7. The GR4J, CAT, and SAC-SMA hydrological models' NSE, RSR, KGE, and R 2 assessment criterion showed relatively higher values when compared to the SIMHYD and TANK models in most of the tested catchments in the calibration as well as the validation periods. However, it would be naive to identify one best hydrological model because of the minor differences in the index values (NSE < 0.2, RSR < 0.15, KGE < 0.2, and R 2 < 0. 15) in most of the tested catchments, except for the Imha and Andong catchments. The five hydrological models' performance was observed to be higher and more insensitive for the Seolmacheon experimental catchment for the 12 PET inputs in the calibration and validation periods (Figures 4 and 5). The scatterplot of the five hydrological models' performance simulated over the application period (2002-2012) for four selected catchments when considering the Penman-Monteith PET input also showed a small variation (R 2 < 0.15), as presented in Figure 7.

Effect of Model Structure Complexity on Model Performance
The GR4J, CAT, and SAC-SMA models showed a consistent and better model performance in all of the tested catchments both in the calibration and validation periods. The GR4J, CAT, and SAC-SMA models' performance response to the 12 PET inputs showed a similar pattern and magnitude of NSE, LogNSE, RSR, and KGE, as shown in Figures 4 and 5. The SIMHYD model performed less when compared to the GR4J, CAT, TANK, and SAC-SMA models in most of the tested catchments. However, for the SIMHYD model, consistent and comparable performances were observed for the Kyeongan, Seomjingang, and Andong catchments in the calibration periods. Similar responses were also observed for the Seolmacheon, Seomjingang, and Chungju catchments in the validation periods. The TANK model performed relatively better than the SIMHYD model, except for the Kyeongan, Seomjingang, and Andong catchments in the calibration periods, and the Seolmacheon, Seomjingang, and Chungju catchments in the validation periods. The GR4J, SIMHYD, CAT, TANK, and SAC-SMA models responses to 12 different PET inputs showed a minor difference (NSE < 0.2) in terms of the NSE index value in all of the tested catchments. Although the SIMHYD model response to 12 PET inputs showed a minor difference (NSE < 0.2), unstable patterns of NSE were observed, except in the Seolmacheon experimental catchment. The five hydrological models' performance for four selected catchments simulated over the application periods (2002-2012) for the 12 PET inputs are shown in Figure 8.

Effect of Model Structure Complexity on Model Performance
The GR4J, CAT, and SAC-SMA models showed a consistent and better model performance in all of the tested catchments both in the calibration and validation periods. The GR4J, CAT, and SAC-SMA models' performance response to the 12 PET inputs showed a similar pattern and magnitude of NSE, LogNSE, RSR, and KGE, as shown in Figures 4 and 5. The SIMHYD model performed less when compared to the GR4J, CAT, TANK, and SAC-SMA models in most of the tested catchments. However, for the SIMHYD model, consistent and comparable performances were observed for the Kyeongan, Seomjingang, and Andong catchments in the calibration periods. Similar responses were also observed for the Seolmacheon, Seomjingang, and Chungju catchments in the validation periods. The TANK model performed relatively better than the SIMHYD model, except for the Kyeongan, Seomjingang, and Andong catchments in the calibration periods, and the Seolmacheon, Seomjingang, and Chungju catchments in the validation periods. The GR4J, SIMHYD, CAT, TANK, and SAC-SMA models responses to 12 different PET inputs showed a minor difference (NSE < 0.2) in terms of the NSE index value in all of the tested catchments. Although the SIMHYD model response to 12 PET inputs showed a minor difference (NSE < 0.2), unstable patterns of NSE were observed, except in the Seolmacheon experimental catchment. The five hydrological models' performance for four selected catchments simulated over the application periods (2002-2012) for the 12 PET inputs are shown in Figure 8.

Effect of PET Complexity on Model Parameters
The GR4J model, the production store (X1), water exchange coefficient (X2), and routing store (X3) parameters were used to adjust the overestimated and underestimated PET values for a better streamflow simulation. The unit hydrograph time base parameter (X4) was only responsible for fixing the lag time and it was insensitive to the 12 different PET input values. However, the three model parameters, X1, X2, and X3 were found to be sensitive to 12 different PET input values, as shown in Figure 9a. The SIMHYD model, the interception store capacity (INSE), maximum infiltration loss (COEFF), and soil moisture store capacity (SMSC) parameters were observed to be sensitive to 12 different PET input values, but the remaining parameters were relatively stable as shown in Figure  9b. The CAT model, wetting front soil suction head (PSI) parameter was sensitive to the 12 different PET inputs values, as shown in Figure 9c, and the remaining parameters were fairly stable in all the tested catchments. The response of the TANK model parameters to the 12 different PET inputs was clearly captured by the vertically arranged water level of the series of tanks, including the height of the runoff outlets of the first (HA1, HA2), second (HB), and third (HC) tanks, as shown in Figure 9d. Similarly, the SAC-SMA model, the upper layer tension water capacity (UZTWM), upper layer free water capacity (UZFWM), lower layer tension water capacity (LZTWM), lower layer supplemental free water capacity (LZFSM), and lower layer primary free water capacity (LZFPM), and the ratio of maximum and minimum percolation rates (ZPERC) parameters were significantly affected by the 12 PET input values, as shown in Figure 9e. The remaining SAC-SMA model parameters were quite stable in all of the tested catchments. The GR4J, SIMHYD, CAT, TANK, and SAC-SMA models' parameters showed a similar response to the 12 PET input in all the tested catchments and the

Effect of PET Complexity on Model Parameters
The GR4J model, the production store (X1), water exchange coefficient (X2), and routing store (X3) parameters were used to adjust the overestimated and underestimated PET values for a better streamflow simulation. The unit hydrograph time base parameter (X4) was only responsible for fixing the lag time and it was insensitive to the 12 different PET input values. However, the three model parameters, X1, X2, and X3 were found to be sensitive to 12 different PET input values, as shown in Figure 9a. The SIMHYD model, the interception store capacity (INSE), maximum infiltration loss (COEFF), and soil moisture store capacity (SMSC) parameters were observed to be sensitive to 12 different PET input values, but the remaining parameters were relatively stable as shown in Figure 9b. The CAT model, wetting front soil suction head (PSI) parameter was sensitive to the 12 different PET inputs values, as shown in Figure 9c, and the remaining parameters were fairly stable in all the tested catchments. The response of the TANK model parameters to the 12 different PET inputs was clearly captured by the vertically arranged water level of the series of tanks, including the height of the runoff outlets of the first (HA1, HA2), second (HB), and third (HC) tanks, as shown in Figure 9d. Similarly, the SAC-SMA model, the upper layer tension water capacity (UZTWM), upper layer free water capacity (UZFWM), lower layer tension water capacity (LZTWM), lower layer supplemental free water capacity (LZFSM), and lower layer primary free water capacity (LZFPM), and the ratio of maximum and minimum percolation rates (ZPERC) parameters were significantly affected by the 12 PET input values, as shown in Figure 9e. The remaining SAC-SMA model parameters were quite stable in all of the tested catchments. The GR4J, SIMHYD, CAT, TANK, and SAC-SMA models' parameters showed a similar response to the 12 PET input in all the tested catchments and the calibrated parameters were also found to be within the recommended ranges. The five hydrological models' parameters response to the 12 PET input for the Boryeong catchment is shown in Figure 9.
Sustainability 2018, 10, x FOR PEER REVIEW 17 of 33 calibrated parameters were also found to be within the recommended ranges. The five hydrological models' parameters response to the 12 PET input for the Boryeong catchment is shown in Figure 9.

Effect of PET and Model Structure Complexity on Model Robustness
The three statistical indicators, AIC, BIC, and Ra 2 values, calculated to compare a model structure also showed similar results with the model performance evaluation index values (NSE, LogNSE, RSR, and KGE). The GR4J, CAT, and SAC-SMA models' statistical indicator showed a minimum AIC, BIC, and maximum Ra 2 value in all of the tested catchments, both in the calibration and validation periods. The minimum AIC, BIC, and maximum Ra 2 value showed that the GR4J, CAT, and SAC-SMA had better model performance for simulating streamflow ( Figure 10). However, the SIMHYD and TANK models showed a higher AIC, BIC, and lower Ra 2 for most of the tested catchments when compared with the GR4J, CAT, and SAC-SMA models' statistical indicator values. Moreover, the effect of the 12 PET complexity was more captured by the three statistical indicators, and higher AIC, BIC, and lower Ra 2 values were also observed for the Hamon PET input values, as shown in Figure 10. Similar responses were also observed in all of the tested catchments and Figure 10 shows the AIC, BIC, and Ra 2 values of four selected catchments for the 12 PET input values in the validation periods.

Effect of PET and Model Structure Complexity on Model Robustness
The three statistical indicators, AIC, BIC, and Ra 2 values, calculated to compare a model structure also showed similar results with the model performance evaluation index values (NSE, LogNSE, RSR, and KGE). The GR4J, CAT, and SAC-SMA models' statistical indicator showed a minimum AIC, BIC, and maximum Ra 2 value in all of the tested catchments, both in the calibration and validation periods. The minimum AIC, BIC, and maximum Ra 2 value showed that the GR4J, CAT, and SAC-SMA had better model performance for simulating streamflow ( Figure 10). However, the SIMHYD and TANK models showed a higher AIC, BIC, and lower Ra 2 for most of the tested catchments when compared with the GR4J, CAT, and SAC-SMA models' statistical indicator values. Moreover, the effect of the 12 PET complexity was more captured by the three statistical indicators, and higher AIC, BIC, and lower Ra 2 values were also observed for the Hamon PET input values, as shown in Figure 10. Similar responses were also observed in all of the tested catchments and Figure 10 shows the AIC, BIC, and Ra 2 values of four selected catchments for the 12 PET input values in the validation periods. The PET and model structure complexity effect on model robustness were significantly affected by the low flow and the simulated streamflow was found to be greater than the observed, except the SIMHYD and TANK models. The Dimensionless Bias result also showed a higher water balance error, particularly for extreme hydrological conditions (high and low flow). Similarly, the five hydrological models overestimated the streamflow for the Hamon PET input value and a higher water balance error was observed as shown in Figure 11. The GR4J, CAT, and SAC-SMA models' behavioral similarities with a relatively lower water balance error were observed ( Figure 11) and similar responses were also seen in all of the tested catchments. The SIMHYD and TANK models showed a relatively higher water balance error and a lack of robustness was noticed for normal hydrological conditions, as shown in Figure 10. For a perfectly robust model, flat curves (Dimensionless Bias ≈ 1) and a minor water balance error were expected, but the results that were obtained showed a lack of robustness particularly for average hydrological conditions besides Hamon lack of robustness is observed for low flow conditions. The five hydrological models' Dimensionless Bias and behavioral similarities to the 12 PET inputs in the Boryeong catchment are shown in Figure 11 and similar responses were also observed in all of the tested catchments. The PET and model structure complexity effect on model robustness were significantly affected by the low flow and the simulated streamflow was found to be greater than the observed, except the SIMHYD and TANK models. The Dimensionless Bias result also showed a higher water balance error, particularly for extreme hydrological conditions (high and low flow). Similarly, the five hydrological models overestimated the streamflow for the Hamon PET input value and a higher water balance error was observed as shown in Figure 11. The GR4J, CAT, and SAC-SMA models' behavioral similarities with a relatively lower water balance error were observed ( Figure 11) and similar responses were also seen in all of the tested catchments. The SIMHYD and TANK models showed a relatively higher water balance error and a lack of robustness was noticed for normal hydrological conditions, as shown in Figure 10. For a perfectly robust model, flat curves (Dimensionless Bias ≈ 1) and a minor water balance error were expected, but the results that were obtained showed a lack of robustness particularly for average hydrological conditions besides Hamon lack of robustness is observed for low flow conditions. The five hydrological models' Dimensionless Bias and behavioral similarities to the 12 PET inputs in the Boryeong catchment are shown in Figure 11 and similar responses were also observed in all of the tested catchments.

Discussion
Most studies that were conducted in the subject area showed interesting results. The advantages of applying simple models rather than a complex one were first suggested by testing a topographybased model in a small forested catchment in the United States [107]. A similar finding was also obtained when three event-based rainfall-runoff models were applied [108]. Furthermore, Beven [109] suggested the use of three to five parameters to characterize the most dominant physical processes. Recent studies have also shown that complexity does not necessarily enhance the robustness of hydrological models; rather, data quality and hydrological characteristics have more influence on the robustness [7,110,111]. The benefit of reducing the water balance error to enhance conceptual rainfall-runoff models robustness has also been addressed briefly in recent studies [47,48,106].
The GR4J, SIMHYD, CAT, TANK, and SAC-SMA models have increasing levels of complexity. Of these models, the GR4J have the least parameters (four parameters), SIMHYD (seven parameters), CAT (eight parameters), and TANK (12 parameters) are moderate, and the SAC-SMA have several parameters (16 parameters). Similarly, for the 12 PET estimation methods, the data requirements also varied from simple (T) to complex (T, RH, RS, n, and u). The five hydrological models' performance was relatively low in the Imha and Andong catchments when compared to the other tested catchments in the validation periods. The Hamon PET estimation method was the only input that clearly showed a decreasing model performance in all of the tested catchments, except for Seolmacheon. This decrease in model performance was likely due to the underestimated values of the Hamon PET estimation method in all of the tested catchments. The better model performance and stable parameters observed in the smallest Seolmacheon experimental catchment were probably related to the size or the availability of high-quality hydro-meteorological data when compared to other study sites. The complexity of the PET estimation method and hydrological model structure did not affect the model performance significantly because of the efficiency of the SCE-UA algorithm to optimize the model parameters for each PET input in the calibration process.
The complexity of the PET inputs had a significant influence on the five hydrological models' parameters. Particularly, model parameters that are responsible for transforming the PET to AET were found to be sensitive for all of the tested catchments. The GR4J model, X1 parameter is in charge of changing the total PET input value by optimizing the level of production reservoir based on the

Discussion
Most studies that were conducted in the subject area showed interesting results. The advantages of applying simple models rather than a complex one were first suggested by testing a topography-based model in a small forested catchment in the United States [107]. A similar finding was also obtained when three event-based rainfall-runoff models were applied [108]. Furthermore, Beven [109] suggested the use of three to five parameters to characterize the most dominant physical processes. Recent studies have also shown that complexity does not necessarily enhance the robustness of hydrological models; rather, data quality and hydrological characteristics have more influence on the robustness [7,110,111]. The benefit of reducing the water balance error to enhance conceptual rainfall-runoff models robustness has also been addressed briefly in recent studies [47,48,106].
The GR4J, SIMHYD, CAT, TANK, and SAC-SMA models have increasing levels of complexity. Of these models, the GR4J have the least parameters (four parameters), SIMHYD (seven parameters), CAT (eight parameters), and TANK (12 parameters) are moderate, and the SAC-SMA have several parameters (16 parameters). Similarly, for the 12 PET estimation methods, the data requirements also varied from simple (T) to complex (T, RH, R S , n, and u). The five hydrological models' performance was relatively low in the Imha and Andong catchments when compared to the other tested catchments in the validation periods. The Hamon PET estimation method was the only input that clearly showed a decreasing model performance in all of the tested catchments, except for Seolmacheon. This decrease in model performance was likely due to the underestimated values of the Hamon PET estimation method in all of the tested catchments. The better model performance and stable parameters observed in the smallest Seolmacheon experimental catchment were probably related to the size or the availability of high-quality hydro-meteorological data when compared to other study sites. The complexity of the PET estimation method and hydrological model structure did not affect the model performance significantly because of the efficiency of the SCE-UA algorithm to optimize the model parameters for each PET input in the calibration process.
The complexity of the PET inputs had a significant influence on the five hydrological models' parameters. Particularly, model parameters that are responsible for transforming the PET to AET were found to be sensitive for all of the tested catchments. The GR4J model, X1 parameter is in charge of changing the total PET input value by optimizing the level of production reservoir based on the net rainfall intensity and net evapotranspiration. The X3 parameter variation was mainly caused by the complexity of the PET estimation methods that led to the higher or lower values that affect the routing reservoir. However, the X2 parameter's identical response to different PET inputs could be an indication of minor groundwater exchange in the tested catchments. The SIMHYD model's COEFF and SMSC parameters were more sensitive than the INSE parameter. The small variations of the INSE parameters showed the dominance of evapotranspiration from the soil moisture store rather than from the interception store. However, the SMSC parameter variation was due to the SMSC linear relationship with evapotranspiration. The COEFF parameter was affected by the nonlinear relationship with the SMSC parameter in the infiltration estimation process. The sensitivity of these three parameters might help us understand the effect of soil moisture and land use on evaporation. The CAT model PSI parameter sensitivity to the PET input was due to the dependence of the model evapotranspiration on the land use of the 10 tested catchments, which is forest dominated. However, the river and aquifer parameters were not sensitive in all of the tested catchments. In the TANK model, the sensitivity of four of the parameters (HA1, HA2, HB, and HC) to PET inputs was because the model assumption for transforming the water level of each tank to AET was initially based on the availability of water on the sequentially arranged tanks. The other parameters remained insensitive as the output from each tank was transformed to surface, intermediate, and subbase runoff. In the SAC-SMA model, tension water and free water storage parameter sensitivity to PET input was because the AET estimations from the upper and lower water storage were proportional to the magnitude of PET, whereas the sensitivity of the ZPERC parameter was unexpected, as it is responsible for controlling percolation. The unexpected responses of the ZPERC parameters could be related to the problem of finding a unique set of values and over-parameterization, which occurs in most rainfall-runoff models. Similar findings were observed in a previous research work conducted while using the GR4J and TOPMODEL models when considering different PET input scenarios [17].
The results of the three statistical indicators (AIC, BIC, and Ra 2 ) also agreed with the computed model evaluation index values (NSE, LogNSE, RSR, and KGE). The results of the three statistical indicator showed higher AIC, BIC, and lower Ra 2 values for the Hamon PET estimation method in all of the tested catchments. The lower performance for the Hamon PET input was most likely due to the underestimated value of the PET. The 12 PET estimation methods and five hydrological models' structure complexity effect were better captured by the three statistical indicators. Although the model complexity comparison criteria (AIC, BIC, and Ra 2 ) results revealed that the GR4J, CAT, and SAC-SMA models had a better capability to simulate streamflow, it would be naive to identify the best hydrological model. The GR4J, CAT, and SAC-SMA models' behavioral similarities and lower Dimensionless Bias were observed in all of the tested catchments. The five hydrological models lacked robustness, particularly for extreme hydrological conditions (high and low flow) and for the Hamon PET input value, which was observed in all of the tested catchments, except the Seolmacheon. Perrin et al. [7] also discussed the difficulty of identifying the best hydrological model. Furthermore, a previous study conducted on the frequency analysis of hydrological extremes also addressed the limitations and difficulties of completely concluding the superiority of one model for practical applications [46].

Conclusions
In this study, the responses of five hydrological models to the 12 PET estimation methods and their effect on model performance, optimized parameters, and robustness were assessed. The GR4J, SIMHYD, CAT, TANK, and SAC-SMA models with increasing levels of complexity were applied over 10 catchments that are located in South Korea. The 10 selected catchments represented the main hydrological characteristics of the Korean Peninsula and have continuous hydro-meteorological data with negligible artificial effects on streamflow. The 10 catchments are mainly covered with forest and sandy loam soil. The 12 PET estimation methods that were used in this study varied from simple to Sustainability 2018, 10, 2837 21 of 34 more data demanding. The Hamon PET estimation method was the only input that underestimated the estimated daily PET values in all of the tested catchments except for the Seolmacheon. The SCE-UA algorithm was used to calibrate five hydrological models for the 12 PET inputs when considering four objective functions: NSE, KGE, and RSR. The GR4J, CAT, and SAC-SMA models showed a consistent model performance regardless of the differences in model structure complexity in both the calibration and validation periods. The five hydrological models' performance were not sensitive to the 12 PET inputs due to the SCE-UA algorithm's robustness and efficiency. However, the five hydrological models' parameters in charge of transforming the PET to AET were observed to be sensitive for the PET input values.
The Hamon PET estimation method was the only method that showed decreasing model performance in all of the tested catchments, except for the Seolmacheon. The results of the three statistical indicators (AIC, BIC, and Ra 2 ) captured the PET and hydrological model structure complexity. The model complexity comparison criteria results showed minimum AIC, BIC, and maximum Ra 2 values for the GR4J, CAT, and SAC-SMA, which is in agreement with the computed model evaluation index values (NSE, RSR and KGE). In addition, the three hydrological models showed lower Dimensionless Bias, and identical behavioral similarities were observed in all of the tested catchments. However, the five hydrological models' lack of robustness were observed for high and low flow, as well as the Hamon PET input values in all of the tested catchments.
The results of this study revealed the dependence of the model performance and robustness on hydrological conditions (high and low flow) and catchment characteristics, rather than on complexity. The findings of this study will provide a certain level of confidence for hydrologists who work with simple hydrological models and PET estimation methods, particularly in areas with limited hydro-meteorological data, provided that standard datasets are employed.   50 is the aerodynamic resistance for crop height, h(s/m); r s is the surface resistance (s/m); (r s ) c is the surface resistance of a well-watered crop equivalent to FAO crop coefficient (s/m); ρ is the water density (kg/m 3 ); ρ a is the mean air density at constant pressure (kg/m 3 ), SVD is the saturated vapor density at mean air temperature (g/m 3 ); G is the soil heat flux (MJ/m 2 d); R n is the net radiation at evaporating surface at air temperature ((MJ/m 2 d); R s is the incoming solar radiation ((MJ/m 2 d); R a is the extraterrestrial radiation ((MJ/m 2 d); R NPan is the net radiation at Class-A pan ((MJ/m 2 d); λ is the latent heat of vaporization (MJ/kg); c a is the specific heat of air (MJ/kg • C); Ta, T max , T min are the mean, maximum, and minimum temperature ( • C), respectively; b var is the working variable (-); G g is the dimensionless relative evaporation parameter (-); C HS is the Hargreaves-Samani working coefficient (-); a p is the constant in PenPan equation (-); A p is the gradient (-); B p is the intercept (-); RH, RH mean , RH min are the daily, mean, minimum relative humidity(%), respectively; p y is the percentage of actual daytime hours for the specific day compared to the day-light hours for the entire year (%); N is the total day length (h); n is the duration of sunshine hours in a day (h); ∆ is the slope of the saturation vapor pressure curve (kPa/ • C); γ is the psychrometric constant (kPa/ • C); VPD 2, VPD 50 are the vapor pressure deficit at 2 m and 50 m, respectively (kPa); (v a * − v a ) is the vapor pressure deficit at air temperature (kPa); f Pan (u) is the wind function for Class-A pan (ms −1 ); u 2 is the average daily wind speed at 2 m height (m/s); and u is the mean daily wind speed (m/s).  Appendix B Figure A1. The GR4J model structure and feasible parameter ranges [15]. Figure A2. The SIMHYD model structure and feasible parameter ranges [65]. Figure A1. The GR4J model structure and feasible parameter ranges [15]. Appendix B Figure A1. The GR4J model structure and feasible parameter ranges [15]. Figure A2. The SIMHYD model structure and feasible parameter ranges [65]. Figure A2. The SIMHYD model structure and feasible parameter ranges [65].    Figure A3. The CAT model structure and feasible parameter ranges [74]. Figure A4. The TANK model structure and feasible parameter ranges [80]. Figure A4. The TANK model structure and feasible parameter ranges [80].
Sustainability 2018, 10, x FOR PEER REVIEW 27 of 33 Figure A5. The SAC-SAM model structure and feasible parameter ranges [88]. Figure A5. The SAC-SAM model structure and feasible parameter ranges [88].