Rainfall-Runo ﬀ Modeling of the Nested Non-Homogeneous Sava River Sub-Catchments in Slovenia

: Rainfall-runo ﬀ modeling is nowadays applied for water resources management, water system design, real-time forecasting, ﬂood design and can be carried out by using di ﬀ erent types of hydrological models. In this study, we focused on lumped conceptual hydrological models and their performance in diverse sub-catchments of the Sava River in Slovenia, related to their size and non-homogeneity. We evaluated the di ﬀ erence between modeled and measured discharges of selected discharge gauging stations, using di ﬀ erent model performance criteria that are usually applied in hydrology, connecting the results to geospatial analysis of geological and hydrogeological characteristics, land use, runo ﬀ potential, proportion of agglomeration and various meteorological variables. Better model performance was obtained for catchments with a higher runo ﬀ potential and with less variations in meteorological variables. Regarding the number of used parameters, the results indicated that the tested Genie Rural 6-parameter Journalier (GR6J) model with 6 parameters performed better than the Genie Rural 4-parameter Journalier (GR4J) model with 4 parameters, especially in the case of larger sub-catchments. These results illustrate the comprehensive nature of lumped models. Thus, they yield good performance in case of the catchments with indistinguishable characteristics.


Introduction
Reliable and robust hydrological models of different types are required for water resource engineering applications. Furthermore, hydrological modeling is needed to evaluate the impact of extreme events such as droughts and floods, on natural resources as well as risk management and planning. By transforming meteorological input variables, using mathematical relationships and parameters within the specific model one can simulate hydrological output variables to mimic the most important processes of the hydrological cycle. Throughout the years, several hydrological models have been developed for specific needs. According to Singh et al. [1], at least 64 different hydrological models are available for rainfall-runoff simulations. However, not all models are suitable for all environments (e.g., [2,3]). Thus, model selection should be tailored according to the modeling purpose and aim and selected model should be able to capture the main catchment characteristics reflecting the hydrological behavior of the catchment (e.g., [4]). For example, for the purpose of the design hydrograph associated with a specific return period one can use the event-based models (i.e., modeling of the specific rainfall event), whereas for the evaluation of different climate scenarios one should use the continuous rainfall-runoff models (i.e., modeling of the longer rainfall-runoff time-series).
The later models are also the main topic of this paper. Additionally, complex catchments in terms of non-homogeneity of geology or land-use can sometimes require modification of existing models in case of non-satisfactory model performance. On the other hand, hydrological models must be robust and not over-parameterized [4,5]. Previous studies have shown that one of the main reasons of model output and parameter uncertainty is an over-parameterization [2]. Jajarmizadeh et al. [3] demonstrated that in some cases the performance of simple models can be almost as high as of models with more parameters. Similarly, some other researchers indicated that increased model complexity does not necessarily yield significantly better model performance. However, Perrin et al. [6] argued that there is no single outstanding model according to any assessment criterion that would satisfy all diverse needs and conditions. Jajarmizadeh et al. [3] conducted an extensive comparative performance assessment of the structures of 19 daily lumped models in 429 catchments. They demonstrated that that the complex models outperform the simple models in calibration mode but not in verification mode. Two years later [6] focused on creating accurate, fixed, conceptual model structure, using 429 catchments with distinctive characteristics, but still robust and simple, so that the model would have the right balance between complexity and robustness. They improved a 3-parameter Genie Rural (GR3J) lumped conceptual model proposed by Edijatno et al. [7] and developed 4-parameter GR4J model. In addition, Oudin et al. (2005) [8] evaluated 27 existing equations for the potential evapotranspiration calculation based on the results of the simulations carried out using different hydrological models and came up with a simplified and effective equation that gives the best modeling results in a process of rainfall-runoff modeling. Therefore, only time series of rainfall and air temperature, with a daily time step are needed as input data in the case of the GR4J model and there are four free parameters that should be calibrated during the modeling process [6]. In 2011 the newer model version was developed by Pushpalatha et al. [9]. Two parameters (i.e., GR6J) were added in order to obtain better simulation of water exchange between the river and groundwater, which is one of the most important processes for low flow simulations. During hydrological drought, groundwater can be the main source of water supply to the watercourse. In the study conducted by Pushpalatha et al. [9], the GR6J model was tested on 1000 basins in France using data from 1970 to 2006, which captured the great variability of meteorological and hydrological data. The research included also periods of severe droughts (1976, 1989-1991, 2003, and 2005). Moreover, some catchments with specific characteristics were included such as karst areas. The results of the study indicated that the complexity added by an additional free parameter is warranted by the model's results; however, the level of performance in low-flow conditions seems to remain quite low. In another study [10] they tested different versions of the GR lumped conceptual models, a 4-parameter Journalier (GR4J), Genie Rural, a 6-parameter Journalier (GR6J), and the CemaNeige GR6J with additional snow module for rainfall-runoff modeling in case of non-homogeneous Ljubljanica River karst catchment. The results showed that GR6J model slightly improved the performance of the GR4J model, but only in the validation mode. Moreover, the GR6J model did not perform better than the GR4J model in terms of the low flow simulation. On the other hand, CemaNeige GR6J model with additional snow module outperforms both other tested versions of the model. Another study conducted by Sezen et al. [11] revealed that lumped GR4J model outperformed several data mining models when modeling rainfall-runoff relationship in case of the complex karst catchments in Slovenia. In the comparative study of the 237 French catchments, Van Esse et al. [12] investigated the influence of the conceptual model structure on model performance. They investigated whether to use fixed or flexible model structure for specific catchment. They pointed out, that the larger and more saturated the catchment is, the better performance of the lumped conceptual model will be obtained. As pointed out by Andréassian et al. [13], hydrological models will always incompletely reflect the nature. In order to improve hydrological models straightforward tests on large and diverse data are needed. That is also important from the perspective of this study, where we evaluated the performance of lumped conceptual models with different complexity for the rainfall-runoff modeling in case of the non-homogeneous catchments.
The main aim of this paper was to investigate the performance of the lumped conceptual GR4J and GR6J models in case of the Sava River catchment up to theČatež gauging station and its four nested sub-catchments (i.e., up to the Blejski most, Okroglo, Šentjakob, and Hrastnik gauging stations) with various characteristics relating to their size and non-homogeneity. There are also some other important differences among the nested sub-catchments. For example, the smallest sub-catchment has significant torrential characteristics while the largest catchment has much slower response to the rainfall input. Thus, we were interested if these differences among catchments have an impact on the modeling performance. Additionally, we were also interested if a more complex GR6J model can yield significant improvement over the GR4J modeling results.

Data
The Sava River catchment covers 53% of the Slovenia and the Sava River is part of the Danube River catchment. The Sava River has two springs, namely the Zelenci spring (The Sava Dolinka River) and the waterfall Savica (The Sava Bohinjka River). Near the city of Radovljica Sava Dolinka and Sava Bohinjka have the confluence and river downstream from this point is named Sava. In order to investigate the performance of the GR4J and GR6J models five catchments that are part of the Sava River catchment were selected. Figure 1 shows the location of these five catchments and corresponding measuring profiles/discharge gauging stations. Basic characteristics of the five selected catchments are shown in Table 1. The smallest sub-catchment (i.e., Blejski most) is part of the Sava Dolinka River catchment. Steep slopes and torrential characteristics characterize the headwater part of the Sava River catchment ( Table 1). The downstream part of the catchment is not so steep and the Sava River flow characteristics are not purely torrential any more (e.g., [14]). The idea behind the catchments selection was to find nested catchments with the same data availability and different characteristics in order to investigate the impact of the size and non-homogeneity of the catchments on the rainfall-runoff modeling performance.
Data period from 2000 until 2015 was used in the study. For the calibration period data from 1 January 2000 until 7 July 2013 was applied (year 2000 was used as warm-up period), whereas for the validation period data from 8 July 2013 until 31 December 2015 was used. We followed the suggestions of other researchers when splitting the data [5] and at the same time used larger percent of data for the calibration process since some studies have indicated that calibration of the model on the full time series could be preferred in some cases [15]. Blejski most and Okroglo gauging stations had some missing data; therefore, linear correlation was applied to replace this missing data. In the case of Blejski most gauging station, correlation with discharge data from Okroglo gauging station was conducted (1 year of data was calculated based on the Pearson correlation coefficient of 0.93) and in the case of Okroglo gauging station, correlation with discharge data from Šentjakob gauging station was conducted (a bit more than 1 year of data was calculated based on Pearson correlation coefficient of 0.95). Precipitation and air temperature stations were also selected based on the data availability. Some smaller percentage of missing data was interpolated using linear regression, where the most relevant nearby meteorological station was used in order to estimate the missing data. Five rainfall stations had up to 28% of missing data (stations BohinjskaČešnjica, Tržič, Kum, Babno polje, Spodnji Dolič) and the Pearson correlation coefficients were between 0.87 and 0.95. Moreover, four air temperature stations had some missing data and Pearson correlation coefficient in this case ranged from 0.98 to 0.99.
Additionally, in order to investigate how does different catchment characteristics impact on the modeling performance we also analyzed geology, land-use, hydrogeology, percentage of agglomerations, runoff potential (related to the Soil Conservation Service (SCS) method [16]), average air temperature, average annual rainfall , average number of days with rainfall above 70 mm , and average number of days with snow cover . Figure 2 shows raster maps of runoff potential [17], annual air temperature, average number of days with more than 70 mm of rain, and average number of days with snow cover [18].      [16], average annual air temperature map, average annual rainfall map, and average number of days with snow cover map.

Hydrological Models
In the scope of this study, we have decided to use the GR4J and GR6J models. These two models were selected because they have yielded good performance in previous applications to the catchments in Slovenia [10,11]. On the other hand, Slovenian Environment Agency, which is responsible for the flood forecasting uses the DHI NAM model [19]. There are no official suggestions regarding the hydrological modeling at the state level in Slovenia, which means that applicability of different models should be investigated. As part of this study the rainfall-runoff modeling was conducted using R software [20] and AirGR package [21,22] enabling the GR4J and GR6J lumped conceptual models simulations. Detailed description of the two selected models and equations used in the scope of the models can be found in Perrin et al. [6,9]. The GR4J and GR6J are lumped conceptual rainfall-runoff models that only use rainfall and potential evapotranspiration as input data in order to simulate discharges. The equation proposed by Oudin et al. [8] was applied in order to calculate the potential evapotranspiration data. In this study, daily model time step was applied and average daily discharge data was used. The GR4J model is a four parameter model. Parameters are the maximum capacity of the production store, the groundwater exchange coefficient, the oneday ahead maximum capacity of the routing store, and the unit hydrograph time base [6]. The GR6J model version additionally uses intercatchment exchange threshold and coefficient for emptying exponential store parameters (e.g., [21,22]). In the study, the GR4J and GR6J model parameters were calibrated using the method proposed by Michel [23] that is also included in the AirGR package [21,22]. The selected calibration method combines global and local approaches for parameter estimation [21][22][23]. Nash-Sutcliffe efficiency (NSE), Kling-Gupta efficiency (KGE), and root mean square error (RMSE) criteria were applied in order to evaluate model performance [24,25]. The GR4J and GR6J models were already applied to some of the catchments located in Slovenia (see [10,11,26]).  [16], average annual air temperature map, average annual rainfall map, and average number of days with snow cover map.

Hydrological Models
In the scope of this study, we have decided to use the GR4J and GR6J models. These two models were selected because they have yielded good performance in previous applications to the catchments in Slovenia [10,11]. On the other hand, Slovenian Environment Agency, which is responsible for the flood forecasting uses the DHI NAM model [19]. There are no official suggestions regarding the hydrological modeling at the state level in Slovenia, which means that applicability of different models should be investigated. As part of this study the rainfall-runoff modeling was conducted using R software [20] and AirGR package [21,22] enabling the GR4J and GR6J lumped conceptual models simulations. Detailed description of the two selected models and equations used in the scope of the models can be found in Perrin et al. [6,9]. The GR4J and GR6J are lumped conceptual rainfall-runoff models that only use rainfall and potential evapotranspiration as input data in order to simulate discharges. The equation proposed by Oudin et al. [8] was applied in order to calculate the potential evapotranspiration data. In this study, daily model time step was applied and average daily discharge data was used. The GR4J model is a four parameter model. Parameters are the maximum capacity of the production store, the groundwater exchange coefficient, the one-day ahead maximum capacity of the routing store, and the unit hydrograph time base [6]. The GR6J model version additionally uses intercatchment exchange threshold and coefficient for emptying exponential store parameters (e.g., [21,22]). In the study, the GR4J and GR6J model parameters were calibrated using the method proposed by Michel [23] that is also included in the AirGR package [21,22]. The selected calibration method combines global and local approaches for parameter estimation [21][22][23]. Nash-Sutcliffe efficiency (NSE), Kling-Gupta efficiency (KGE), and root mean square error (RMSE) criteria were applied in order to evaluate model performance [24,25]. The GR4J and GR6J models were already applied to some of the catchments located in Slovenia (see [10,11,26]).

Modeliing Performance
Firstly, we investigated the performance of the GR4J and GR6J lumped conceptual models for five selected nested sub-catchments. Figures 3 and 4 illustrate graphical evaluation of the modeling performance for the validation period for the five nested sub-catchments using the GR4J and GR6J models, respectively. Additionally, Figure 5 shows comparison between simulated and observed discharge values for all five considered catchments. In addition to the graphical comparison of simulated and observed discharge data, the model performance was evaluated also using NSE, KGE and RMSE efficiency criteria. Table 2 shows model efficiency criteria for the GR4J and GR6J models in the validation period for all five considered catchments.
Based on the presented results one can notice that for larger catchments (i.e.,Čatež and Hrastnik gauging stations) better modeling performance was obtained in comparison to smaller catchments with more torrential characteristics and where the topography is more complex (Figures 3-5, Table 2). Moreover, a reason for worse performance in case of smaller catchments with higher altitude is also the fact that GR4J and GR6J models do not have a separate snow module. Thus, model performance could be improved using the snow module (i.e., CemaNeige). This conclusion can be made based on the results of all three selected performance criteria and also based on the graphical comparison of the results (Figures 3-5, Table 2). Moreover, the same conclusion applies to both applied models, namely GR4J and GR6J models. This finding is in accordance with the conclusion made by Van Esse et al. [12], who analyzed rainfall-runoff modeling performance using 13 different conceptual models in case of 237 French catchments. They argued that conceptual hydrological models perform better on larger and/or wetter catchments than on smaller and/or drier catchments. Furthermore, Merz et al. [27] investigated 269 catchments in Austria using the conceptual model and came to similar conclusions. Additionally, Sezen et al. [11] stated that better modeling results were obtained for the larger Ljubljanica River catchment up to the Moste gauging station compared to some smaller tested catchments with the catchment area around 50 km 2 . As pointed out by Van Esse et al. [12], a possible reason could be that hydrological processes at larger scales are mixed and have therefore smoother behavior, which enable easier modeling by the conceptual model structure. In our case, also significant torrential characteristics of smaller catchments are important.
When comparing the GR6J and GR4J models one can notice that according to the KGE criterion the GR6J model is able to yield better modeling performance in case of all five tested nested sub-catchments ( Table 2). This is especially evident in case of low-flow data (Figures 3 and 4). However, it is evident that differences between both tested model versions are not significantly large, which is especially evident in case of graphical comparison of the mean observed and modeled discharge data (Figures 3-5, Table 2). A similar conclusion can be made for the NSE criterion with the exception of the smallest catchment with the outlet at the Blejski most gauging station, where slightly better results were obtained in the case of the GR4J model (Table 2). Similar findings were obtained by [10], who compared the performance of the GR lumped conceptual models with different number of parameters in the case of the Ljubljanica River catchment. They demonstrated that the CemaNeige GR6J model, which includes an additional snow routine module, yielded the best modeling results. Thus, one could argue that especially for the small catchments (e.g., Blejski most gauging station) inclusion of the additional parameters (e.g., snow module) could lead to improved modeling performance. However, it should also be noted that inclusion of the snow module would lead to two additional model parameters that should be calibrated [28,29].     Additionally, comparison of the calibration and validation results demonstrated that NSE and KGE model efficiency criteria results in the validation period were generally better compared to the calibration period. This could indicate that tested GR models are robust and suitable for the rainfallrunoff modeling of the selected nested sub-catchments. However, in the case of the RMSE criterion, better results were obtained in the calibration period. Another interesting conclusion that can be made based on the results is that with the increasing catchment area the maximum capacity of the production store (i.e., model parameter) was decreasing. This means that larger catchments had smaller production store compared to smaller catchments. This can be regarded as relatively unexpected result since smaller values of this parameter leads to higher peak discharges. However, it is also true that this parameter impacts on the dynamics of the recession part of the hydrograph (i.e., higher values lead to smaller slope of the recession part). Furthermore, the impact of this parameter can also be reduced with other parameters (e.g., the one-day ahead maximum capacity of the routing store parameter). Moreover, the unit hydrograph time base parameter was increasing with increasing catchment area. This could be somehow expected, since smaller values of this parameter lead to higher peak discharge values and faster response to rainfall input. Thus, this is a characteristic of torrential catchments. Furthermore, the maximum value of the groundwater exchange parameter was obtained in case of the smallest catchment (i.e., Blejski most gauging station), whereas the largest catchment (i.e., Čatež gauging station) had the smallest value of this parameter. Table 2. Model efficiency criteria (i.e., Nash-Sutcliffe efficiency (NSE), Kling-Gupta efficiency (KGE), root mean square error (RMSE)) for the GR4J and GR6J models in the validation period for five considered catchments.

Non-Homogeneity Investigation
In the second step of the study, we investigated the relationship between different catchment characteristics (also in terms of non-homogeneity) and rainfall-runoff modeling performance. Catchment characteristics shown in Section 2.1 were analyzed. Analyses demonstrated that geology, land-use, hydrogeology, and percentage of agglomerations do not have a significant relationship with the modeling performance. This means that we were not able to relate better or worse Additionally, comparison of the calibration and validation results demonstrated that NSE and KGE model efficiency criteria results in the validation period were generally better compared to the calibration period. This could indicate that tested GR models are robust and suitable for the rainfall-runoff modeling of the selected nested sub-catchments. However, in the case of the RMSE criterion, better results were obtained in the calibration period. Another interesting conclusion that can be made based on the results is that with the increasing catchment area the maximum capacity of the production store (i.e., model parameter) was decreasing. This means that larger catchments had smaller production store compared to smaller catchments. This can be regarded as relatively unexpected result since smaller values of this parameter leads to higher peak discharges. However, it is also true that this parameter impacts on the dynamics of the recession part of the hydrograph (i.e., higher values lead to smaller slope of the recession part). Furthermore, the impact of this parameter can also be reduced with other parameters (e.g., the one-day ahead maximum capacity of the routing store parameter). Moreover, the unit hydrograph time base parameter was increasing with increasing catchment area. This could be somehow expected, since smaller values of this parameter lead to higher peak discharge values and faster response to rainfall input. Thus, this is a characteristic of torrential catchments. Furthermore, the maximum value of the groundwater exchange parameter was obtained in case of the smallest catchment (i.e., Blejski most gauging station), whereas the largest catchment (i.e., Catež gauging station) had the smallest value of this parameter.

Non-Homogeneity Investigation
In the second step of the study, we investigated the relationship between different catchment characteristics (also in terms of non-homogeneity) and rainfall-runoff modeling performance. Catchment characteristics shown in Section 2.1 were analyzed. Analyses demonstrated that geology, land-use, hydrogeology, and percentage of agglomerations do not have a significant relationship with the modeling performance. This means that we were not able to relate better or worse performance of the models with aforementioned catchments properties. On the other hand, runoff potential, which is related to the soil type [16] demonstrated some relationship with the model performance ( Figure 6). One can notice that larger catchments have higher values of runoff potential compared to smaller catchments ( Figure 6). This can partly explain smaller values of the production store parameter that was obtained in the process of the model calibration in case of larger catchments (Section 3.1). On the other hand, this could also mean that for the catchments with smaller rainfall losses (i.e., higher runoff potential) one can expect a better model performance. This could be explained with more "predictable nature" of this kind of catchments compared to the catchments, where the percentage of direct runoff is smaller and rainfall losses are larger (i.e., lower runoff potential).
Water 2019, 11, x FOR PEER REVIEW 10 of 13 performance of the models with aforementioned catchments properties. On the other hand, runoff potential, which is related to the soil type [16] demonstrated some relationship with the model performance ( Figure 6). One can notice that larger catchments have higher values of runoff potential compared to smaller catchments ( Figure 6). This can partly explain smaller values of the production store parameter that was obtained in the process of the model calibration in case of larger catchments (Section 3.1). On the other hand, this could also mean that for the catchments with smaller rainfall losses (i.e., higher runoff potential) one can expect a better model performance. This could be explained with more "predictable nature" of this kind of catchments compared to the catchments, where the percentage of direct runoff is smaller and rainfall losses are larger (i.e., lower runoff potential). Figure 6. The relationship between runoff potential, average air temperature, average annual rainfall, and average number of days with snow cover and the GR4J model performance (i.e., NSE criteria is used).
Additionally, also the influence of various meteorological variables of the catchments on the model performance was investigated ( Figure 6). When comparing the average air temperature values and the number of days with snow cover one can notice that for the catchments with higher average air temperature and smaller number of snow cover days better modeling performance was obtained. This result could be expected since tested GR4J and GR6J models do not specifically account for the snow accumulation and melting process (e.g., [6]). Thus, as already stated in the case of the smallest catchment (i.e., Blejski most gauging station) modeling performance could be improved using the CemaNeige snow module (e.g., [28,29]). Moreover, it should also be noted that higher air temperature values also lead to higher potential evapotranspiration values, which with lower average annual rainfall cause smaller specific discharge values ( Figure 5). Moreover, better modeling performance is also obtained in case of lower mean annual rainfall values ( Figure 6). This can be explained with the fact that the smallest catchment (i.e., Blejski most gauging station) has torrential characteristics and Figure 6. The relationship between runoff potential, average air temperature, average annual rainfall, and average number of days with snow cover and the GR4J model performance (i.e., NSE criteria is used).
Additionally, also the influence of various meteorological variables of the catchments on the model performance was investigated ( Figure 6). When comparing the average air temperature values and the number of days with snow cover one can notice that for the catchments with higher average air temperature and smaller number of snow cover days better modeling performance was obtained. This result could be expected since tested GR4J and GR6J models do not specifically account for the snow accumulation and melting process (e.g., [6]). Thus, as already stated in the case of the smallest catchment (i.e., Blejski most gauging station) modeling performance could be improved using the CemaNeige snow module (e.g., [28,29]). Moreover, it should also be noted that higher air temperature values also lead to higher potential evapotranspiration values, which with lower average annual rainfall cause smaller specific discharge values ( Figure 5). Moreover, better modeling performance is also obtained in case of lower mean annual rainfall values ( Figure 6). This can be explained with the fact that the smallest catchment (i.e., Blejski most gauging station) has torrential characteristics and is located in alpine area, which means that rainfall events are often very localized and intense. Thus, these kinds of events are more difficult to model in comparison with longer duration frontal rainfall events. This finding is in accordance with the conclusion made by Van Esse et al. [12]. They argued that model structure performed worse in the catchments with flashy flows or unexplained variations in low flows.

Conclusions
This paper presents rainfall-runoff modeling results using the lumped conceptual models with different number of parameters, namely the GR4J and GR6J models applied to five nested non-homogeneous sub-catchments that are part of the Sava River catchment in Slovenia. Based on the presented results, the following conclusions can be made: - Better modeling performance is obtained in case of larger catchments. This applies for the GR4J and GR6J models. - The GR6J model with 6 parameters, which was developed to improve low-flow simulations, slightly improved the modeling performance in comparison to the GR4J model with 4 parameters. -Investigation of the influence of the non-homogeneity on the modeling performance demonstrates that for larger more non-homogeneous catchments in the Sava River catchment in Slovenia better modeling performance was obtained comparing to smaller catchments. The main reason for this is probably due to the fact that smaller catchments have significant torrential characteristics and are located in Alpine climate, where snow accumulation and melting process has an important role in runoff generation. However, better performance could be obtained using hydrological model with separate snow module. -Better modeling results are obtained in case of the catchments with higher runoff potential. Other considered catchment characteristics like geology, land-use, hydrogeology, and percentage of agglomerations do not demonstrate a significant relationship with the modeling performance. -Related to various meteorological variables, higher air temperature values, smaller number of snow cover days and lower annual rainfall amount yielded better modeling performance.
Future studies could investigate the performance of the hydrological model with the incorporated snow module (CemaNeige GR6J) on various catchments in Slovenia. Moreover, performance of hourly forecasting model that is also part of the AirGR package could be evaluated and compared to the performance of DHI NAM model that is used by the Slovenian Environment Agency.