Enhancing Physical Similarity Approach to Predict Runo ﬀ in Ungauged Watersheds in Sub-Tropical Regions

: Regionalization techniques have been comprehensively discussed as the solution for runo ﬀ predictions in ungauged basins (PUB). Several types of regionalization approach have been proposed during the years. Among these, the physical similarity one was demonstrated to be one of the most robust. However, this method cannot be applied in large regions characterized by highly variable climatic conditions, such as sub-tropical areas. Therefore, this study aims to develop a new regionalization approach based on an enhanced concept of physical similarity to improve the runo ﬀ prediction of ungauged basins at country scale, under highly variable-weather conditions. A clustering method assured that watersheds with di ﬀ erent hydrologic and physical characteristics were considered. The novelty of the proposed approach is based on the relationships found between rainfall-runo ﬀ model parameters and watershed-physiographic factors. These relationships were successively exported and validated at the ungauged basins. From the overall results, it can be concluded that the runo ﬀ prediction in the ungauged basins was very satisfactory. Therefore, the proposed approach can be adopted as an alternative method for runo ﬀ prediction in ungauged basins characterized by highly variable-climatic conditions.


Introduction
Runoff simulations have a significant influence on sustainable water resources management and engineering design all over the world. Hydrologic/hydraulic models represent the most used tool to solve this problem. Usually, these models are calibrated and validated to get the best set of parameters that optimizes the agreement between observations and simulations. However, this classical method cannot be tackled if the study area lacks observed runoff data [1].
Despite continuous expensive research and efforts to gather hydrologic data, there are still some areas of the world with scarce hydrometric gauging stations [2,3]. These areas have been representing a motivating and challenging topic for researchers and decision-makers. To address this problem, the International Association of Hydrological Sciences (IAHS) established the "Decade on Predictions in Ungauged Basins (PUB): 2003-2012" aiming to achieve significant advances in the capacity to make predictions in ungauged basins [4]. The primary goal of the international PUB initiative was to reduce the uncertainty in the runoff prediction by shifting away from tools that require calibration and curve fitting to tools that need little or no calibration [5,6]. The review study of Hrachowitz et al. [7] explained that regionalization is the most common method used to tackle the PUB issue. Three of the most commonly adopted methods for regionalization are based on spatial proximity, physical similarity, and regression [8][9][10][11][12][13][14][15]. The spatial proximity approach estimates the model parameters at ungauged watersheds using an interpolation technique. This method assumes that the nearby basins are located in a homogeneous region [8]. The physical similarity approach uses watershed attributes to similar group catchments. This method identifies a donor basin that is most similar to an ungauged area, taking into account its watershed attributes and then transfers a complete parameter set from the donor to the corresponding ungauged catchment (acceptor) for hydrologic modeling [16]. The regression technique considers watershed attributes as independent variables and model parameters as dependent variables [17]. Several studies have compared the performance of the three-regionalization approaches in different watersheds [18][19][20]. Parajka et al. used the semi-distributed Hydrologiska Byråns Vattenbalansavdelning (HBV) model in Austria and showed that the physical similarity method performs better than the regression and spatial proximity ones [8]. Oudin et al. compared the three-regionalization techniques for runoff simulations in ungauged catchments considering 913 gauged catchments in France. They found that spatial proximity slightly outperformed physical similarity in the areas characterized by a dense stream-gauge network, while their performance becomes similar when the density of the stream-gauging network decreases [9]. Yang et al. reported that the physical similarity methods showed the best outcomes followed by spatial proximity approaches; regression methods produce the worst results, based on more than 100 catchments in Norway [21]. Kay et al. compared the performance of regression and physical similarity methods with two models: probability distributed model (PDM) and time-area topographic extension (TATE). They applied this methodology for 119 watersheds across the UK but did not obtain consistent results. For the PDM, the physical similarity was more accurate, but regression outperformed physical similarity for TATE [22]. Based on the previous studies, it seems that the physical similarity approach can be considered slightly more robust than the other approaches. However, scientific results are not always consistent all over the world.
Catchment characteristics (e.g., basin surface, soil type, topography, and land use) and meteorological data (e.g., precipitation, air temperature, solar radiation, relative humidity, and wind speed) are commonly adopted as attributes to represent the physical similarity [12,[23][24][25][26][27][28][29]. The classical physical similarity technique assumes that the similarity in the input attribute is fully transferred to the output (i.e., the runoff). The question arises as to whether it is reasonable to draw such generalized conclusions in terms of hydrologic simulations for ungauged basins within a sub-tropical large area (country scale) characterized by highly variable-climatic conditions. In cases like these, the relationship between input and output variables in the hydrologic process can be different between gauged and ungauged watersheds. This issue motivated us to find a better approach that can satisfactorily represent the hydrologic similarity of gauged and ungauged basins at country scale under highly variable-weather conditions. For the sake of clarity, we define hydrologic similarity as a holistic concept that refers to the basin's response to the combination of precipitation and catchment characteristics.
On the basis of these considerations, the main objective of this work is developing an improved regionalization method, based on the physical similarity approach, which can be used in countries characterized by variations in climate. The classical regionalization method, that simply transfers a full set of model parameters from donors to acceptors, is improved here by finding relationships between physical catchment characteristics and model parameters. Furthermore, in this work, the conventional regionalization approach is aided by a clustering method that identifies homogeneous groups of watersheds, assuring that basins with different hydrologic and physical characteristics are taken into account.
This new regionalization approach was implemented in Uruguay (South America), whose climate is highly influenced by El Niño Southern Oscillation (ENSO), a quasi-periodic interaction between the atmosphere and the tropical Pacific Ocean [30,31].

Study Area
The climate in Uruguay (South America), in particular, precipitation and river flow regime, is highly influenced by El Niño Southern Oscillation (ENSO) phenomenon [32]. It is a quasi-periodic interaction between the atmosphere and the tropical Pacific Ocean, which is of vital importance to the global hydroclimate [31]. Several studies have documented this relationship. Pisciottano et al. used long records of data provided by a dense network of rainfall stations to investigate precipitation anomalies during the extreme phases of ENSO [33]. They demonstrated that rainfall during the period November-January tends to be above the median in El Niño years, as well as from March through July of the following years. Diaz et al. show that rainfall in Uruguayan basins has significant spatial and temporal variability [34].
The study area covers 13 watersheds across Uruguay, having areas ranging from 300 to 5000 km 2 spread throughout northern and southern regions ( Figure 1). These basins were selected based on three main criteria: (i) data quality; (ii) spatial distribution; (iii) rainfall-runoff model hypothesis ( Table 1). The annual mean precipitation is 1250-1450 mm in the northern region and 1050-1250 mm in the southern part. The average air temperature varies between 2 and 3 • C in northern and southern areas. In particular, in the northern regions, the average air temperature ranges approximately between 25 • C (in January) and 13 • C (in July); and in the southern areas, it ranges between 22 • C (in January) and 11 • C (in July). The relief varies between 0 m and 500 m above mean sea level (m.s.l.) in the entire country. Natural watersheds both in the southern and in the northern region are mostly covered with natural grassland (freely downloadable from the National Environmental Observatory of Uruguay (OAN) [35]).

Study Area
The climate in Uruguay (South America), in particular, precipitation and river flow regime, is highly influenced by El Niño Southern Oscillation (ENSO) phenomenon [32]. It is a quasi-periodic interaction between the atmosphere and the tropical Pacific Ocean, which is of vital importance to the global hydroclimate [31]. Several studies have documented this relationship. Pisciottano et al. used long records of data provided by a dense network of rainfall stations to investigate precipitation anomalies during the extreme phases of ENSO [33]. They demonstrated that rainfall during the period November-January tends to be above the median in El Niño years, as well as from March through July of the following years. Diaz et al. show that rainfall in Uruguayan basins has significant spatial and temporal variability [34].
The study area covers 13 watersheds across Uruguay, having areas ranging from 300 to 5000 km 2 spread throughout northern and southern regions ( Figure 1). These basins were selected based on three main criteria: (i) data quality; (ii) spatial distribution; (iii) rainfall-runoff model hypothesis ( Table 1). The annual mean precipitation is 1250-1450 mm in the northern region and 1050-1250 mm in the southern part. The average air temperature varies between 2 and 3 °C in northern and southern areas. In particular, in the northern regions, the average air temperature ranges approximately between 25 °C (in January) and 13 °C (in July); and in the southern areas, it ranges between 22 °C (in January) and 11 °C (in July). The relief varies between 0 m and 500 m above mean sea level (m.s.l.) in the entire country. Natural watersheds both in the southern and in the northern region are mostly covered with natural grassland (freely downloadable from the National Environmental Observatory of Uruguay (OAN) [35]).

Data Quality
Quality and availability of streamflow observations. Simultaneous precipitation and flow data.

Spatial Distribution
Watersheds distributed all over the country. Elevation covers the entire national range. Precipitation and temperature cover the entire national ranges. Different soil characteristics (dominant) considered.

Model Hypothesis
Adequate surface to the model-unit hydrograph.
No presence of reservoirs.
In Table 2, the catchment and hydrologic characteristics of the selected watersheds are reported. Among the catchment characteristics, basin area (km 2 ), slope (%), concentration-time (Tc) (h), and available water content (AD) (mm) were considered. To represent the hydrologic signature of the watersheds, the Richards-Baker Flashiness Index (R-B index) and the dominant discharge (Dominant Q) were calculated. The R-B index measures the absolute day-to-day fluctuations inflow relative to total discharge and provides a useful characterization of the streamflow variability. This index can be used to classify a stream as stable or flashy, and it is defined in Equation (1) [36].
where Q i is the mean daily flow on a given day i, and n is the number of days of recorded data at a given gauging station. To better classify streamflow, Baker et al. coupled R-B index with the surface watershed [36]. In particular, they represented 27-year average index values by box plots for six ranges of catchment size. The box plots divide the range of R-B values into quartiles, with the first quartile having the most stable streams and the fourth quartile having the flashiest ( Figure 2). The watersheds selected for this work present an R-B index that ranges between 0.32 and 0.78 and their surface ranges between 366 km 2 and 4679 km 2 . Based on the box plots shown in Figure 2, they are all located in the upper-middle/highest quartile, and therefore, they are classified as flashy. The Dominant Q is a geomorphological concept and not strictly a measurable parameter. It represents a balance between the flows and sediments that shape the channel, the processes by which channel form is changed, and the ability of the boundary materials to resist change [37]. In this study, it was calculated by selecting a discharge based on a specified recurrence-interval discharge, typically between the 1 and 3-year events in the annual maximum series (2 years was chosen in this study). The annual series of maximum flow was adjusted to the generalized extreme value (GEV) distribution.

Precipitation, Climate, and Streamflow
Daily precipitation data (for the period 2001-2010) from 59 stations belonging to the National Institute of Meteorology (INUMET) and National Administration of Power Plants and Power Transmissions (UTE) was used in this study. The data quality was studied and analyzed, evaluating the accumulated annual rainfall (double mass curve) and the number of days with missing data. Furthermore, the records of 5 meteorological stations belonging to the National Institute of Agricultural Research (INIA) were taken into account. The latter, apart from the precipitation, provided climate data, including average air temperature (°C), average relative moisture (%), average wind speed (knots), and solar radiation (h). With this information, it was possible to calculate the potential evapotranspiration (PET) using the Penman-Monteith method.
Daily streamflow data (for the period 2001-2010) from 13 hydrometric stations belonging to the Uruguayan National Water Board (DINAGUA) was used. To ensure the quality of the data collected, a consistency analysis of the water balance was carried out, taking into account precipitation, evapotranspiration, streamflow, and the calculation of the runoff coefficient.
In Figure 3, the location of the selected pluviometric stations is shown with blue dots, the 5 meteorological stations are represented with pink squares, while the hydrometric stations are symbolized with red triangles. The Dominant Q is a geomorphological concept and not strictly a measurable parameter. It represents a balance between the flows and sediments that shape the channel, the processes by which channel form is changed, and the ability of the boundary materials to resist change [37]. In this study, it was calculated by selecting a discharge based on a specified recurrence-interval discharge, typically between the 1 and 3-year events in the annual maximum series (2 years was chosen in this study). The annual series of maximum flow was adjusted to the generalized extreme value (GEV) distribution.

Precipitation, Climate, and Streamflow
Daily precipitation data (for the period 2001-2010) from 59 stations belonging to the National Institute of Meteorology (INUMET) and National Administration of Power Plants and Power Transmissions (UTE) was used in this study. The data quality was studied and analyzed, evaluating the accumulated annual rainfall (double mass curve) and the number of days with missing data. Furthermore, the records of 5 meteorological stations belonging to the National Institute of Agricultural Research (INIA) were taken into account. The latter, apart from the precipitation, provided climate data, including average air temperature ( • C), average relative moisture (%), average wind speed (knots), and solar radiation (h). With this information, it was possible to calculate the potential evapotranspiration (PET) using the Penman-Monteith method.
Daily streamflow data (for the period 2001-2010) from 13 hydrometric stations belonging to the Uruguayan National Water Board (DINAGUA) was used. To ensure the quality of the data collected, a consistency analysis of the water balance was carried out, taking into account precipitation, evapotranspiration, streamflow, and the calculation of the runoff coefficient.
In Figure 3, the location of the selected pluviometric stations is shown with blue dots, the 5 meteorological stations are represented with pink squares, while the hydrometric stations are symbolized with red triangles.

Soil Type
The information about the soil types with a scale 1:40,000 was obtained from the General Directorate of Natural Resources of the Ministry of Livestock, Agriculture, and Fisheries (RENARE-MGAP). This map is known as CONEAT soil map, where CONEAT stands for the National Commission for the Agro-economic Study of the Earth (COmisión Nacional de Estudio Agro económico de la Tierra), that represents the name of the commission which developed the index CONEAT. This index characterizes the average productivity of the soils contained within one property. The Uruguayan soil has been categorized into 188 different types, each category with an associated productivity index. Among other information, this map provides the amount of available water content (AD) for each soil type. AD was calculated considering the thickness of soil horizons, the percentage of sand, silt, clay, and organic matter presented in each soil unit [38][39][40] (see Supplementary Materials (SM-1) for more information about the calculation of AD).

Rainfall-Runoff Model
The GR4J model (which stands for modèle du Génie Rural à 4 paramètres Journalier), a daily lumped four-parameter rainfall-runoff model, was used in this work. The GR4J model by Perrin et al. [41] is the modified version of the GR3J model initially proposed by Edijatno and Michel [42] and then successively improved by Nascimento and Edijatno et al. [43,44].
A well-known conceptualization of the model structure is shown in Figure 4, and the parameter description is reported in Table 3.

Soil Type
The information about the soil types with a scale 1:40,000 was obtained from the General Directorate of Natural Resources of the Ministry of Livestock, Agriculture, and Fisheries (RENARE-MGAP). This map is known as CONEAT soil map, where CONEAT stands for the National Commission for the Agro-economic Study of the Earth (COmisión Nacional de Estudio Agro económico de la Tierra), that represents the name of the commission which developed the index CONEAT. This index characterizes the average productivity of the soils contained within one property. The Uruguayan soil has been categorized into 188 different types, each category with an associated productivity index. Among other information, this map provides the amount of available water content (AD) for each soil type. AD was calculated considering the thickness of soil horizons, the percentage of sand, silt, clay, and organic matter presented in each soil unit [38][39][40] (see Supplementary Materials (SM-1) for more information about the calculation of AD).

Rainfall-Runoff Model
The GR4J model (which stands for modèle du Génie Rural à 4 paramètres Journalier), a daily lumped four-parameter rainfall-runoff model, was used in this work. The GR4J model by Perrin et al. [41] is the modified version of the GR3J model initially proposed by Edijatno and Michel [42] and then successively improved by Nascimento and Edijatno et al. [43,44].
A well-known conceptualization of the model structure is shown in Figure 4, and the parameter description is reported in Table 3.   Maximum capacity of the production store mm x2 Groundwater exchange coefficient mm x3 Capacity of the nonlinear routing store mm x4 Unit-hydrograph time base day The model belongs to the family of conceptual lumped models. It is characterized by two reservoirs or stores: a production store, which computes effective rainfall, and a routing store combined with a unit hydrograph for water transfer. The input model data are precipitation (P) and potential evapotranspiration (PE), and they are used to calculate streamflow. This calculation is made in three steps: first, the effective rainfall is calculated with a zero capacity interception store and a soil moisture accounting store. Second, this effective rainfall is split into two components: one routes 90% of effective rainfall with a unit hydrograph UH1 and a nonlinear routing store; the other routes the remaining 10% with a second unit hydrograph UH2. Third, an inter-catchment groundwater flow function is computed to account for water gains or losses stemming from interactions with neighboring catchments or groundwater and is added to both components. The final discharge at the catchment outlet is calculated as the sum of both flow components.
The GR4J model parameters were calibrated on donor catchments and subsequently transferred to acceptors catchments. To test the performance of the rainfall-runoff model, the coefficient of determination (R 2 ) (Equation (2)), Nash-Sutcliffe efficiency (NSE) (Equation (3)), and index of agreement (d) (Equation (4)) were used as metrics:  Table 3. Description of the GR4J-model parameters.

Parameter
Description Unit x 1 Maximum capacity of the production store mm Groundwater exchange coefficient mm Capacity of the nonlinear routing store mm x 4 Unit-hydrograph time base day The model belongs to the family of conceptual lumped models. It is characterized by two reservoirs or stores: a production store, which computes effective rainfall, and a routing store combined with a unit hydrograph for water transfer. The input model data are precipitation (P) and potential evapotranspiration (PE), and they are used to calculate streamflow. This calculation is made in three steps: first, the effective rainfall is calculated with a zero capacity interception store and a soil moisture accounting store. Second, this effective rainfall is split into two components: one routes 90% of effective rainfall with a unit hydrograph UH1 and a nonlinear routing store; the other routes the remaining 10% with a second unit hydrograph UH2. Third, an inter-catchment groundwater flow function is computed to account for water gains or losses stemming from interactions with neighboring catchments or groundwater and is added to both components. The final discharge at the catchment outlet is calculated as the sum of both flow components.
The GR4J model parameters were calibrated on donor catchments and subsequently transferred to acceptors catchments. To test the performance of the rainfall-runoff model, the coefficient of determination (R 2 ) (Equation (2)), Nash-Sutcliffe efficiency (NSE) (Equation (3)), and index of agreement (d) (Equation (4)) were used as metrics: Water 2020, 12, 528 where x obs (t) is the observed value at the time step t, x sim (t) is the simulated value, being (µ sim , σ sim ) and (µ obs , σ obs ) the first two statistical moments (means and standard deviations) of x sim and x obs , respectively, over the entire simulation period of length N.

Cluster Analysis for Watershed Selection
Conventional multivariate data-analysis techniques, principal component analysis (PCA) and cluster analysis (CA) (k-means), were used in this study to assure that watersheds with different hydrologic signature, and therefore with different physical and climate characteristics, were considered. Furthermore, the clustering method helps in reducing the computation time of the cross-validation.
The goal of the clustering algorithm is to partition a complex dataset into homogeneous clusters, such that the between-group similarities are smaller than the within-group similarities [45,46]. These clusters can reveal patterns related to the phenomenon under study. A distance function measures the similarity between observations. Firstly, the similarity is calculated between the data points. Once the data points begin to be grouped into clusters, the similarity is computed between the groups as well. Several metrics could be used to calculate it, and the choice of the similarity measure could influence the results [47]. This technique also requires a prior choice of the number of groups (k) that is arbitrary. In this study, the silhouette method was used to calculate the optimal k [48]. This analysis measures how close each point in a cluster is to the points in its neighboring clusters: where a(i) is the average distance of the point (i) to all the other points in the cluster it is assigned (A), b(i) is the average distance of the point (i) to other points to its closest neighboring cluster (B). Silhouette values lie in the range of [−1, 1]; the higher the value, the better is the cluster configuration. In particular, the value 1 is ideal since it indicates that the sample is far away from its neighboring cluster and very close to the group it is assigned. Similarly, the value −1 is the least preferred since it indicates that the point is closer to its neighboring cluster than to the cluster it is assigned.

Regionalization Approach
The new study approach is summarized in 7 phases as follows ( Figure 5). Phase 1: Data collection. The required input data were adequately collected: (i) physical catchment characteristics, e.g., DEM and soil type; and (ii) climatic/hydrologic characteristics, e.g., flow, precipitation, and climate.
Phase 2: Hydrologic response. The runoff signature of the studied watersheds was evaluated by calculating the R-B index and dominant discharge.
Phase 3: Model implementation. The rainfall-runoff model was implemented at the n catchments by exploiting the input data collected during Phase 1.
Phase 4: Individual model calibration. Model parameters were calibrated at each watershed. Model performance was assessed by comparing the predicted and observed runoff at donor catchments. With this aim, Nash-Sutcliffe efficiency (NSE individual ) was used as an objective function, coefficient of determination (R 2 ) and index of agreement (d) as performance indicators.
Phase 5: Watershed classification. To assure the heterogeneity of the watersheds considered in terms of hydrologic signature, the catchments were clustered into k different groups according to their hydrologic and physical similarity (calculated in Phase 2). This phase will also reduce the computation time of the cross-validation phase.
Phase 6a: Leave-one-out cross-validation (calibration). The watersheds were divided into n-k donors and k acceptors.
Phase 7: Enhanced regionalization (Regional model calibration). This step is performed inside the cross-validation phase. Functional relationships between model parameters and physical-catchment attributes were established (f 1 (x 1 ), f 4 (x 4 )). By considering f 1 and f 4 instead of x 1 and x 4 , a regional model calibration was carried out (NSE regional was used as a performance indicator) to find the pair {x 2 , NSE individual − NSE regional , where n-k represents the number of donor catchments.
Phase 6b: Leave-one-out cross-validation (validation). The performance of the proposed regionalization approach was evaluated (NSE regional ). The runoff output of the model with the regionalized and optimal calibrated parameters at each acceptor watershed was implemented at the k donors.
For the sake of clarity, it is worth mentioning that, in Figure 5, the input/output is represented with a circle, the output of the whole approach with a double-circle, and the processes with a rectangle.
Water 2020, 12, x FOR PEER REVIEW 9 of 21 calibration was carried out (NSEregional was used as a performance indicator) to find the pair {x2, x3} that minimize ∑ − , where n-k represents the number of donor catchments.
Phase 6b: Leave-one-out cross-validation (validation). The performance of the proposed regionalization approach was evaluated (NSEregional). The runoff output of the model with the regionalized and optimal calibrated parameters at each acceptor watershed was implemented at the k donors.
For the sake of clarity, it is worth mentioning that, in Figure 5, the input/output is represented with a circle, the output of the whole approach with a double-circle, and the processes with a rectangle.

Rainfall-Runoff Model Performance
The four parameters described in Table 3 were used to calibrate the rainfall-runoff model at each watershed. The 80% confidence intervals for the four parameters and the values chosen for each of them at each basin are shown in Tables 4 and 5, respectively. The values of x 1 are lower than the median value of its range (median = 350). Four of the values are even lower than the minimum presented in Table 4. This is justified by the fact that the Uruguayan soils are not thick and are characterized by expanded clay formed by minerals that, with water, increase their volume and reduce soil-storage capacity [49]. Therefore, Uruguayan soils are characterized by a very low soil-water availability. The values of x 4 are in the upper border of its range. Two of the values are higher than the maximum presented in Table 4. This depends on the watershed surface: the higher the area, the higher the influence of small flood-plain storages within the basin, the higher x 4 .
The calibration of the rainfall-runoff model employed an iterative process by varying the parameters in Table 4. Working taking into account the established range, and comparing (graphically in Figure 6 and numerically in Table 6) the simulation with the measured hydrograph, the calibration was individually performed for each watershed until a good fit was obtained (individual model calibration). In particular, NSE was used as an objective function, and R 2 and d were used to evaluate the model performance. The period of calibration ranged between 2 and 5 years, depending on the data availability and data quality at each basin. In Figure 6   The agreement between measured and simulated hydrographs for all simulations was very satisfactory. In Figure 7, the statistical results of the individual calibration process at each selected watershed are represented. They follow the general performance ratings for adopted statistics by Moriasi et al. [50] and Chen et al. [51] reported in Table 7. The agreement between measured and simulated hydrographs for all simulations was very satisfactory. In Figure 7, the statistical results of the individual calibration process at each selected watershed are represented. They follow the general performance ratings for adopted statistics by Moriasi et al. [50] and Chen et al. [51] reported in Table 7.   Calibration results at the ten selected watersheds (the square symbolizes R 2 , the circle represents NSE, and the triangle d).

Donor and Acceptor Catchment Selection
To assure the validity of the new regionalization approach to catchments characterized by different weather conditions, prior to implementing the cross-validation and regionalization approach, the 13 watersheds have been classified on the basis of their physical and hydrologic characteristics. With this aim, principal component analysis (PCA) associated with k-means clustering was applied to watershed attributes to classify the basins into k clusters.
The silhouette method was adopted to identify the optimum number of clusters (k) to consider. The average silhouette score of all the examples in the data set was calculated (Figure 8a). This gives us one value representing the silhouette score of the entire cluster. Furthermore, the silhouette-score boxplot was represented to evaluate the dispersion of the data at each number of clusters considered. (Figure 8b). Since there were 13 watersheds, the possibility of having k = 2 to k = 10 was taken into account.
Water 2020, 12, x FOR PEER REVIEW 13 of 21

Donor and Acceptor Catchment Selection
To assure the validity of the new regionalization approach to catchments characterized by different weather conditions, prior to implementing the cross-validation and regionalization approach, the 13 watersheds have been classified on the basis of their physical and hydrologic characteristics. With this aim, principal component analysis (PCA) associated with k-means clustering was applied to watershed attributes to classify the basins into k clusters.
The silhouette method was adopted to identify the optimum number of clusters (k) to consider. The average silhouette score of all the examples in the data set was calculated (Figure 8a). This gives us one value representing the silhouette score of the entire cluster. Furthermore, the silhouette-score boxplot was represented to evaluate the dispersion of the data at each number of clusters considered. (Figure 8b). Since there were 13 watersheds, the possibility of having k = 2 to k = 10 was taken into account. From Figure 8a, it seems that the optimum number of clusters is k = 4. However, in Figure 8b, it is possible to see that the dispersion of the data related to k = 4 is higher than k = 3. Furthermore, the k = 3-average is only slightly lower than the average on k = 4, and its minimum is higher than k = 4. For these reasons, we decided to group the watersheds into 3 clusters.
The PCA/k-means analysis was coded and run in R [52], by using the libraries "cluster," "HSAUR," and "vegan." The attributes considered were watershed area (km 2 ), slope (%), concentration-time (h), and AD (mm), for considering the physical catchment characteristics; R-B index, and Dominant Q (m 3 /s/km 2 ), for taking into account the hydrologic response. A data matrix (13 × 7) was the input for this HCA, being 13 watersheds and 7 attributes (6 basin characteristics + 1 basin identification). Prior to classification, the attributes (raw data) were mean-centered and scaled to unit variance to give equal importance to each attribute and to handle the different measurement units of attributes. After the normalization, all attributes had mean equal to zero and standard deviation equal to one. The first two principal components (PCs) or dimensions (Dim) were selected since they represented 72% of the variance. In Figure 9, the resulting PCA scores plot is shown. Each of the 13 watersheds is represented by a data point in the plot, and their location indicates the score of each basin for the PCs. From Figure 8a, it seems that the optimum number of clusters is k = 4. However, in Figure 8b, it is possible to see that the dispersion of the data related to k = 4 is higher than k = 3. Furthermore, the k = 3-average is only slightly lower than the average on k = 4, and its minimum is higher than k = 4. For these reasons, we decided to group the watersheds into 3 clusters.
The PCA/k-means analysis was coded and run in R [52], by using the libraries "cluster," "HSAUR," and "vegan." The attributes considered were watershed area (km 2 ), slope (%), concentration-time (h), and AD (mm), for considering the physical catchment characteristics; R-B index, and Dominant Q (m 3 /s/km 2 ), for taking into account the hydrologic response. A data matrix (13 × 7) was the input for this HCA, being 13 watersheds and 7 attributes (6 basin characteristics + 1 basin identification). Prior to classification, the attributes (raw data) were mean-centered and scaled to unit variance to give equal importance to each attribute and to handle the different measurement units of attributes. After the normalization, all attributes had mean equal to zero and standard deviation equal to one. The first two principal components (PCs) or dimensions (Dim) were selected since they represented 72% of the variance. In Figure 9, the resulting PCA scores plot is shown. Each of the 13 watersheds is represented by a data point in the plot, and their location indicates the score of each basin for the PCs.

Calibration and Model Parameter Regionalization
To evaluate the runoff predictions in ungauged watersheds, three of the thirteen catchments, each of them belonging to the three clusters k, were left out in turn and considered as ungauged or acceptor watersheds. Therefore, the performance of this methodology is successfully validated (with leave-one-out cross-validation) by assuming each gauged catchment, belonging to each cluster, as ungauged at once. From the donor catchments, the entire set of regionalized and optimal calibrated parameters is then used to model runoff at the "ungauged" catchments. This is then compared with the streamflow measurements to finalize the validation of the proposed approach.
The regionalization phase is located inside the cross-validation process; particularly between the calibration and validation stage (Figure 1). A relationship between model parameters and physical basin characteristics was found. In particular, a relationship of this kind was established for x1 and x4. x1 was directly associated with the average of AD in the soil, which represents the maximum amount of water useful for the vegetation for the evapotranspiration process. It is calculated by subtracting field capacity and wilting point.
x4 was found to be correlated with the concentration-time (Tc) of the basins. The best approximation to the Tc and x4 was obtained using a relationship described by a power law. These curves are graphically reported in the Supplementary Materials (SM-2), and the following one has been considered as an average curve that represents all of them: The agreement among x4 -Tc data points and the regression (Equation (7)) was evaluated through the R 2 , which was equal to 0.55.
Since no relationships between x2 and x3 with physical catchment characteristics were found, we proceeded with a regional model calibration to compute the optimal values for these two parameters. For doing this, a unique set of parameters for the 10 watersheds has to be found (where 10 is obtained from n-k = 10-3). Therefore, x1 and x4 were fixed on the basis of the correlations previously mentioned, and an iterative process was employed by varying x2 and x3 working within the established ranges.

Calibration and Model Parameter Regionalization
To evaluate the runoff predictions in ungauged watersheds, three of the thirteen catchments, each of them belonging to the three clusters k, were left out in turn and considered as ungauged or acceptor watersheds. Therefore, the performance of this methodology is successfully validated (with leave-one-out cross-validation) by assuming each gauged catchment, belonging to each cluster, as ungauged at once. From the donor catchments, the entire set of regionalized and optimal calibrated parameters is then used to model runoff at the "ungauged" catchments. This is then compared with the streamflow measurements to finalize the validation of the proposed approach.
The regionalization phase is located inside the cross-validation process; particularly between the calibration and validation stage (Figure 1). A relationship between model parameters and physical basin characteristics was found. In particular, a relationship of this kind was established for x 1 and x 4 .
x 1 was directly associated with the average of AD in the soil, which represents the maximum amount of water useful for the vegetation for the evapotranspiration process. It is calculated by subtracting field capacity and wilting point.
x 4 was found to be correlated with the concentration-time (T c ) of the basins. The best approximation to the Tc and x 4 was obtained using a relationship described by a power law. These curves are graphically reported in the Supplementary Materials (SM-2), and the following one has been considered as an average curve that represents all of them: The agreement among x 4 − T c data points and the regression (Equation (7)) was evaluated through the R 2 , which was equal to 0.55.
Since no relationships between x 2 and x 3 with physical catchment characteristics were found, we proceeded with a regional model calibration to compute the optimal values for these two parameters.
For doing this, a unique set of parameters for the 10 watersheds has to be found (where 10 is obtained from n − k = 10 − 3). Therefore, x 1 and x 4 were fixed on the basis of the correlations previously mentioned, and an iterative process was employed by varying x 2 and x 3 working within the established ranges. The regional calibration was performed until a good fit was obtained, i.e., when the difference between the NSE obtained from the individual calibration (NSE individual ), and the NSE obtained from the current calibration (NSE regional ) was minimum (Equation (8)). Among the three selected goodness-of-fit indicators (Equations (2)-(4)), NSE was chosen since it is the most restrictive one. 10 i=1 (NSE individual i − NSE regional i ) (8) In this calibration process, the 10 donor watersheds were considered. It is worth remarking that these 10 watersheds are different at any iteration of the cross-validation process. In particular, considering that three clusters were identified and each of them contains seven, two, and four elements respectively (Figure 9), 56 iterations were computed for the cross-validation process. As partial outcomes, in Table 8, the range of variation of NSE for the calibration process is presented. It is worth noting that NSE is never lower than 0.5. Therefore, all the iterations can be considered satisfactory, good, and very good, based on the classification reported in Table 7.

Cross-Validation of the Regionalization Process
The improved regionalization method was tested at the acceptor basins. The streamflow prediction was then compared to the observed streamflow time-series at the pseudo-ungauged catchments. In Figure 10, the performance obtained at the individual calibration phase and regional calibration stage were evaluated by NSE. The numerical values for this final stage (regional model calibration) are reported in Table 9. Considering that in the 56 iterations, x 2 and x 3 were varying in the ranges [−1.5, −0.5] and [47,67] respectively, it is worth remarking that the following outcomes correspond to the best values of x 2 = −1.5 and x 3 = 59. Water 2020, 12, x FOR PEER REVIEW 16 of 21  The GR4J model has obtained very good regionalized results by using the hydrologic and physical similarity approach. The NSE values vary from 0.49 (Santa Lucía Chico basin) to 0.81 (Cebollati basin), with an average equal to 0.70; the R 2 values vary from 0.67 (Santa Lucía Chico basin) to 0.94 (Tacuarembo Ch. basin), with an average equal to 0.83; the d values vary from 0.74 (Casupa basin) to 0.82 (Cebollati, Olimar, Santa Lucía, and Tacuarembo Ch. basins), with an average equal to 0.79. Overall, the agreement between measured and simulated hydrographs for all simulations was very satisfactory. From Figure 10, poorer performance was found only for San Salvador and Santa Lucía Chico watersheds for the regional calibration (NSE = 0.52 and NSE = 0.49, respectively). This can be caused by different reasons. (i) Santa Lucía Chico watershed lacks a good spatial distribution of pluviometric stations: two rain gauges located on the eastern border, two rain gauges in the north and one in the south (not belonging to the basin) were used to represent the precipitation distribution of this catchment (Figure 4). (ii) Another significant reason is represented by the low data quality in San Salvador watershed. (iii) It is essential to remark that San Salvador and Santa Lucía Chico are among the basins characterized by the lowest slope and the highest AD. Knowing that the GR4J model better simulates watersheds with a rapid hydrologic response [53][54][55], it is clear that this model is not the most suitable one for San Salvador and Santa Lucía Chico basins. This is confirmed by the  The GR4J model has obtained very good regionalized results by using the hydrologic and physical similarity approach. The NSE values vary from 0.49 (Santa Lucía Chico basin) to 0.81 (Cebollati basin), with an average equal to 0.70; the R 2 values vary from 0.67 (Santa Lucía Chico basin) to 0.94 (Tacuarembo Ch. basin), with an average equal to 0.83; the d values vary from 0.74 (Casupa basin) to 0.82 (Cebollati, Olimar, Santa Lucía, and Tacuarembo Ch. basins), with an average equal to 0.79. Overall, the agreement between measured and simulated hydrographs for all simulations was very satisfactory. From Figure 10, poorer performance was found only for San Salvador and Santa Lucía Chico watersheds for the regional calibration (NSE = 0.52 and NSE = 0.49, respectively). This can be caused by different reasons. (i) Santa Lucía Chico watershed lacks a good spatial distribution of pluviometric stations: two rain gauges located on the eastern border, two rain gauges in the north and one in the south (not belonging to the basin) were used to represent the precipitation distribution of this catchment (Figure 4). (ii) Another significant reason is represented by the low data quality in San Salvador watershed. (iii) It is essential to remark that San Salvador and Santa Lucía Chico are among the basins characterized by the lowest slope and the highest AD. Knowing that the GR4J model better simulates watersheds with a rapid hydrologic response [53][54][55], it is clear that this model is not the most suitable one for San Salvador and Santa Lucía Chico basins. This is confirmed by the fact that NSE values obtained from the individual calibration (Stage 1) were already the lowest among the 13.
In Figure 11, the graphical comparison between observed and simulated hydrographs at three acceptor watersheds is shown. For this representation, the closest basins to the centroid of each cluster were considered. For the sake of completeness, the range of variation of NSE in the last stage of the cross-validation process (validation), and the graphical comparison between observed and simulated flow-duration curve (FDC) are reported in the Supplementary Materials (SM-3 and SM-4 respectively).
Water 2020, 12, x FOR PEER REVIEW 17 of 21 fact that NSE values obtained from the individual calibration (Stage 1) were already the lowest among the 13. In Figure 11, the graphical comparison between observed and simulated hydrographs at three acceptor watersheds is shown. For this representation, the closest basins to the centroid of each cluster were considered. For the sake of completeness, the range of variation of NSE in the last stage of the cross-validation process (validation), and the graphical comparison between observed and simulated flow-duration curve (FDC) are reported in the Supplementary Materials (SM-3 and SM-4 respectively). It is important to examine if the performance of the regionalization process changes with an individual attribute in catchments. A correlation matrix was analyzed, considering the regionalized values of the goodness-of-fit indicators and the physical and hydrologic catchment characteristics (Table 10). It is important to examine if the performance of the regionalization process changes with an individual attribute in catchments. A correlation matrix was analyzed, considering the regionalized values of the goodness-of-fit indicators and the physical and hydrologic catchment characteristics (Table 10). Table 10. Correlation matrix of basin characteristics and regionalized goodness-of-fit indicators. There exists a strong direct relationship between the regionalized R 2 , NSE, and catchment slope. The regionalized model performance tends to increase with an increase in slope. However, a decreasing trend is shown between AD and the regionalized NSE. This suggests that regionalization results tend to become better in steep-slope watersheds characterized by a low amount of available water content in their soils; e.g., watersheds with a high-runoff potentiality (flashy). The latter is justified from the fact that if Slope decreases and/or AD increases, considering a hypothetical output hydrograph, the superficial runoff contribution is reduced in favor of the contribution given by the groundwater flow (base flow + slow sub-surface flow). This reality cannot be well depicted in the GR4J model since it better simulates watersheds with a rapid hydrologic response. This result is generally in accordance with the regionalization results found in other regions, such as Australia, Asia, and Europe [53][54][55].

Summary and Conclusions
In this study, we proposed an improved regionalization approach that is valid at country scale and for those regions characterized by highly variable-climatic conditions. It was successfully applied to several gauged and pseudo-ungauged basins in Uruguay, a country whose climate is influenced by the ENSO.
A daily rainfall-runoff model, GR4J, was implemented and successfully calibrated at 13 watersheds (individual calibration) (NSE average = 0.80; R 2 average = 0.84; d average = 0.79). To assure the validity of the new regionalization approach to catchments characterized by different weather conditions, prior to implementing the regionalization approach and the cross-validation process, the 13 watersheds were classified on the basis of their physical and hydrologic characteristics with the aid of PCA/k-means techniques.
After finding the optimal set of parameters of the hydrologic model at each basin, the proposed approach does not just transfer it to a similar ungauged watershed, but it finds robust relationships between x 1 and AD (called f 1 ) and between x 4 and T c (called f 4 ). These correlations were then exploited for a regional model calibration that allowed to generate a calibrated set of parameters that was transferred to the basin acceptors for cross-validating the model and testing the accuracy and reliability of this new regionalization approach. Model performance was very satisfactory (NSE average = 0.70; R 2 average = 0.83; d average = 0.79). It was found that the regionalized R 2 and NSE are directly correlated with catchment slope and inversely correlated with AD and surface basin, among the other variables. This suggests that regionalization results tend to become better in medium/small and steep-slope watersheds characterized by a low amount of available water content in their soils, e.g., watersheds with a high-runoff potentiality.
This paper not only provides an enhanced regionalization approach to predict runoff time series in ungauged catchments at country scale under highly variable-weather conditions, but also provides guidelines for a proper implementation of the GR4J model.