Regional Parameter Estimation of the SWAT Model: Methodology and Application to River Basins in the Peruvian Paciﬁc Drainage

: This study presents a methodology for the regional parameters estimation of the SWAT (Soil and Water Assessment Tool) model, with the objective of estimating daily ﬂow series in the Paciﬁc drainage under the context of limited hydrological data availability. This methodology has been designed to obtain the model parameters from a limited number of basins (14) to ﬁnally regionalize them to basins without hydrological data based on physical-climatic characteristics. In addition, the bootstrapping method was selected to estimate the uncertainty associated with the parameters set selection in the regionalization process. In general, the regionalized parameters reduce the initial underestimation which is reﬂected in a better quantiﬁcation of daily ﬂows, and improve the low ﬂows performance. Furthermore, the results show that the SWAT model correctly represents the water balance and seasonality of the hydrological cycle main components. However, the model does not correctly quantify the high ﬂows rates during wet periods. These ﬁndings provide supporting information for studies of water balance and water management on the Peruvian Paciﬁc drainage. The approach and methods developed can be replicated in any other region of Peru.


Introduction
The basins that drain into the Pacific Ocean of Peru are characterized by small basins with bare and steep slopes that favor erosion and flooding during intense rainfall events. Rainfall is more abundant along the north coast and decreases towards the south, where conditions are extremely arid [1]. The Pacific drainage represents 22% of the Peruvian territory [2], where more than 50% of the Peruvian population is established and has 2% of all the fresh water available in Peru [3], generating frequent conflicts between multiple water users regarding its allocation and accessibility. Currently, 73% of the population lives in urban environments, and it is expected that it (as well as their living standards) will increase by 2050 to approximately 40 million [4]. Therefore, domestic and agricultural water use is likely to increase rapidly, even more so in a country where much of the current agricultural production depends on irrigation and consumes approximately 85% of surface water [5]. Despite the importance of knowing water availability in Peru, previous studies [6][7][8] have shown evidence of the limited hydrological data availability in the Pacific drainage (Pd).
The evaluation of the water availability of a basin is generally performed through hydrological models. Rainfall-runoff modeling is an important area of research for the global scientific community that addresses the hydrology. A simple reason for this is that quantitative or qualitative flow information is vital for many practical applications, such as water allocation, long-term planning, watershed management operations, flood forecasting, optimization of the hydroelectric power production, and the hydraulic structures design. Although hydrological models can provide information on the rainfall-runoff mechanism, they remain abstractions of a real system, and it cannot be assumed that any of them generate accurate information for specific hydrographic basins and hydrological conditions [9]. Hydrological modeling requires calibration, which to a certain extent offers a reliability of the the measured flow rates simulations, especially in the context of (1) lack of hydrological process understanding, (2) possibly too simplistic representations of the process (abstractions), (3) the spatio-temporal discretization of highly heterogeneous rainfall-runoff processes, and (4) the impossibility of measuring all the model parameters required in the application scale [10]. In addition, in regions with little measured hydrological data, such as Peru, the set of parameters transfer of a calibrated model from one or more basins to a basin without hydrological data is a valid tool to quantify the hydrological processes that occur within them. This process of transfer is known as regionalization [11].
Flow regionalization is a challenging task in the hydrology science [12][13][14]. First, due to the absence of flow data required for the model calibration, and second, studies evaluating regionalization methods usually produce different results [12]. Despite this research momentum, there is still no single accepted approach to predict flows in regions with no hydrological data [15,16]. A large number of hydrological models regionalization methods have been proposed in the literature. All of them are based on the concept that information on the hydrological response can be transferred between hydrographic basins that can be assumed to be hydrologically similar, generally based on the knowledge of their relevant physical properties. The differences between the approaches are found in the type of information that is transferred, the method used and the watershed properties used to quantify the similarity. There are at least three common regionalization approaches: the regression-based approach, the spatial proximity approach, and the physical similarity approach. Comparative studies have been conducted in small and large data sets, in multiple hydrological models and with many variants [12,[17][18][19][20][21]. The authors of [16] conducted an exhaustive review of regionalization studies during the last decade. The most notable studies have had some divergent results, which are part of the reason why more conclusive evidence is required [12,22]. Most studies use simple hydrological models with few parameters to preserve a high level of independence and a good correlation between the parameters and the basins descriptors. For hydrological models with small parameter spaces and low parameter interdependence, equifinality is often not a problem. However, if the hydrological model has the opposite attributes, many optimal parameter sets can be found during calibration. The current modeling philosophy requires that the models be described transparently and that the uncertainty analysis be performed routinely as part of the modeling work. This analysis is essential to evaluate the calibrated model strength [23,24].
The SWAT (Soil and Water Assessment Tools) model [25] has shown its strengths in the aspects detailed above. It has been used both to quantify uncertainty and evaluate regionalization methods [24,[26][27][28][29][30]. In addition, the SWAT model is open source with a large and growing number of applications in hydrological modeling applied in studies ranging from basin scale to continental scales. There are few published studies related to SWAT in South America for flow estimation purposes [31][32][33][34][35]. In Peru, its use is not yet widespread [36,37], although hydrological models ranging from aggregated to semidistributed have been used in recent years. The references show that the models in Peru have been developed mainly to evaluate the impact of climate change on hydrology [38][39][40][41], the evaluation of satellite rainfall products for hydrology purposes [42][43][44][45][46] and there is only one study that performs a parameters regionalization at the scale of the Pd using a lumped hydrological model [8]. During recent years, some hydrological studies have been developed on the Peruvian Pd: [47] documented and analyzed the previous hydrological and physical conditions throughout the study area (54 hydrographic basins) from the 1920 to the 1960s. The authors of [3] evaluated the water supply and demand in the main basins where water management is prioritized. They estimated the total annual volume of fresh water available throughout the Pd from the 1970s to 2010. The authors of [6] analyzed mean conditions and flow variability from 1969 to 2004. The authors of [48] identified the annual runoff for some hydrographic basins with low water balance disparities or with quasi-natural conditions on an interannual scale from 1970 to 2008. Finally, [8] quantified and analyzed the flow variability and seasonality in natural regime in 49 Pd basins by applying a parameters regionalization methodology from conceptual models. There have been numerous efforts to evaluate and quantify the water resources in the Pd during recent years; however, there are still no regional studies with the purpose of estimating natural series of daily flows.
In this context, the objective of this work is to use SWAT to build a hydrological model for the Pd at subbasin level and at a daily time step. The key objectives of this work are (1) to present for the first time a regionalization parameters methodology based on a physical basis model, (2) to estimate water availability in the period 1981-2016 under a context of data scarcity, and (3) to quantify the uncertainty associated with the parameter regionalization process. In particular, components such as blue water (water yield plus deep aquifer recharge), green water flow (real evapotranspiration) and green water storage (soil moisture) are quantified. The estimates of the model at subbasin level are added to basin levels to compare it with previous studies and to support the obtained results. This article is organized as follows: Section 3 briefly describes the structure of the SWAT model and explains the proposed methodology. The calibration results, the estimated regional flow statistical performance and the uncertainty analysis are discussed in Sections 4 and 5. Finally, Section 6 provides the summary and conclusions of this study.

Study Area
The Pacific drainage (Pd) has an approximate area of 278,482 km 2 . The study region includes 52 main hydrographic basins with altitudinal variations ranging from 0 to approximately 6500 masl ( Figure 1). These basins have bare and steep slopes and generally drain west from the high Andes to the Pacific Ocean. In addition, during heavy rainfall events, a high potential for increased maximum flows, floods and erosion prevails in the Pd [6,48].
Under normal conditions, this region is influenced by the South Pacific Anticyclone in combination with the Humboldt Current, which produces dry and stable conditions in the western central Andes. Additionally, this region exhibits greater seasonal and interannual rainfall variability than the other two main hydrological regions of Peru: the Titicaca and Amazon drainages [6], mainly caused by the influence of the El Niño Southern Oscillation (ENSO). ENSO generates impacts on the regional climate worldwide, particularly on the Pacific coast, and these impacts are reflected in the strong El Niño years, which has a direct influence on rainfall increase (decrease) in the northern (southern) zone [7,[49][50][51], also causing large economic losses [52].
The study area presents arid and semiarid conditions and, therefore, is prone to threats of water scarcity for different sectors. The water demand for economical activities (agriculture, mining, industry and livestock) and domestic use represent approximately 87% of the total national consumption. Only agriculture represents the greatest consumptive use (86%), whose availability depends mainly on the irrigation systems located in the basins valleys. In addition to the threat of water scarcity, Pd is prone to devastating floods [3].

Materials and Methods
The general scheme of the proposed methodology is shown in Figure 2. The first part consisted of the soil gridded data recompilation and climate data. Then, the hydrological modeling process was carried out through a parameter sensitivity analysis process and calibration, while the hydrological similarity was defined through a physical similarity approach. Finally, the parameter transfer process determined the regional parameters used to estimate the series of daily flow in the entire Pd. More detail on the explained processes is shown below.

Hydrological Modelling Platform
Among the semi-distributed hydrological models, the SWAT model has been developed for the runoff estimation in areas without hydrological measurements [25,53]. SWAT is a physically based model that operates on a continuous (daily) time scale. In addition, it subdivides a basin into subbasins, which are connected by a flow network. The subbasins can be divided into hydrological response units (HRUs), which are unique combinations of land use, soil type and slope. The hydrological simulation is based on the water balance equation and is divided into two main components: the terrestrial phase, which simulates the amount of water, sediments, nutrients and pesticide loads in the main channel of each subbasin, and the routing phase, which simulates the movement of water, sediments and nutrients through the channel network of the basin to the outlet [54].
The climatic variables of entry to the SWAT consist of rainfall, maximum and minimum air temperature, solar radiation, wind speed and relative humidity. Optionally, the operations of reservoir management, lakes, transfers and crop management can also be considered in the model scheme. Further details of the SWAT can be found in the theoretical documentation (http://swatmodel.tamu. edu) and in [25].

Model Input
The model was configured based on freely downloadable data. In the case of geographic data, ref. [55,56] have shown that the results of the SWAT model are affected by the spatial resolution of the aforementioned data. Therefore, the resolutions considered below were selected based on the data availability for the Pd and the size of subbasins to be delimited (as discussed later in this section).

•
The rainfall and temperature data were obtained from the product PISCO (Peruvian Interpolated data of Senamhi's Climatological and Hydrological Observations). The daily rainfall database corresponded to version 2.1 [57], while the daily temperature database (maximum and minimum) was version 1.1 [58].

Model Setup
The model was implemented in SWAT 2012 using the QSWAT interface [62]. For the subbasins delimitation, two criteria were taken: (1) the burn-in option was used from predefined rivers to improve the delimitation in flat areas [63], and (2) an area threshold was established to be 200 km 2 . In addition, the HRUs were defined based on land use, soil type and dominant slope. This configuration was considered appropriate according to the computational resources available for hydrological modeling at the drainage scale.
The potential evapotranspiration was estimated by the Hargreaves-Samani method [64] due to the limited and scarce availability of observed data on relative humidity, wind speed and solar radiation. In addition, this method has shown good results comparable with Priestley-Taylor [65] and Penman-Monteith [66] (methods also available in SWAT) in semiarid zones [67]. The model did not include data on reservoirs operation, lakes or any other hydraulic influence.

Model Calibration
The SWAT input parameters are based on real processes and must be kept within a range of realistic uncertainty, so the first step in the calibration process was determinate the most sensitive parameters of the hydrological model [68]. In this way, from a sensitivity analysis for each parameter (one-at-a-time), the search ranges of each parameter were identified (ranges that respected the physical meaning of each one) so that the SWAT model is able to quantify the flow contributed by surface runoff and baseflow. In addition, as recommended by [69], it was considered to calibrate the model with the fewest possible parameters.

Evaluation Metrics
The calibration process considered a warm-up period of 3 years, and the statistical performance was evaluated based on the Kling-Gupta efficiency criterion (KGE) and the logarithmic Nash index (logNSE).

•
Overall performance: KGE is a comprehensive metric, a weighted average of the Pearson product-moment correlation coefficient (r), the ratio between the mean of the simulated values and the mean of the observed ones (β), and variability ratio (γ), which is computed using the standard deviation of simulated and observed (Equation (1)). Kling-Gupta efficiencies range from -Infinity to 1. The closer to 1, the more precise the model is.
• Low flows: By taking the log of simulated (log S i ) and observed (log O i ) before calculating the NSE, the influence of (missing) peak flows is reduced and more emphasis is placed on the base flow (the criticism of the standard NSE is that it is overly sensitive to the magnitude and timing of peak flows (Equation (2)).
Maximizing the KGE and logNSE, the performance of the hydrological modeling was optimized from a multi-objective perspective using the correlation, variability and bias criteria [70] and the performance in the representation of low flow rates. These statistics have been used in previous studies, resulting in good calibration and good performance in hydrological modeling [71][72][73].
The calibration and validation of the simulated and observed flows performance was carried out in 14 basins with daily flow data availability. The calibration period indicated in Table 1 was used to obtain the calibrated parameters. Then, we validate the performance of the model in two periods: the subsequent period to the calibration period until December 2016 and the entire period 1981-2016. The goodness of fit parameters were estimated through the R package "hydroGOF" [74].

Multiobjective Calibration Algorithm
The optimal parameters were derived from the Elitist Non-dominated Sorting Genetic Algorithm II (NSGA-II) multi-objective calibration algorithm, whose application has provided excellent results in hydrological modeling using SWAT and has been shown to be more efficient than the Monte Carlo method to reduce optimized parameters uncertainty [75,76]. Unlike single objective genetic algorithms, NSGA-II assigns fitness by Pareto ranking (nondomination) and crowding distance to the combined parent and child populations. A solution (or individual) is nondominated if it performs better in at least one objective functions and as well in all the other objective functions. The individual is then ranked according to the number of solutions that dominates it. Crowding distance is the average distance between an individual and its nearest neighbors in the search space. With the objective functions as minimization problems, individuals that are dominated by fewer solutions (i.e., has a lower rank) are given a better fitness than the dominated ones. In cases where the solutions have the same nondomination rank, the individual with larger crowding distance is preferred, thus ensuring diverse and well-spread population. The new parent population is chosen from the combined parent and child population based on the individuals' fitness or rank, thus the elitist selection.
NSGA-II nondominated sorting algorithm has a computational complexity O(MN 2 ), where M is the number of objectives and N is the size of a population P. For each solution two parameters are calculated in the algorithm: (1) the domination count n i which is the number of solutions that dominate the solution i and (2) a set of solutions S i that the solution i dominates. The solutions of the first non-dominated front are identified in Steps 1 to 3 of the algorithm and the solutions in higher fronts are searched in Steps 4 to 6. The algorithm is as follows: • Step 1: For each i ∈ P, set n i = 0 and S i = Ø. • Step 2: For all j = i and j ∈ P, if i dominates j, Add j to the set of solutions dominated by i: S i = S i U j. Otherwise, increment the domination count of i: n i = n i +1.

•
Step 3: If n i = 0, keep i in the first non-dominated front P 1 and set the front counter k = 1. • Step 4: While P k = Ø, initialize Q = Ø for storing the next non-dominated solutions. • Step 5: For each i ∈ P k and for each j ∈ S i , update n j = n j -1. If n j = 0, j belongs to the next front and update Q = Q U j. • Step 6: Set k = k +1 and P k = Q, go to Step 4.
Further details of NSGA-II can be found in [77]. The calibration algorithm was implemented through the R library "nsga2R" [78], setting KGE and logNSE as objective functions.

Regionalization Using the Physical Similarity Approach
The similarity approach consists of transferring hydrological information from donor basins (with flow data) that are similar to basins without flow data based on basin descriptors. For this, the climatic and physiographic variables of [12] are detailed in Table 2.

Clustering Dataset
There are a variety of cluster analysis techniques available to group basins into groups with similar characteristics. However, not one of them has been proven to be better than the others [79]. To group the Pd subbasins into regions assumed to be hydrologically homogeneous, Ward's hierarchical classification method [80] and principal component analysis (PCA) were used. The Ward method minimizes the total internal variance of each identified cluster, while the PCA reduces the dimensionality from a large number of interrelated variables while retaining most of the variation of the data used [81]. The variables in Table 2 were estimated from the input and the output data obtained in the first run of the SWAT model (without calibration) for a total of 730 Pd subbasins. The subbasins were delimited according to the criteria mentioned in Section 3.3.
The estimated clusters or regions were validated by Dunn and Silhouette indices, which are mainly used to choose the optimal number of clusters without the need for additional external information [82][83][84]. Dunn Index represents the ratio of the smallest distance between observations not in the same cluster to the largest intra-cluster distance. If the data set contains compact and well-separated clusters, the largest distance is expected to be small and the smallest one is expected to be large. Silhouette index defines compactness based on the pairwise distances between all elements in the cluster, and separation based on pairwise distances between all points in the cluster and all points in the closest other cluster. In both cases, the index should be maximized.
This validation procedure was carried out using the R library "clValid" [85].

Parameter Transfer Scheme
The calibrated parameters from a limited number of basins (14) were used to represent the hydrological characteristics in each region. To increase the amount of calibrated parameters set for each basin, an strategy of considering the default (uncalibrated) SWAT parameters sets was chosen. For example, in the case of basins with streamgages not located at the point of exit of the basin, the resulting calibrated SWAT model would have two parameters set: parameters set from the subbasins upstream of the streamgage and the default parameters set from subbasins downstream of the streamgage.
The subbasins belonging to the same region were considered similar. In this context, unlike the approach presented in [86], the calibrated parameters set located in each region were grouped to obtain the regional parameters through the median. From them, the evaluation was performed by comparing the simulated and observed daily flows for the period 1981-2016 in the 14 studied basins.

Uncertainty Analysis
Under the assumption of equifinality, two basins belonging to the same region could have parameters set that are not correlated, which could clearly be problematic for regionalization studies. In this case, the model parameters are only correlated with basin attributes, which undermines the basic hypothesis of methods based on similarity [23].
In this study, the uncertainty associated with the parameters set selection process for regional hydrological modeling was analyzed. A stochastic process was used to quantify uncertainty in simulated daily flow series based on the findings of [23], in which the parameters set available for each region were resampled by bootstrapping [87]. The resampling was repeated 500 times, and then confidence intervals of the bootstrapping distribution were taken at the 95th percentile (lower limit: 0.025 quantile and upper limit: 0.975 quantile) to define the uncertainty bands (UB). The quantification of the uncertainty degree was evaluated based on the coverage ratio (CR) and average width index (AW I), which represent the amount of observed data that are contained in the UB and its width respectively [88][89][90].
The CR quantifies the statistical reliability of the derived UB to contain the value of the observed flow (Q O,T ) (Equation (3)).
where C is equal to 1 in the time step if the observed flow rate (Q O,T ) is greater than the lower limit of UB (Q U,T ) and less than the upper limit of UB (Q U,T ); and 0 otherwise. The resulting series (C) is added and divided by the number of time steps (T) (that is, the total number of observed flow to calculate the CR). The CR provides a proportional measure that goes from 0 to 1, where 1 represents a perfect value (the uncertainty intervals contain all the values of the observed flow) and 0 indicates that the uncertainty intervals do not have reliability to contain these values.
The AW I is a measure of the sharpness of the UB and quantifies the capacity of the UB to capture the system natural variability and the expected simulations of the model. The AW I is a measure of overlap between the width of the 95th percentiles of the observed flow duration curve (called AW clim ) and the average width (AW) of the uncertainty intervals, which is calculated as the average of the absolute difference between the upper and lower UB in all time steps. This is calculated as shown in Equation (4): where AW is the average width index of the UB time series, and AW clim is the absolute difference (width) of the quantiles 0.025 and 0.975 of the observed flow duration curve. An AW I value greater than 0 indicates that the derived UB reduces the uncertainty of the model output compared to the climatology natural variability, represented by the 95th percentile of the observed flow duration curve.

Defining Similarity by Clustering
Prior to clustering, PCA was used to reduce the dimensionality of the 11 variables shown in Table 2. Table 3 shows that the first 5 components explain approximately 90% of the accumulated variance, and their eigenvalues are greater than 0.8. The mean annual rainfall (Rm), mean annual flow (Qm) and runoff coefficient (Rc) show a high correlation with the first component. Latitude (lat), longitude (long), mean basin altitude (Zb) and mean annual potential evapotranspiration (Em) are better correlated with the second component, while basin slope (Sb) and fraction of forest cover (fc) are better correlated with the third component. The basin area (Ab) and drainage density (dc) variables have a high correlation with the fourth and fifth components; however, the latter has a little contribution to the total variance (less than 10% each) and have no greater implication in the posterior clustering.
As shown in Figure 3, there is a break (elbow) that starts from the fourth principal component, which implies 4 clusters for this case. This number is adjusted from Ward's hierarchical clustering by comparing other situations (from 4 to 7 clusters) and validated by maximizing the Dunn and Silhouette indexes. Finally, the 730 subbasins delimited in the Pd were grouped into five clusters. These clusters or regions characterize the particular behavior existing in the northern zone (regions 1 and 2) of the Pd [7] and present at least one streamgage in each of them.

Regional Runoff Model Performance
Five parameters of SWAT model were calibrated for 14 basins with daily flow data using the procedure described in Section 3.4. The optimal parameters for each model were obtained by evaluating the model performance in a calibration period (see Table 1) and validated in two period (as indicated in Section 3.4). Among them, the SURLAG parameter allowed to control the correct quantification of high flows as much as possible, while the parameters of GWQMN and RCHRG_DP were the most important to quantify correctly the flows from aquifers and to improve the simulated flows in dry periods. The SOL_AWC and SOL_BD parameters were important to control the flow contribution from the subsoil. First, we evaluate the performance from the calibrated parameters. As can be seen in Table 4, the calibration and validation of the model gives good results based on KGE, however the model performance decrease based on the logNSE in some basins. For the whole period (1981-2016), the results show that the performance in terms of the KGE is very satisfactory (values between 0.57 and 0.84) and is best evident in the central and northern basins of the Pd. In terms of logNSE, low flows were estimated with lower performance (values between 0.1 and 0.8), especially in the Santa (0.16) and Tambo (0.1) basins. In general, the calibrated parameters allowed to correctly estimate the daily flows rates compared to the initial estimates (without calibration). These improvements became more evident in the low flows quantification than the simulated high flows (as discussed below in Section 4.4).
As described in Section 3.5.2, the calibrated parameters of each SWAT model were transferred to the entire Pd. Then, the regional parameters (see Table 5) were applied to estimate regional daily flows in the Pd. Figure 4 shows the performances comparison between the simulated flows from calibrated and regional parameters in the period 1981-2016. It is observed that the performance decreases when estimating the regional flow in terms of the KGE, but they are still in a satisfactory range. The regional logNSE maintains the same overall performance with respect to the calibrated logNSE. In some cases, statistics tend to improve when applying the estimated regional parameters with respect to the previously calibrated parameters. This demonstrates the principle of equifinality of non-parsimonious models such as SWAT, where more than one parameters set can equally or better satisfy the objective function. Table 4 shows that the performance to estimate regional flows at daily step is good (average KGE of 0.58), with some exceptions in the Santa and Pisco basins (KGE 0.42 and 0.38, respectively). In addition, acceptable results were obtained in the low flows simulations (average logNSE of 0.48) without considering poor performance in the Ocoña basin (from calibrated logNSE: 0.31 to regional logNSE: −2).
How it is seen in Figure 5, low flows are systematically underestimated during dry periods (as seen in the Chancay Huaral basin). Furthermore, the maximum daily flows are underestimated in most cases (as can be seen in the scatter plots), probably due to the loss of their representativeness when regionalizing the SURLAG parameter.

Quantification of Water Resources and Water Balance
The results presented in Figure 6 are based on the simulated time series obtained in the 730 delimited subbasins. In the Figure 6a it can seen the 5 regions identified by cluster analysis. Then, if a basin water balance is considered as follows: the flow contribution (Q) added to the real evapotranspiration rate (ET) is equal to the amount of precipitated rainfall (PP); Figure 6b shows that subbasins belonging to lower latitudes (northern zone of the Pd) have a higher water production rate compared to subbasins belonging to higher latitudes (southern zone of the Pd). In addition, it is observed that losses produced by ET have an inverse pattern. The five estimated regions capture the particular characteristics of the Pd hydrological regime. Figure 6c shows that there is a marked seasonality of the Pd hydrological components for each region. It is observed that in region 1, the contribution of PP is greater and consequently has greater water production. For the other regions, the PP contribution is similar in quantity. Some characteristics of the study area are observed in all regions, in which ET rates are estimated at equal or greater magnitudes than Q rates. This particularity, added to the fact that the potential evapotranspiration (ETP) is much higher than the PP, is an indicator that Pd is an area predominantly arid (PP/ETP < 1; [48]). Figure 7 shows the results of regional flows expressed in terms of average annual specific runoff and annual runoff time series over 52 basins under natural conditions for the period 1981-2016. These results are a new contribution that complements and updates what was presented by [8]. The water availability of the Pd is quantified at the exit points near the Pacific Ocean and indicates the runoff values expressed as water yields. The maximum value of 17 l/s/km 2 corresponds to the Tumbes basin (basin furthest north in Figure 7a), while very low values (basins in red) are observed in the southern region (e.g., Caplina basin: 0.015 l/s/km 2 ). Figure 7b groups the northern basins (Region 1-2), which represent the largest water contribution. Figure 7c,d (Regions 3 and 4, respectively) group the central basins, which are characterized by pluvio-nival and pluvio-glacial regimes (e.g., Santa basin). This characteristic regime of the Pd central basins is moderately reduced until reaching Figure 7e (Region 5), which groups basins that present the most arid conditions. The average annual runoff (black dotted line) in each region also follows a regional hydroclimatic pattern with extraordinary events during the ENSO years. This is observed for extreme El Niño events in the years 1982/1983 and 1997/1998 in the northern basins. In contrast, low values are present in the southern basins during the same events.
Considering the average annual flows of the 52 basins shown in Figure 7, the total flow that drains to the Pacific Ocean in the period 1981-2016 is 990 m 3 /s and 872 m 3 /s without considering the ENSO extreme events of 1982/1983 and 1997/1998 (Figure 7f). These results are consistent with those presented by [8], in which total flow is estimated at 709 m 3 /s from the contribution of 49 basins, and with past studies in which only the observed flow data were considered (as listed in [3] Water resources are often quantified in terms of blue water flow, which is the water yield plus the deep recharge of the aquifer. Based on [92], the storage of green water (soil moisture) has been widely recognized as a crucial component of water resources. Using the calibrated SWAT model from the regional parameters, the long-term average blue water resources (1981-2016), green water storage and green water flow (real evapotranspiration) are shown for the 730 subbasins delimited in the Pd (Figure 8). The results show a well-defined spatial variation that is mainly a function of altitude and related to rainfall patterns. Blue water resources, green water storage and green water flow are greater in the middle and upper parts of the basins. The blue water flow (green water storage) has a greater presence in the north (south) zone, while the green water flow has a uniform distribution throughout the Pd. One way to take advantage of this information is by identifying areas with lower blue water resources and higher soil moisture because they are areas that have greater potential for the rainfed agriculture development, an important production activity in Peru. The runoff-rainfall ratio map (a derivative blue water map) serves as an indicator of erosion potential, which is predominantly higher in the northern zone, usually due to the high rainfall rates. It should be noted that the estimated storage and green water flow were not directly calibrated since there were no direct observations; however, they are presented here as a first quantification of these resources for the Pd.

Performance of the Uncertainty Bands at Streamgages
As explained in Section 3.6, the uncertainty bands (UB) evaluation was carried out based on the coverage ratio (CR) and the Average width index (AW I). The CR ranges from approximately 0.19 to 0.81, with mean and median values of 0.57 and 0.63 respectively. This CR is lower than those obtained in previous studies [88][89][90], which is explained given the different methodologies applied when estimating UB. The AW I analysis indicates that the UB are in a scenario in which they reduce the uncertainty of the simulated flows in comparison with the climatology natural variability (average of  Figure 9 shows that the KGE and CR values show a similar variability pattern (medians of 0.60 and 0.63 respectively). In addition, the CR values tend to show a positive correlation with the hydrological modeling performance (KGE). In some cases, a good KGE performance can be related to different CR values and viceversa. In the case of AW I, independent of the KGE values, is showed values between 0.7 and 0.9. There is not much variability in this index, so it is inferred that the AW I would always remain in that range if more streamgages were included in the analysis. Considering all the UB and observed daily flow of the 14 streamgages, Figure 10 shows that the CR values vary from month to month. The UB contains a higher (lower) proportion of the observed flow in dry (wet) periods during May-November (December-April). It is not the objective of the present study, but this analysis establishes that the estimated regional parameters for SWAT model do not allow to quantify correctly the maximum flows in the considered streamgages along the Pd. The UB for December-April contain, on average, 50% of the observed flows.

Subjectivity in the Physical Similarity Approach by Clustering
This article presents a methodology for flow rates estimation through hydrological modeling following a parameter regionalization approach. Although the SWAT model was used, the selected regionalization method is independent of the chosen hydrological model. The regionalization technique is based on the similarity approach, a method that transfers the optimal parameters set from calibrated hydrographic basins to similar hydrographic basins. In addition, this method is recommended over regression and spatial proximity methods [12,16,18,23] and has shown good performance in hydrological modeling in arid zones [18,93]. In our work, similarity is defined by subbasins grouping with physical and hydroclimatic characteristics that try to explain the hydrological behavior. Although the PCA prior to the definition of clusters or regions eliminates much of subjectivity when selecting the most influential similarity variables, it should be mentioned that the basin characteristics most commonly available (topography, land use, soil and climate) are not sufficient to fully explain the hydrological response [17,94,95], and it is suggested to incorporate additional basin characteristics [11]. It is likely that this includes hydrogeological characteristics and/or basin flow regulation characteristics [30], data of difficult availability in the Pd.

Physical Implications of the Regional Parameter Set
The results of this study indicate that the five regionalized parameters can be used to produce satisfactory simulations from SWAT model in the Pd. The values shown in Table 5 are mainly related to the climatic conditions of the study area. In the case of the RCHRG_DP parameter, the high values obtained (close to 0.9) indicate that the Pd is an area where rivers are predominantly recharged by aquifers and have high potential for water harvesting [96]. Previous studies have observed that the SWAT model underestimates low flow rates in dry and wet periods [97,98], and consequently, the model has been modified and even coupled to other models (e.g., MODFLOW). to improve the flow contributions from aquifers [99][100][101][102]. It is not the objective of the present study to modify the SWAT source code, but we emphasize the importance of calibrating the RCHRG_DP parameter to improve the low flows representativeness, especially in Pd basins. In addition, this parameter sensitivity is evident when it is regionalized; for example, in the Ocoña river basin, the performance of the logNSE decreases considerably (see Table 4) because the value of RCHRG_DP is reduced, which causes the model to underestimate dry periods flow. Set a suitable threshold (GWQMN) for there to be a contribution from the aquifer is also very important. For the Pd, these values are between 500 and 800 mm and allowed to correct the initial low flows underestimations. Although in SWAT the groundwater routine is very simplistic, these first results suggest the need for detailed groundwater studies in the Pd and to have a better physical interpretability of these parameters.
The parameter values of SOL_AWC and SOL_BD indicate that the Pd is characterized by an arid climate and a predominantly sandy soil ( Table 1). The percentage increase of SOL_BD is related to lighter soils that have a lower water holding capacity than heavier or clayey soils. On the other hand, the high rates of potential evapotranspiration ( Figure 6) in arid climates generate high water losses to the atmosphere. The parameter SOL_AWC allows to regulate the amount of available water in the soil for the evapotranspiration process, and a decrease in its value corrects the initial underestimations of SWAT model and thus improves the initial performance of the KGE (Table 4). Finally, the SURLAG parameter is important to correctly simulate the high flows according to its physical definition (see Table 5). From the initial SURLAG calibrated parameters, the simulated high flows generally underestimated the observed ones in most of the studied basins; however, this was a fairly acceptable underestimation. Although the PISCO rainfall product does not correctly capture the intensity of the rainfall [57], one of the main factors of the low representativeness of high flows could be attributed when the flow is obtained from the regional SURLAG parameters (as shown in Figure 5). In the present work, greater importance was given to the water balance representation, and we indicate that for a better high flow estimation, it is recommended to condition the SWAT calibration based on indices that evaluate only the extreme flows behavior.

Conditionality of the SWAT Calibrated Model and Its Applicability
It should be noted from the beginning that hydrological models calibration is subjective and that no automatic calibration algorithm can replace the analyst knowledge in relation to the basin hydrology and calibration problems. Therefore, the calibration and uncertainty analysis are closely linked, and calibration results should not be presented without an uncertainty degree quantification in the model predictions [24]. In this context, the uncertainty analyzed in this study serves as a first approximation to evaluate the calibrated parameters set performance obtained by each region. While we use bootstrapping to increase the number of available parameters per region and to present a more consistent analysis of the UB, we can say that the uncertainty associated with regionalization process has yielded good results mainly due to the criterions used in the calibration process to reduce the equifinality of SWAT model. It is understood that more streamgages should be taken into account for a better analysis; however, indicating the processes that the model is not adequately representing is a "high value information" for decision makers. Based on the above, the water resources quantification in the Pd presented here should be taken into account according to the objectives for which the SWAT model was built. Having a regional model that estimates natural flow series and correctly represents the Pd basins water balance is useful for identifying long-term relationships with climate variability and climate change impacts, and to be applied for water resource management purposes.

Summary and Conclusions
This study proposes a methodology for daily flows estimation in the Pacific drainage of Peru from a physically based model (SWAT). In most large-scale hydrological models, the calibration process of model parameters is critical and even more so for areas with limited availability of hydrological data. A parameter transfer method was proposed based on an approach of physical similarity and clustering that divided the Pacific drainage (divided in 730 subbasins) into five regions assumed to be hydrologically homogeneous. Fourteen basins were calibrated to subsequently estimate the regional parameters based on the median of the calibrated parameters set available in each region. As a first approximation to evaluate the methodology robustness in the parameter regionalization process, by means of statistical techniques such as bootstrapping, the uncertainty bands were derived in each evaluated basin. The results show that the SWAT model with regional parameters can simulate the observed flows well and correctly represent the seasonality of the hydrological cycle main components. The high rates of potential evapotranspiration on precipitation and the particular flow behavior in the northern zone of the Pacific drainage are equally well represented.
At the annual level, flow series pattern variability along the drainage and the total flow contribution to the Pacific Ocean show similarity with previous studies. Certain relationships were found between values of the calibrated parameters and physical-climatic characteristics of the basins studied; however, a greater analysis was not possible due to the absence of observed data to support our assumptions. Similarly, the water resources quantification based on components of blue water, green water storage (soil moisture) and green water flow (real evapotranspiration) allowed us to obtain a better understanding of water availability and variability in the Pacific basins, which should serve as a tool for better use and management of its water resources. The evaluation of derived uncertainty bands indicated that the high flows are not correctly quantified; however, the natural variability of observed flow climatology is well represented. These results support the idea that flows are better simulated from a calibrated model than from a regionalized model. In addition, the idea is supported that the best way to handle the problem of rainfall-runoff modeling in uncalibrated basins would be to install a streamgage.
The present study adds to the research efforts in the hydrological modeling field at the regional scale that have been carried out in recent years on the Peruvian Pacific drainage. A first step is presented to expand the use and development of physical bases hydrological models for their application at a regional scale, and the presented parameters transfer approach is promising to estimate SWAT parameters in areas with scarce hydrological data in Peru. Future studies will be dedicated to investigating the flow rates sensitivity in a context of climate change and exploring calibration techniques in hydrological modeling to correctly estimate annual maximum daily flows.