PISCO_HyM_GR2M: A Model of Monthly Water Balance in Peru (1981–2020)

: Quantiﬁcation of the surface water offer is crucial for its management. In Peru, the low spatial density of hydrometric stations makes this task challenging. This work aims to evaluate the hydrological performance of a monthly water balance model in Peru using precipitation and evapotranspiration data from the high-resolution meteorological PISCO dataset, which has been developed by the National Service of Meteorology and Hydrology of Peru (SENAMHI). A regionalization approach based on Fourier Amplitude Sensitivity Testing (FAST) of the rainfall-runoff (RR) and runoff variability (RV) indices deﬁned 14 calibration regions nationwide. Next, the GR2M model was used at a semi-distributed scale in 3594 sub-basins and river streams to simulate monthly discharges from January 1981 to March 2020. Model performance was evaluated using the Kling–Gupta efﬁciency (KGE), square root transferred Nash–Sutcliffe efﬁciency (NSE sqrt ), and water balance error (WBE) metrics. The results show a very well representation of monthly discharges for a large portion of Peruvian sub-basins (KGE ≥ 0.75, NSE sqrt ≥ 0.65, and − 0.29 < WBE < 0.23). Finally, this study introduces a product of continuous monthly discharge rates in Peru, named PISCO_HyM_GR2M, to understand surface water balance in data-scarce sub-basins.


Introduction
In Peru, surface water resources are distributed heterogeneously throughout its three hydrographic regions: Pacific (western Andean slopes and Peruvian coast), Titicaca (endorheic part of the Peruvian altiplano), and Atlantic (Amazon basin). The densely populated Pacific slope is characterized by high water stress due to its low water supply and high demand by its economic activity. In contrast, the sparsely populated Atlantic slope has a large surplus due to low demand and, above all, because it is supplied by the Amazon basin [1]. In this context, adequately quantifying water supply is critical for properly managing and planning water resources in the country [2][3][4]. However, the low density of stations and their short-term data records make it difficult to monitor and forecast streamflows at a national level, so hydrological modeling emerges as a promising option for complementing the hydrometric records, improving the understanding of the rainfall-runoff relationship [5,6] and for seasonal hydrological forecasting [7,8]. Implementing a large-scale hydrological model requires precipitation data with wide spatiotemporal coverage, so the use of satellite precipitation products has become increasingly important in recent years [9], especially for their application in Peruvian basins with scarce information [10]. For example, the recent meteorological gridded product development for the Peruvian domain called PISCO (Peruvian Interpolated data of SENAMHI's Climatological Observations) [11] might drive a large-scale hydrological model to estimate monthly discharges at a national level in a data-scarce context. Regional hydrological models, unlike local models, requires more effort because of the 'problem of regional modeling' that arises as described by [12] because the model must be calibrated and validated simultaneously with a large number of hydrometric stations and with multiple basins of different rainfall and temperature regimes, physiography, and vegetation cover [13], all without altering the regional representation of hydrological processes [14]. This characteristic is problematic for quantifying surface runoff in datascarce basins, leading to alternative ways to extrapolate hydrological information from one basin to another [15], commonly following criteria of proximity [16] and hydrological similarity based on physiographic and hydroclimatic characteristics [17]. Recently, hydrological regionalization techniques have been explored based on empirical relationships between geomorphological characteristics and model parameters [18], analysis of hydrological dissimilarity [19], streamflow fluctuation [20] and sensitivity [21], machine learning techniques [12], partial least squares regression and clustering analysis [15,22], principal components, and self-organization mapping [23].
Currently, there are physically based hydrological models that represent in detail all the hydrological processes in a basin [24,25]; however, their application in a data-scarce context would increase uncertainties in different components of the hydrological system [26][27][28]. In contrast, conceptual hydrological models require fewer data, making them operationally easier and decreasing the computational cost in an extensive domain [29,30]. For instance, the GR2M conceptual model [31] has been widely used in different hydroclimatic conditions around the world with satisfactory results [32,33] and even performing better than other water balance models [34]. Additionally, it is used to assess climate change effects on water resources [35][36][37]. In recent years, experiments have been developed to improve hydrological modeling performance with GR2M, incorporating Bayesian calibration approaches [38,39] and coupling to fuzzy models [7]. No prior research has been reported at a national level in this region using the GR2M model. A regional hydrological simulation that incorporates observed data and provides a dataset of estimated discharges at river streams in three Peruvian slopes is still challenging. However, in the Amazon basin, the GR2M model has been applied to assess water resources' climate change and identify annual discharge trends [40]. A recent implementation in the Pacific slope also evaluates multidecadal runoff in a data scarcity condition, showing higher model robustness than other conceptual models [18]. In ungauged basins, regional GR2M parameter estimation has been studied in [23] using a regression approach finding unsuitable model results for basins located under a semi-arid climatic regime. In [41], the GR2M model is applied to reconstructed monthly river streamflows in 51 gauged sub-basins.
The generation of global and regional hydrological datasets such as the products Model Parameter Estimation Experiment data (MOPEX) [42] and the Catchment Attributes and MEteorology for Large-sample Studies (CAMELS) [43], among others, are beneficial for exploring the behavior of basins [44], anticipating hydrological changes [45] and studying the impact of human activities on the hydrological cycle [46]. In South America, local adaptations have been integrated into the CAMELS product, such as the CAMELS-BR [47] datasets in Brazil and CAMELS-CL [48] in Chile currently used to study the impact of climate change on water resources and the study of droughts, among others. In this sense, hydrological modeling at the national level is particularly useful to establish a basis for constructing a hydrological dataset in Peru.
This study aims to evaluate the hydrological performance of a monthly water balance model at a national level. For this purpose, the sensitivity analysis of two hydroclimatic indices is used to define calibration regions, and the GR2M conceptual model is used to simulate monthly discharges in gauged and ungauged sub-basins from January 1981 to March 2020. Finally, a new hydrological product in Peru is introduced to provide continuous monthly streamflow information over the country and contribute to understanding the water balance in data-scarce basins.

Study Area
Peru is located on the west coast of the South American continent. It has an area of 1,285,220 km 2 and a population of approximately 32.5 million people. It borders on the west with the Pacific Ocean, on the north with Ecuador and Colombia, and on the southeast with Brazil, Bolivia, and Chile. The Andes mountain range creates a complex topography and introduces hydroclimatic variability along its three hydrographic regions: Pacific, Atlantic, and Titicaca. This natural orographic barrier traps atmospheric moisture from the Atlantic, producing high rainfall over the Andean-Amazon region and Amazon lowlands (eastern side) and low rainfall on the coast (western side) [40], leading to the great contrast of water resources in the country, characterized by a much larger water supply on the Atlantic slope than on the Titicaca and Pacific slopes [1]. Rainfall is highly variable in both space and time [49]. Maximal rainfall rates occur between November and March. Arid conditions with low rainfall rates characterize coastal areas in the Pacific slope (<~150 mm/year) and semi-arid conditions (<~400 mm/year) in the western flank of the Andes [18]. The Atlantic and Titicaca slopes have humid conditions with high rainfall rates in the eastern flank of the Andes (~1100 mm/year), the Andes-Amazon transition (~3200 mm/year), and the Amazon lowland (~2550 mm/year) [11]. Mean annual temperature fluctuations over the country appear indirectly related to elevation (lower altitude, more temperature).
In this work, the study domain corresponds to the entire Peruvian territory, including transboundary basins with Ecuador, Colombia, and Brazil. It has an approximate total drainage area of 1,480,620 km 2 ( Figure 1). Moreover, 3594 river streams and sub-basins with a median area of 300 km 2 (with extremes values of 40 km 2 and 2500 km 2 ) were delimited to obtain fine streamflow spatialization according to meteorological inputs resolution (~10 ×~10 km) and considering a unique river stream by sub-basin to compute flow accumulation. In Figure 1, gauged areas correspond to drainage areas covered by a hydrometric station, while ungauged areas correspond to areas without any hydrometric control downstream.

Data and Methods
The methodology used in this study is outlined in Figure 2. The details of the data and methods used are shown below.  Currently, the PISCO product is only a dataset with a daily and monthly temporal resolution for the variables of precipitation (P), air temperature (TMP), and potential evapotranspiration (PET). It has a spatial coverage throughout the Peruvian territory, including transboundary basins with Ecuador, Colombia, and Brazil. In its stable version, PISCO is only available from 1981 to 2016. However, an unstable version of the precipitation sub-product used for operative purposes is available from 1981 to the present day, and it is updating daily. The gridded precipitation sub-product (0.1 • × 0.1 • ) is generated by applying geostatistical techniques to combine satellite estimates of precipitation data from the CHIRP project (Climate Hazards InfraRed Precipitation) and ground data of the SENAMHI pluviometric network [11]. Similarly, the temperature sub-product (0.1 • × 0.1 • ) is generated by combining air temperature data from MODIS images and observations from weather stations. The evapotranspiration sub-product (0.1 • × 0.1 • ) is generated from the previous temperature-gridded data following the methodology proposed by [49]. The PISCO dataset is available on the IRI Data Library: http://iridl.ldeo.columbia.edu/ SOURCES/.SENAMHI/.HSR/.PISCO (accessed on 20 March 2019).
In this work, the mean areal values of P and PET are calculated for each of the 3594 subbasins from January 1981 to March 2020 and then are continuously updating for operational purposes of the continuous monthly streamflows estimation. Due to our hydrological model's operational purpose, the precipitation sub-products unstable version was used in this study. In contrast, in evapotranspiration, we only use climatological values due to the lack of data since January 2017.

Discharge Data
In this work, monthly observed streamflows at 43 hydrometric stations were selected from January 1981 to March 2020. Most of these stations belong to the National Service of Meteorology and Hydrology of Peru (SENAMHI, https://www.gob.pe/senamhi, accessed on 13 March 2021). Specifically, in the Amazon region (Atlantic slope), most of the stations are monitored by the SENAMHI and the IRD (French Institute for Sustainable Development) in the frame of the HYBAM Project (https://hybam.obs-mip.fr/, accessed on 25 March 2021). The detail of selected stations is summarized in Table 1, and their distribution throughout the Peruvian territory is shown in Figure 1a. The selection processes considered stations with more than 10% coverage in the study period (1981-2020), stations with high quality of data, and stations located in basins that guarantee coverage throughout the national territory. Finally, a total of 74.8% of the study area is gauged by the 43 hydrometric stations selected, and only 25.2% correspond to ungauged areas.

Semi-Distributed GR2M Model
In this study, the GR2M conceptual model [50] simulates monthly runoff in 3594 subbasins. The GR2M model transforms rainfall into runoff by two equations: production and transfer functions [51], and requires monthly input data of precipitation (P) and potential evapotranspiration (PET). P is distributed in the upper storage tank (S) with limited capacity and the underground storage reservoir (R). Besides, GR2M has two calibratable parameters (X1 and X2), where X1 defines the maximum capacity of S and X2 defines the exchange between R water outside the basin. Because GR2M is a lumped model, the simulated runoff in each sub-basin is then routed (as FlowAcum) considering the flow direction (FlowDir) generated from a 90 m HydroSHEDS Hydrologically Conditioned Digital Elevation Model (CONDEM) [52] and a Weighted Flow Accumulation (WFAC) algorithm where the simulated runoff gives the weighting factor (Weight) in each sub-basin. For example, at time step t = 1, the monthly runoff (q) is first calculated for each sub-basin in the study area. Then the Weight is created by rasterizing the sub-basins based on the flow direction raster and associating the value of q to the pixel corresponding to the centroid of each sub-basin, as appropriate. Finally, the WFAC is used to accumulate the values of q and obtain a raster map with the values of discharges (Q) for each river stream. This process is repeated for each time step. The scheme of the semi-distributed GR2M adaptation for a national level modeling is shown in Figure 3. In this work, the GR2M model incorporated in the airGR R package [52] was used together with adaptations to calculate P and PET's areal means from gridded data, run the semi-distributed GR2M model using the WFAC algorithm, and automatically calibrate the model parameters. As a result of this process, an experimental R package called GR2MSemiDistr was generated and is free available on https://github.com/hllauca/GR2 MSemiDistr (accessed on 13 March 2021).

Sensitivity Analysis
Similar to [53], the spatial patterns and the magnitude of the relative sensitivities of X1 and X2 concerning two hydroclimatic indices are considered the basis for delimiting the homogeneous calibration regions at a national level. Unlike studies that perform a sensitivity analysis (SA) using metrics such as the Nash-Sutcliffe efficiency index [54], this study examines the sensitivity based on the runoff rate (RR) and the runoff variability (RV) hydroclimatic indices proposed by [55]. Table 2 describes both indices and their respective equations. Fourier amplitude sensitivity testing (FAST) [56] was applied using the fast R package of [55] to calculate the relative sensitivities of X1 and X2 for both indices in all sub-basins. An ensemble of 1000 unique and uncorrelated parameters was generated. The GR2M model was then run at a national level for each ensemble member using the same P and PET forcing data, yielding thousands of streamflows time series and RR and RV indices. Finally, a Fourier transformation is applied to RR and RV, and results are scaled from 0 to 1 to obtain the X1 and X2 relative sensitivities. Zero values indicate a low sensitivity of the hydroclimatic index to a GR2M parameter variation, and 1 indicates a high sensitivity. The details of the FAST methodology are found in [57]. Table 2. Hydroclimatic indices used for the sensitivity analysis (SA).

Index Unit Equation Description
Runoff Ratio (RR) - The ratio of simulated runoff to precipitation Runoff Variability (RV) - The standard deviation of simulated runoff to the standard deviation of precipitation Note: X, simulated runoff in mm; P, rainfall in mm; σ X,P, standard deviation of simulated runoff and rainfall.

Calibration Regions and Sub-Regions
RR and RV's relative sensitivities calculated in the previous section and proximity variables (latitude and longitude) concerning each sub-basin's centroid were used for cluster analysis. Then, Ward's hierarchical clustering method based on L-moment statistics [58] was used to divide the sub-basins into homogeneous regions. Finally, the regions are hydrologically conditioned according to the discordancy and heterogeneity statistics done in [59] and including at least one hydrometric station in the calibration region. In the absence of a station within the previously defined region, neighboring regions are merged to ensure this condition.
Due to the low density of hydrometric stations in the study area providing a long record of monthly streamflows for the model calibration ( Figure 1a and Table 2) and the effect of equifinality of the parameters [60], sub-regions are restricted to the portion-gauged by hydrometric stations within each calibration region. These sub-regions are defined by superimposing the calibration regions and gauged areas' boundaries ( Figure 1a). Thus, there will be as many sets of calibrated GR2M parameters for a given region as there are sub-regions within it. In the case of ungauged areas, only the calibration regions are considered.

GR2M Calibration and Validation Strategy
In this work, 43 hydrometric stations are used to calibrate and validate Peru's water balance model. The selected calibration period for the hydrometric stations ranging from 60 to 70% of their available records. The P and PET climatologies (1981-2010) were used to fill data between January 1978 and December 1980 and consider them a warm-up period for national simulation. Parameters X1 and X2 were automatically calibrated using the Shuffled Complex Evolutionary (SCEA-UA) algorithm [61], considering the Kling-Gupta efficiency criterion (KGE) [61] as an objective function similar to [62] with an emphasis on high flows. The square root transferred Nash-Sutcliffe efficiency (NSE sqrt ) [62] is used to evaluate model performance in general flows, and the Water Balance Error (WBE) [63] was used to assess the model bias. The summary of the statistical metrics used and their corresponding equations are shown in Table 3. The validation process consisted of evaluating the model's outputs based on the previously calibrated parameters using the remaining 30-40% of the available streamflow records. Table 3. Statistical metrics and their corresponding equations used for evaluating the hydrological performance of GR2M model.

Statistical Metric Unit Equation Optimal Value
Kling-Gupta efficiency (KGE) - The model calibration and validation were performed in gauged areas, and a stepwise calibration strategy was used in this study (sub-region approach, Figure 4). In the first step, the parameters of headwater sub-basins are calibrated and are used downstream, while in the last step, only the parameters of the remaining sub-basins are calibrated. The sub-basins of the Pacific and the Titicaca slopes were calibrated in a unique step, while for the Atlantic slope, seven steps were required. Finally, after model calibration and validation for each sub-region, X1 and X2 values are grouped by calibration region at a national level.

Discharge Simulation at a National Level
The median of X1 and X2 is calculated for each calibration region to simulate monthly discharges in ungauged areas (regional approach). To assess the model performance when using the median of X1 and X2 as a representative parameter for each calibration region, the GR2M model was again run-in gauged areas, and new statistical metrics were calculated. Moreover, changes in model performance before and after applying the median of parameters by calibration regions were evaluated. Finally, a new model run including gauged and ungauged areas was executed to simulate monthly discharges in 3594 river streams.

Sensitivity Analysis and Calibration Regions
The relative sensitivities derived from the FAST analysis using the RR and RV indices for each of the 3594 sub-basins throughout the study area are shown in Figure 5a. Because the GR2M model has only two parameters, the patterns of relative sensitivities to X1 and X2 are inverse. This study assesses the hydrological response in terms of the magnitude (RR) and variability (RV) of the rainfall-runoff relationship. RR and RV indices are more sensitive to X2 over much of the study area. Especially on the Pacific slope, RR has high sensitivities to X2 in coastal sub-basins but declines slightly towards the Andes' western flank. On the Titicaca and Atlantic slopes, moderate to high sensitivities to X2 is observed, except for the central part of the Amazon extending southeast and to part of the North Pacific, where there are high sensitivities to X1. In RV, spatial patterns are like RR but with greater sensitivities relative to X2 in the major part of the Atlantic and the Titicaca slopes, while in the Pacific slope, sensitivities to X1 increase in the west flank of the Andes and the North Pacific sub-basins. The clustering analysis results incorporating the relative sensitivity patterns of RR and RV at a national level are shown in Figure 5b. The lack of hydrometric information in some of the initial regions meant having to merge contiguous regions, so 14 regions were identified throughout the study area. In the coast of the Pacific, three regions were identified (F, L, and N); in the Andes-Amazon transition, five regions were obtained (C, G, I, J, and M), while six regions (A, B, D, E, H, and K) were obtained in the Amazon lowlands. It was ensured that there was at least one hydrometric station for each of the calibration regions (as is the case for regions B, E, and K), while in some cases, regions with more than five stations (M and C) were obtained (Table 4). Finally, overlapping the boundaries of gauged and ungauged sub-basins (Figure 4) with the 14 calibration regions (Figure 5b) generated 96 sub-regions (Table 4), which corresponds to 96 different sets of GR2M parameters. Of these, 84 parameter sets are estimated by model calibration (sub-region approach), while the remaining 12 were inferred based on the median values for each calibration region (regional approach). Figure 6 shows the spatial distribution of the three metrics selected to assess the monthly water balance model's performance at a national level during its calibration, validation, and entire period. In terms of KGE and NSE sqrt , the model performs well during the calibration period, with values above 0.75 and 0.65, respectively, at stations on the Pacific and the Andes-Amazon transition; however, low values of KGE and NSE sqrt (<0.50) predominate at stations on the Amazon lowlands. Performance remains the same during validation but with a slight decrease in the KGE and NSE sqrt metrics at stations belonging to the Andes-Amazon transition. Regarding the WBE, balance errors close to zero during the calibration period are observed at stations with high KGE and NSE sqrt values, except for positive balance errors not greater than 0.25 at stations in the Amazon lowlands. Negative balance errors increase in the validation period, up to −0.38 on the Pacific coast and the Andes-Amazon transition. Figure 6. (a-c) Statistical metrics for evaluating the GR2M model performance at a national level during the calibration, the validation, and the total period. In terms of the KGE and NSE sqrt metrics, cold colors represent good model performance while warm colors represent inadequate model performance. In WBE, blue colors are associated with the underestimation of the total volume of surface runoff and red colors indicate the overestimation.

Model Performance Assessment
When evaluating the total period, good model performance in terms of KGE (KGE ≥ 0.75) is maintained for 71% of the stations. In the same period, NSE sqrt values higher than 0.65 for 70% of the stations demonstrate a good representation of the subbasin's general flows. In terms of the WBE, negative values of not less than −0.20 are evident at most of the stations with high KGE and NSE sqrt values, and positive values of not more than 0.23 are evident for the Amazon lowlands. This behavior indicates that stations with a good fit in terms of KGE and NSE sqrt tend to slightly overestimate the total runoff, while on the Amazon lowlands, it tends to be underestimated. Figure 7 shows the observed and simulated monthly and annual hydrographs and their respective seasonal variation curves for two hydrometric stations on the Pacific slope (ETI and SOC), one on the Titicaca slope (HNE), and three on the Atlantic slope (EKN, TOC, and PUC), corresponding to stations with extensive streamflow records from January 1981 to March 2020 (Table 1). In all cases, the model manages to represent the seasonal and interannual variability of the observations in both small basins of 65 m 3 /s average annual flow (Cañete basin) up to basins with 9000 m 3 /s (Ucayali basin). The simulated series fit very well to the observations at most of the stations evaluated, except for SOC, where the wet season's streamflows (December-March) were slightly overestimated. At an annual scale, the model can represent very dry (e.g., 1991 and 1992) to very wet years (e.g., 1997) in sub-basins of the three Peruvian slopes. The seasonal variation curves adequately represent the peak flow month, except for the PUC station, where there is one lag month (Figure 7d). This adequate seasonal and interannual representation is repeated in the remaining hydrometric stations (not shown), except for those located in the Amazon lowlands, where the monthly model performs poorly in terms of NSE sqrt (Figure 6b). The variation of the calibrated GR2M model parameters in the gauged areas is shown in the boxplots of Figure 8a,b. In [31] reports that X1 could take values from 0.1 to 2000 mm while X2 could vary between 0 and 2. There are slight variations of X1 and X2 in calibration regions located in the south of the country (J, K, L, M, and N; see Figure 5b) and northcentral coast regions (F; see Figure 5b). Also, there are slight variations of X1 and high variation of X2 in calibration regions located in the northeast of the country (B, D, and E; see Figure 5b), predominantly in the Amazon lowlands. Additionally, we identified high variations of X1 values and low variations of X2 in regions of the north-central Andes (A, C, H, and I; see Figure 5b). Only in the calibration region G, high variations in both X1 and X2 values are present (Figure 8a,b). It is important to notice that regions with high X2 variations in Figure 8a,b coincide with stations of low performances in terms of KGE and NSE sqrt (Figure 6a,b). In contrast, the regions with smaller variations in both X1 and X2 parameter values correspond to stations with a good model fit. Figure 8c-e shows the model performance variation before (sub-region approach-Sub) and after (regional approach-Reg) applying the median of GR2M parameters by each calibration region. In terms of the KGE and NSE sqrt , the model performance using the regional approach (Figure 8c,d) declines mainly in sub-basins of the A-D regions (see Figure 5b) and is relatively stable (except for region M) in south-central regions. Since ungauged areas are located predominantly in regions F, L, and N, the regional approach of parameters in these sub-basins is suitable for estimating monthly discharges.

Product of Simulated Monthly Discharges at a National Level
The regionalization based on sensitivity analysis of GR2M parameters at a national level and using the meteorological (P and PET) PISCO dataset allows simulating continuous monthly discharges in 3594 rivers streams (including ungauged areas) from January 1981 to March 2020. This new product of simulated monthly discharges named PISCO_HyM_GR2M is available at https://doi.org/10.6084/m9.figshare.14382758 (accessed on 7 April 2021), and it is the new hydrological sub-product of the PISCO dataset. Additionally, it will contribute to the understanding of the water balance in data-scarce basins. For instance, the PISCO_HyM_GR2M product is currently used for drought monitoring in the National Service of Meteorology and Hydrology of Peru (available online: https://www.senamhi.gob.pe/?p=monitoreo-pronostico-sequias-accessed on 14 January 2021).
The qualitative classification of the PISCO_HyM_GR2M simulations, based on KGE [63] and NSE sqrt [64] performance categories for gauged areas are shown in Figure 9a,b, respectively. Both metrics agree that simulated monthly discharge in the central and southern of the study area are well represented, while those for the northeast (Amazon lowlands) should be interpreted with caution. The latter varies depending on the hydrograph assessment approach because the KGE metric emphasizes high flows [65], while NSE sqrt reduces this effect and emphasizes the general representation of streamflows [64].

Sensitivity Analysis and Calibration Regions
In this paper, two conceptual parameters' relative sensitivities are used as main predictors to define calibration regions at a national level similar to [21], instead of traditional climatic and physiographical characteristics [15]. Despite the differences between objective functions selected (RR and RV, Table 2), the spatial patterns of relative sensitivities for GR2M parameters are very similar in both cases (Figure 5a) due to the parsimonious model structure [31,34], finding that X2 is the most sensitive parameter for RR and RV indices in a great number of sub-basins due to its correction role in runoff generation [18] instead of X1 in charge of controlling soil moisture in the production storage. In terms of GR2M outputs, we found that a slight variation of X2 (controlling the routing storage) can significantly alter the rainfall-runoff transformation in many basins nationwide due to changes in the magnitude and variability of the simulated runoff.
The results showed main differences in RR and RV sensitivities spatial patterns in the Pacific Slope (see Figure 5a) due to X2-that controls water exchange and groundwater fluxes as mention in [65]-is more relevant for RR in coastal sub-basins where GR2M runoff is base flow-dominated [66] than in mountain ranges areas where X1 has more relative sensitivity for RV. However, in the behavior in the central-northern Amazonian region (Atlantic slope), abrupt changes in X1-X2 sensitivities might be more related to model inputs biases and structural uncertainties that propagate to model parameters and outputs.
As the regionalization approach is based on the sensitivity analysis of a parsimonious conceptual model (see Section 3.3), calibration regions delimited (Figure 5b) represent areas with a similar level of parameter uncertainties [65] and hydrological model response [65], instead of providing similarities of geomorphological and climatic such as present in [66]. Thus, calibration regions generated in this work are only valid for the GR2M water balance model using the PISCO product as meteorological inputs.
In addition to the relative sensitivities calculated in this work, the hydrometric network at the national level plays an important role in the final delimitation of the calibration regions and sub-regions (Figure 5b) because the existence of at least one of them is a determining factor. In this sense, unlike the studies by [67] where parameter uncertainty bounds are identified based on residuals analysis of hydrometric stations by region, the low density of stations (Figure 4), mainly on the north Atlantic slope, could be altering the natural grouping of sub-basins and thus reducing the predictive capacity of regional parameters of the GR2M model (see Section 4.2) based on a sensitivity analysis. Future studies will assess regional parameter uncertainty in uncontrolled areas and impact discharge estimation.

Model Simulations at a National Level
The unsatisfactory results in the northern Amazonian region (Figure 9) reflects two issues: first, the greater uncertainty of the spatial rainfall distribution in the Marañón [49], Ucayali and Huallaga [49,68] basins, and the PISCO P sub-product biases probably because of the lack of adequate rainfall estimates in equatorial regions [11]. This lower model performance is similar to that were obtained in [67,68] using different hydrological models in a daily timestep and different sets of satellite precipitation products. Thus, rainfall uncertainties propagate to model outputs and reduce the model's predictable capacity [26]. Additionally, PET climatology used in this paper for operational purposes might not be reflecting actual evapotranspiration in the Amazon plain. Future works will incorporate a robust assessment of evapotranspiration in the hydrological modeling with a data scares scenario and its impacts on the water balance.
Secondly, floodplain plays a key role in the flow routing, with a large amount of water stored during the flood [69]. For instance, in the Ucayali basin, the flood peak is delayed by two months between the LAG and REQ stations [70,71]. These behaviors might alter basins storage and delay months of peak flow in basins with larger drain areas such as the Amazon plain, and GR2M routing might not represent this characteristic such as PUC station in Figure 7d.
It is also important to consider that GR2M is a model with limitations due to the conceptualization of hydrological processes in two reservoirs (production and routing) in a lumped modeling approach [33]. Despite its outstanding performance throughout the national territory ( Figure 6), it may not be able to adequately represent runoff in basins with large drainage areas (>200,000 km 2 ) such as in the Amazon plain. Future works will incorporate routing models such as the Routing Application for Parallel computatIon of Discharge (RAPID) [72] to improve flow routing throughout the national drainage network, especially in the Atlantic slope.

Conclusions
This study evaluated a monthly water balance model's hydrological performance in 3594 sub-basins and river streams in Peru. Parameter calibration regions were defined based on the sensitivity analysis of two hydroclimatic indices. Finally, the monthly simulated streamflows product named PISCO_HyM_GR2M from January 1981 to March 2020 was developed. The main conclusions are summarized below: (a) The hydrological performance of the GR2M model in Peru performed well in subbasins of the Pacific slope and the Andes-Amazon transition (part of the Titicaca and the Atlantic slopes). The model adequately represents the seasonality and interannual variability of the streamflows, except for the Amazon lowlands, where only high flows are well-represented. (b) Through the monthly meteorological PISCO sub-products, it is possible to simulate the runoff volume over most of Peru adequately. However, the uncertainties associated with these sub-products are more significant towards the north of the country where there are not enough meteorological stations, so this error propagates towards the hydrological model outputs for the Amazon lowlands. (c) The proposed methodology to define the calibration regions based on the spatial patterns of two hydroclimatic indices' relative sensitivities proved to be an appropriate technique for calibrating and validating the GR2M model and estimating monthly discharge in ungauged sub-basins.
The results presented in this work also demonstrate the enormous potential of the PISCO_HyM_GR2M product for understanding the dynamics of surface water resources in Peru. Future versions of this product will include an extensive analysis of different routing methods and the uncertainty analysis of discharges.