Hydrologic Validation of MERGE Precipitation Products over Anthropogenic Watersheds

Abstract: Satellite rainfall estimates (SRFE) are a promising alternative where reliable, densely distributed precipitation data are lacking, as is common in developing countries and remote locations. SRFE may be significantly improved when corrected with rain gauge data. In the present study, the first complete validation of the Tropical Rainfall Measuring Mission (TRMM) 3B42-based MERGE product is performed by means of ground truthing and hydrological modeling-based applications. Four distinct, highly anthropogenic watersheds were selected in the Upper Paraíba do Sul River Basin (UPSRB), Brazil. The results show that, when compared to the TRMM Multi-Satellite Precipitation Analysis (TMPA) 3B42V7 at the watershed scale, MERGE has a higher correlation with observed data. Likewise, root mean square errors and bias are significantly lower for MERGE products. When hydrologically validated, MERGE-based streamflow simulations reproduced the overall hydrological regime with “good” to “very good” results for the downstream lowland sections. Limitations were observed in the hydrological modeling of the upstream, highly anthropogenic, dammed watersheds. However, such limitations may not be attributed to MERGE precipitation, since they were also obtained for the individually calibrated rain gauge-based simulations. The results indicate that the MERGE dataset used as a hydrological model input is better suited for application in the UPSRB than the TMPA 3B42V7.


Introduction
Precipitation is consistently referred to as the most important water input to the hydrological cycle [1–3], as it regulates the distribution of renewable water resources and consequently affects many aspects of human, ecological, and economic development [4]. In addition, abnormalities in its intensity, duration, and frequency often result in natural disasters (e.g., floods, droughts, and landslides) [5], which makes its estimation indispensable for a variety of hydrological, agronomic, meteorological, and climatological applications [6,7], including the mitigation of extreme hydrological events and the design of hydraulic structures [8]. However, the scarcity and unreliability of precipitation measurements present a challenge to proper water resources management and engineering [9]. [...] the latter is more advantageous as it is implemented at the watershed scale, thus reducing the discrepancy problems that are common in point-to-pixel validations, and as it is evaluated in a practical rainfall-runoff modeling application.
Many studies have been conducted to evaluate and validate SRFE products over different areas. However, none of these studies has evaluated the MERGE precipitation product by exposing it to well-established, robust validation methods applied to reportedly problematic regions (i.e., tropical/sub-tropical, mountainous, and anthropogenic watersheds). In this context, the main objective of the present study is to assess the applicability of MERGE to hydrological applications by means of: (i) a watershed-scale comparison between the TMPA 3B42V7 and MERGE products using widely adopted ground truthing methods; and (ii) a hydrological validation of the MERGE product over four contrasting, highly anthropogenic watersheds in the Upper Paraíba do Sul River Basin (UPSRB), Southeastern Brazil. The present study stands out as the first hydrological validation of the MERGE product, assessing its potential as an input for the large-scale distributed hydrological model MGB-IPH.

Study Area
With a total length of approximately 1150 km, the Paraíba do Sul River is formed by the junction of the Paraitinga and Paraíbuna rivers. The Paraíba do Sul River Basin (PSRB) is a highly anthropogenic river basin covering a total area of around 57,000 km², corresponding to 63% of the State of Rio de Janeiro's total area, 5% of São Paulo, and 4% of Minas Gerais [29], and encompassing a total population of 5 million people [30]. In a long-term hydrological analysis of the Paraíba do Sul River (i.e., from 1920 to 2004), Marengo and Alves [31] observed a significant decrease in streamflow, mainly after 1955. This decrease showed a strong correlation with the construction of large hydraulic structures, as well as with periods of urbanization and economic development, which were reflected in significant interventions in the natural streamflow in terms of large dams, increased water withdrawals, and changes in land cover. To evaluate the most anthropogenic areas of the PSRB, four cross-sections in the UPSRB were considered: the Santa Branca dam (4907 km²) (cross-section 1), the Jaguari dam (1313 km²) (cross-section 2), Pindamonhangaba (3354 km²) (cross-section 3), and the Funil hydroelectric power plant (3679 km²) (cross-section 4) (Figure 1).
Water 2020, 12, x FOR PEER REVIEW 3 of 22
According to Köppen's climate classification, the UPSRB may be divided into Cfa (i.e., humid, oceanic climate without dry season and with hot summer) in the upstream regions, and Cwa (i.e., humid, with dry winter and hot summer) and Cfb (i.e., humid, oceanic climate without dry season and with temperate summer) downstream [32]. Entisols and Inceptisols are predominant in the most mountainous region, whereas Oxisols and Ultisols are found in less complex terrain, and fluvic soils are found along the main riverbanks [33]. The major land cover is pasture (60.3%), followed by forested (34.8%) and urbanized (2.8%) areas [34].
Watershed 1 is mostly covered by pastures (57.9%), with significant portions of forests (38.7%) and water bodies (3.4%). Watershed 2 is also mainly covered by pasture (49.2%), forests (49.7%), and water bodies (3.2%), with a small fraction of urban areas (0.9%). The downstream watershed 3 is composed of 63.1% pastures, 28.7% forests, and 7.6% urban areas, followed by small portions of water bodies, rock outcrops, and agriculture, with 0.4%, 0.1%, and 0.1%, respectively. Lastly, watershed 4 is covered by 64.9% pastures, 31.0% forests, 2.6% urbanized areas, 0.8% agriculture, 0.6% water bodies, and 0.1% rock outcrops [34].

Precipitation Data Sources and Preparation
Data collected from 115 conventional rain gauges made available by the Brazilian National Water Agency (ANA) were used as reference throughout the present study [35]. These data represent the accumulated daily precipitation (mm day⁻¹) at 10 a.m. (UTC) [22] and were collected for the whole period considered, i.e., from 01/01/2001 to 12/31/2012. This period was selected because it comprises the largest number of rain gauges with no more than 5% of missing data; the gaps were filled with the inverse square distance approach described by Viessman and Lewis [36]. The MGB-IPH model interpolates precipitation with the nearest-neighbor method. Therefore, to keep the evaluation of the satellite precipitation data consistent with their hydrologic validation, the Thiessen polygon method was adopted to obtain the weighted daily averages for each watershed.
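The two preparation steps described above can be illustrated with a minimal sketch. It assumes planar gauge coordinates and precomputed Thiessen polygon areas clipped to the watershed; the function names and the example values are hypothetical and are not taken from the study.

```python
def fill_gap_inverse_square(target_xy, neighbors):
    """Estimate a missing daily total at target_xy from neighboring gauges.

    neighbors: list of ((x, y), value) pairs for gauges that reported that day.
    Each neighbor is weighted by 1 / distance**2 (planar coordinates assumed).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbors:
        dist_sq = (x - target_xy[0]) ** 2 + (y - target_xy[1]) ** 2
        if dist_sq == 0.0:
            return value  # co-located gauge: use its value directly
        weighted_sum += value / dist_sq
        weight_total += 1.0 / dist_sq
    if weight_total == 0.0:
        raise ValueError("no neighboring gauges with valid data")
    return weighted_sum / weight_total


def thiessen_average(gauge_values, polygon_areas):
    """Watershed mean precipitation: each gauge is weighted by the area of
    its Thiessen polygon inside the watershed (areas assumed precomputed)."""
    total_area = sum(polygon_areas)
    return sum(v * a for v, a in zip(gauge_values, polygon_areas)) / total_area


# Hypothetical example values (not from the study)
filled = fill_gap_inverse_square((0.0, 0.0), [((1.0, 0.0), 10.0), ((2.0, 0.0), 20.0)])
basin_mean = thiessen_average([10.0, 20.0], [30.0, 10.0])
print(filled, basin_mean)
```

Because a Thiessen polygon assigns every point to its nearest gauge, the area-weighted mean is consistent with a nearest-neighbor interpolation averaged over the watershed.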
The TMPA 3B42V7, released in May 2012, has several advantages over its predecessors, including a new TIR dataset and additional satellite inputs, a latitude band calibration system, and a new gauge analysis based on the Global Precipitation Climatology Centre (GPCC) data [21,27]. These advantages result in a product with a higher correlation coefficient (r) and lower relative bias and root mean square error (RMSE) against rain gauge data, especially in mountainous regions [27,37]. To spatially rescale the TMPA 3B42V7 data to the watershed level, as done for the ANA rain gauge network, 3-hourly weighted averages were calculated for each watershed. Every day, a total of eight TMPA 3B42V7 files are made available, covering the period from 10:30 p.m. (UTC, the day before) to 10:29 p.m. (UTC). These data do not temporally match the ANA precipitation data and thus had to be rearranged. Following the routine proposed by Reis et al. [22], the TMPA 3-h data were grouped from 10:30 a.m. (UTC, the day before) to 10:29 a.m. (UTC), reducing the initial shift between the daily TMPA SRFE and the ANA network from 11 h and 30 min to only 30 min.
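A minimal sketch of this regrouping is given below. It assumes that each 3-hourly value is stamped with a nominal UTC time (00, 03, ..., 21 h) and represents the window from 1.5 h before to 1.5 h after that time, which is consistent with the windows quoted above, and that values are rain rates in mm h⁻¹ so that each file contributes three hours of rainfall. The function name and the input layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_totals_gauge_aligned(three_hourly):
    """Group 3-hourly rain rates into daily depths aligned with the gauges.

    three_hourly: dict {nominal UTC datetime: rain rate in mm/h} (assumed layout).
    A gauge day D collects the windows from 10:30 UTC on D-1 to 10:29 UTC on D,
    i.e., nominal times 12, 15, 18, 21 h of D-1 and 00, 03, 06, 09 h of D.
    """
    totals = defaultdict(float)
    for nominal, rate in three_hourly.items():
        # Adding 12 h moves nominal times 12..21 h into the next calendar day
        # and leaves 00..09 h in the same day, which yields the gauge day.
        gauge_day = (nominal + timedelta(hours=12)).date()
        totals[gauge_day] += rate * 3.0  # mm/h over a 3-h window -> mm
    return dict(totals)


# Hypothetical example: eight files at 1 mm/h, then one file at 2 mm/h
start = datetime(2001, 1, 1, 12)
series = {start + timedelta(hours=3 * k): 1.0 for k in range(8)}
series[datetime(2001, 1, 2, 12)] = 2.0
totals = daily_totals_gauge_aligned(series)
```

With this grouping the satellite day ends at 10:29 UTC, so the residual offset from the 10 a.m. (UTC) gauge reading is the 30 min mentioned in the text.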
Finally, the MERGE data, derived from the TMPA 3B42RT SRFE and corrected with data from 1500 rain gauge stations in South America [25], were evaluated against the rain gauge observations. The MERGE data were also rescaled to the watershed level, based on the pixel area weighted average, to be compared to the gauge observations. According to Rozante et al. [25], the MERGE data were built in five main steps: (i) the TRMM grid cells whose locations match those of gauge stations were identified; (ii) the precipitation values of these grid cells were replaced by those of the gauge stations; (iii) the precipitation data from the two adjacent rows and columns were erased; (iv) the resulting map was interpolated using the Barnes objective analysis method; and (v) the precipitation estimates were cross-validated. Both MERGE, provided by the Center for Weather Forecast and Climatic Studies (CPTEC) [38], and TRMM [39] data were considered for the same period as the rain gauge data (i.e., 01/01/2001 to 12/31/2012).

Statistical Evaluation
Once the precipitation data have been prepared, their accuracy should be assessed before their impacts as model inputs are evaluated [37]. In addition to the visual interpretation, the precipitation data were evaluated based on several extensively used statistical validation indices [3,9,12]. The following indices were selected based on the recommendations of Gilewski and Nawalany [10], since they allow a multi-aspect analysis of the results. First, the correlation coefficient (r) was used to assess the degree of linear agreement between estimates and observations. The r is particularly useful for evaluating the timing of observed events [10]. The value of r may vary from −1 to +1, where the closer to +1 or −1, the higher the positive or negative correlation, respectively. Complementarily, the RMSE was used to measure the average error magnitude between the satellite-based and the gauge data. The relative bias (RBias), which corresponds to the absolute bias divided by the rain gauge precipitation depth, was chosen since it does not depend on the amount of precipitation. The RBias is recommended for investigating positive or negative tendencies in estimated values [10]; this index is particularly important for validating satellite-based estimates due to the recurrent presence of systematic errors in SRFE. Positive RBias percentages imply overestimation of precipitation by satellite instruments, whereas negative RBias values indicate underestimation. The equations that represent these indices are [9,37]:

$$r = \frac{\sum_{i=1}^{N}\left(P_{S_i}-\bar{P}_S\right)\left(P_{O_i}-\bar{P}_O\right)}{\sqrt{\sum_{i=1}^{N}\left(P_{S_i}-\bar{P}_S\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(P_{O_i}-\bar{P}_O\right)^2}}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_{S_i}-P_{O_i}\right)^2}$$

$$\mathrm{RBias} = \frac{\sum_{i=1}^{N}\left(P_{S_i}-P_{O_i}\right)}{\sum_{i=1}^{N}P_{O_i}}\times 100\%$$

where $P_{S_i}$ is the satellite precipitation estimate and $P_{O_i}$ is the gauge station measurement for day i from a total time period of N days. $\bar{P}_S$ and $\bar{P}_O$ are the long-term precipitation averages for the satellite and rain gauge estimates, respectively.
As adopted by Gilewski and Nawalany [10], correlations may be considered "very good" if > 0.85, "good" if > 0.70 and ≤ 0.85, "satisfactory" if > 0.60 and ≤ 0.70, "acceptable" if > 0.40 and ≤ 0.60, and "unsatisfactory" if ≤ 0.40, while RBias is considered "acceptable" if equal to or lower than 20% and "unacceptable" if higher than 20%. To classify the RMSE estimates, the relative RMSE was used, where values lower than 50% represent reliable SRFE estimates [40].
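The three indices and the classification bands quoted above can be sketched as follows. The formulas are the standard definitions of the Pearson correlation, RMSE, and relative bias, and the RBias threshold is applied here to the absolute value, which is an assumption. Function names and sample values are hypothetical.

```python
import math


def pearson_r(sat, obs):
    """Linear correlation between satellite estimates and gauge observations."""
    mean_s = sum(sat) / len(sat)
    mean_o = sum(obs) / len(obs)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sat, obs))
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def rmse(sat, obs):
    """Root mean square error, in the units of the inputs (e.g., mm/day)."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sat, obs)) / len(obs))


def rbias_percent(sat, obs):
    """Relative bias in percent: positive means satellite overestimation."""
    return 100.0 * sum(s - o for s, o in zip(sat, obs)) / sum(obs)


def classify_r(r):
    """Correlation classes as quoted from Gilewski and Nawalany [10]."""
    if r > 0.85:
        return "very good"
    if r > 0.70:
        return "good"
    if r > 0.60:
        return "satisfactory"
    if r > 0.40:
        return "acceptable"
    return "unsatisfactory"


def classify_rbias(rbias):
    """RBias class; the 20% limit is applied to the absolute value (assumption)."""
    return "acceptable" if abs(rbias) <= 20.0 else "unacceptable"


# Hypothetical daily series (mm/day), not from the study
sat = [1.0, 2.0, 3.0, 4.0]
obs = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(sat, obs), rmse(sat, obs), rbias_percent(sat, obs))
```

The example series is perfectly correlated but biased by −50%, which illustrates why r alone cannot expose the systematic errors that RBias is meant to capture.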

Hydrologic Data Sources and Preparation
The distributed rainfall-runoff model MGB-IPH, developed for large, tropical/sub-tropical basins, was selected to hydrologically validate the SRFE. The MGB-IPH [41,42] has been extensively used in developing countries due to its adequacy for scarcely monitored areas, capacity to satisfactorily simulate runoff with the calibration of a reduced number of parameters, and the fact it runs with easily obtained and widely available inputs. In addition to the precipitation data described in the previous section, the MGB-IPH requires elevation, land cover, soil, daily runoff, and monthly meteorological data. The data used, as well as their main characteristics, are presented in Table 1.
The data presented in Table 1 are also displayed in Figure 2. The four streamflow cross-sections (Figure 1) were defined based on streamflow data availability and watershed characteristics. Due to the large dams found in cross-sections 1 (Santa Branca dam) and 2 (Jaguari dam) (Figure 1), the observed streamflows could not be used in the model, since they do not reflect the watersheds' natural hydrological regime. Instead, indirect streamflow estimates called naturalized streamflows were used. According to Guilhon et al. [43], naturalized streamflows are those that would occur if there were no anthropogenic activities in the watershed (e.g., reservoirs, irrigation, consumptive water withdrawals, etc.). These estimates are made available by the National Electric System Operator (ONS) [44], which coordinates the generation and transmission of electric energy in Brazil, and are derived from an approach that incorporates the simulations from 96 stochastic models [45]. The remaining streamflow data were obtained from the ANA's HidroWeb platform [34].
The land cover data used in the present study were provided by the MapBiomas Project, a supra-institutional initiative for developing a reliable, long-term land cover historical series for Brazil based on Landsat family satellite observations. The year 2006 was taken as reference [34], and the land cover classes were regrouped to meet the model implementation recommendations [46]. Analogously, the 1:250,000 soil map made available by the Brazilian Institute of Geography and Statistics (IBGE) [33] was reclassified into the drainage conditions proposed by Medeiros et al. [46]. The reclassified land cover and soil maps are presented in Figure 2a

MGB-IPH
The MGB-IPH model is structured around unit-catchments, each composed of a single river reach, extracted from the DEM, and its respective floodplain [42]. According to these authors, each unit-catchment is composed of fractions of hydrological response units (HRUs) (i.e., areas of equal soil type and land cover). The evapotranspiration for each HRU is calculated with the Penman-Monteith equation, and the resulting unit-catchment water budget is computed as the sum of the HRU contributions. Linear reservoirs are used to propagate the groundwater, sub-surface, and surface outflows to the main river. Once these individual outflows reach the main river, they are summed and routed through the river system using the Muskingum-Cunge method. The soil water balance is computed independently for each HRU in a unit-catchment [41,42] as:

$$W^{k}_{i,j} = W^{k-1}_{i,j} + \left(P_i - ET_{i,j} - Dsup_{i,j} - Dint_{i,j} - Dbas_{i,j}\right)\Delta t$$

where k, i, and j are the time, unit-catchment, and HRU indexes, respectively; $\Delta t$ is the length of the time step (daily); $W^{k}_{i,j}$ (mm) is the water storage in the soil layer at the end of the kth time step of the jth HRU in the ith unit-catchment; $W^{k-1}_{i,j}$ (mm) is the water storage in the soil layer at the previous time step; $P_i$ (mm day⁻¹) is the amount of precipitation that reaches the soil surface; and $ET_{i,j}$ (mm day⁻¹), $Dsup_{i,j}$ (mm day⁻¹), $Dint_{i,j}$ (mm day⁻¹), and $Dbas_{i,j}$ (mm day⁻¹) are the water losses to evapotranspiration, surface, sub-surface, and groundwater outflows, respectively. The variables $W^{k-1}_{i,j}$ and $P_i$ are known for each time step; the remaining variables are calculated according to the soil water storage at the beginning of the time step and the model parameters [41]. $Wm_j$ (mm) and $b_j$ (-) are calibrated parameters referring to the maximum water storage in the upper soil layer and to the shape of the statistical distribution of soil water storage capacity, respectively [41].
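The bookkeeping implied by these variable definitions can be sketched as below, assuming the balance takes the usual form in which the storage at the end of a step equals the previous storage plus precipitation minus the four losses, multiplied by the time step. The fluxes are passed in as already-computed inputs, since the sketch does not attempt to reproduce how MGB-IPH derives them. All names and values are hypothetical.

```python
def soil_water_step(w_prev, p, et, d_sup, d_int, d_bas, dt=1.0):
    """Soil storage (mm) at the end of a step for one HRU.

    w_prev: storage at the previous step (mm)
    p, et, d_sup, d_int, d_bas: fluxes in mm/day
    dt: time step length in days
    """
    return w_prev + (p - et - d_sup - d_int - d_bas) * dt


def run_hru(w_initial, daily_fluxes, dt=1.0):
    """Apply the balance day by day; daily_fluxes holds
    (p, et, d_sup, d_int, d_bas) tuples supplied by the caller."""
    storages = []
    w = w_initial
    for p, et, d_sup, d_int, d_bas in daily_fluxes:
        w = soil_water_step(w, p, et, d_sup, d_int, d_bas, dt)
        storages.append(w)
    return storages


# Hypothetical fluxes for a wet day followed by a dry day
fluxes = [(10.0, 3.0, 2.0, 1.0, 0.5), (0.0, 4.0, 0.0, 0.5, 0.5)]
storage_series = run_hru(100.0, fluxes)
print(storage_series)
```

Running the balance separately for each HRU and then combining the outflows mirrors the per-HRU structure described in the text.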
Sub-surface flow is computed based on a modified version of the Brooks and Corey non-saturated hydraulic conductivity equation [41,42]:

$$Dint_{i,j} = Kint_j\left(\frac{W^{k-1}_{i,j}-Wz_j}{Wm_j-Wz_j}\right)^{3+\frac{2}{\lambda_j}}$$

where $Kint_j$ (mm day⁻¹) is a calibrated parameter that controls the sub-surface flow when the soil is saturated; $Wz_j$ (mm) is the lower limit below which there is no sub-surface flow; and $\lambda_j$ is the soil porosity index. The groundwater flow is estimated based on a simple linear relation between soil water storage and maximum soil water storage [41,42]:

$$Dbas_{i,j} = Kbas_j\,\frac{W^{k-1}_{i,j}}{Wm_j}$$

where $Kbas_j$ (mm day⁻¹) is the calibrated parameter that controls groundwater drainage.

The calibration of the MGB-IPH model parameters can be executed automatically using the Multiple-Objective Complex Evolution-University of Arizona optimization algorithm (MOCOM-UA). In the present study, three objective functions were used for automatic model calibration: the Nash-Sutcliffe efficiency coefficient (NS) (suited to the fit of maximum streamflows), its logarithmic version (NS_log) (related to the fit of recession periods) [42], and the relative error of the total volume (ΔV) [41], which reflects the effects of long-term bias:

$$NS = 1 - \frac{\sum_{t=1}^{n}\left(Q_{obs}(t)-Q_{sim}(t)\right)^2}{\sum_{t=1}^{n}\left(Q_{obs}(t)-\overline{Q_{obs}}\right)^2}$$

$$NS_{log} = 1 - \frac{\sum_{t=1}^{n}\left(\log Q_{obs}(t)-\log Q_{sim}(t)\right)^2}{\sum_{t=1}^{n}\left(\log Q_{obs}(t)-\overline{\log Q_{obs}}\right)^2}$$

$$\Delta V = \frac{\sum_{t=1}^{n}Q_{sim}(t)-\sum_{t=1}^{n}Q_{obs}(t)}{\sum_{t=1}^{n}Q_{obs}(t)}\times 100\%$$

where t is the respective daily time step; n is the total number of time steps; $Q_{obs}$ are the observed streamflows, with overbars denoting their averages over the n time steps; and $Q_{sim}$ are the simulated ones. NS values range from −∞ to 1, and values between 0 and 1 imply some degree of model predictability [48]. Analogously, NS_log values close to 1 also mean a better fit between observed and simulated streamflows. However, this metric is more sensitive to recession and dry periods and is thus indicated for a better fit of minimum streamflows [42]. According to Moriasi et al. [48], the resulting values for the aforementioned statistics may be classified as "very good" if > 0.75, "good" if > 0.65 and ≤ 0.75, "satisfactory" if > 0.50 and ≤ 0.65, and "unsatisfactory" if ≤ 0.50. Negative and positive ΔV values represent streamflow underestimation and overestimation, respectively. According to Van Liew et al. [49], |ΔV| ≤ 10% can be classified as "very good", 10% < |ΔV| ≤ 15% as "good", 15% < |ΔV| ≤ 25% as "satisfactory", and |ΔV| > 25% as "unsatisfactory".
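The objective functions and the classification bands described above can be sketched as follows, using the standard definitions of the Nash-Sutcliffe efficiency, its logarithmic variant, and the relative volume error. The ΔV bands are read here as limits of 10%, 15%, and 25% on the absolute value, and natural logarithms are used; both are assumptions. Names and sample flows are hypothetical.

```python
import math


def nash_sutcliffe(q_obs, q_sim):
    """NS efficiency: 1 is a perfect fit, values can be arbitrarily negative."""
    mean_obs = sum(q_obs) / len(q_obs)
    err = sum((o - s) ** 2 for o, s in zip(q_obs, q_sim))
    var = sum((o - mean_obs) ** 2 for o in q_obs)
    return 1.0 - err / var


def nash_sutcliffe_log(q_obs, q_sim):
    """NS on log-transformed flows (requires strictly positive flows)."""
    return nash_sutcliffe([math.log(q) for q in q_obs],
                          [math.log(q) for q in q_sim])


def volume_error_percent(q_obs, q_sim):
    """Relative volume error in percent: negative means underestimation."""
    return 100.0 * (sum(q_sim) - sum(q_obs)) / sum(q_obs)


def classify_ns(ns):
    """Classes quoted from Moriasi et al. [48] for NS and NS_log."""
    if ns > 0.75:
        return "very good"
    if ns > 0.65:
        return "good"
    if ns > 0.50:
        return "satisfactory"
    return "unsatisfactory"


def classify_dv(dv):
    """Classes quoted from Van Liew et al. [49], on the absolute value."""
    magnitude = abs(dv)
    if magnitude <= 10.0:
        return "very good"
    if magnitude <= 15.0:
        return "good"
    if magnitude <= 25.0:
        return "satisfactory"
    return "unsatisfactory"


# Hypothetical daily streamflows (m3/s), not from the study
q_obs = [10.0, 20.0, 30.0]
q_sim = [12.0, 18.0, 33.0]
print(nash_sutcliffe(q_obs, q_sim), volume_error_percent(q_obs, q_sim))
```

Using the three metrics together guards against a calibration that fits peaks well (high NS) while misrepresenting recessions (low NS_log) or the long-term water volume (large ΔV).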

Statistical Validation of TMPA 3B42V7 and MERGE Products
Tong et al. [12] suggest that a preliminary visual interpretation should be performed prior to the precipitation statistical validation. Among the possible graphic data representations, the cumulative precipitation depths evaluation stands out as it allows the identification of data bias, seasonality, and overall errors. Figure 3 represents the cumulative precipitation depths for both TMPA 3B42V7 and MERGE products in comparison to the observed rain gauge daily measurements for all UPSRB watersheds.
As depicted in Figure 3a, a better agreement between TMPA 3B42V7-based precipitation estimates and gauge measurements was obtained for watershed 1 than for watersheds 2, 3, and 4 (Figure 3b-d). The cumulative overestimation by TMPA 3B42V7 in the lowland valley areas over the 12-year period reached thousands of millimeters, thus compromising eventual hydrological applications of the data, especially those related to water balance calculations. Similarly, Falck et al. [50] observed large cumulative overestimations by TMPA 3B42 in 19 watersheds located within the Tocantins-Araguaia river basin. The same behavior is not found for MERGE precipitation, for which close fits were observed in all watersheds. Thiemig et al. [20], in a study applied to four watersheds in Africa, also found a better performance of merged precipitation products (i.e., remote sensing-based precipitation products corrected with gauge observations) in lowland catchments when compared to those that do not ingest any ground observations.
The authors argue that the adjustment of SRFE, even when based on a small number of ground observations, may significantly improve the intrinsic data quality. In this context, one can infer that MERGE products, which are adjusted from 1500 rain gauges mainly located in Southeastern and Northeastern Brazil (see Rozante et al. [25]), have the potential to provide SRFEs very close to the observed rainfall given the high density of rain gauges in certain areas and the fact it is built in a way that favors ground observations.
The seasonal dependence of errors may also be identified in Figure 3, where the cumulative precipitation curves tend to be parallel during dry seasons (from April to September) and steep during wet seasons (from October to March). In addition, differences tend to be larger during the wet season, as the curve slopes are usually steeper in such periods. A clearer representation of the seasonality of precipitation and of its estimation errors is depicted in Figure 4.
Figure 4 represents the mean monthly precipitation for all four watersheds. The results presented in Figure 4 corroborate the conclusions drawn from Figure 3. The strong seasonality, characteristic of tropical/sub-tropical regions, is also evident for the study area.
In addition, TMPA 3B42V7 largely overestimates precipitation, especially during the wet season. MERGE, on the other hand, shows a closer fit to the observed mean monthly precipitation for the whole area. As stated by Maggioni et al. [21], seasonality and topography play an important role in SRFE performance. For instance, regions characterized by complex terrain and sharp precipitation gradients may experience weak rain detection and large errors, since SRFE errors are magnitude dependent [21]. This characteristic, along with the strong seasonality observed for the UPSRB, may explain the large wet-season errors depicted in Figure 4.
In addition to the visual interpretation of precipitation data, a more robust evaluation of SRFE was developed based on statistical validation methods. The statistical validation of SRFE in the present study followed the approach adopted by Reis et al. [22], Tong et al. [12], and Shukla et al. [3].
[...] Figures 5a and 6a). The largest difference between products was found for the downstream watershed 4, where the r of TMPA 3B42V7 is equal to 0.72 and that of MERGE is equal to 0.78 (Figures 5d and 6d). All correlations found for TMPA 3B42V7 and MERGE are significant at p < 0.001. Similarly, the daily errors in terms of RMSE are substantially smaller for MERGE than for TMPA 3B42V7 in all watersheds. The average RMSE for TMPA 3B42V7 is 6.91 mm, whereas the average RMSE for MERGE is as low as 4.96 mm. In terms of relative RMSE, TMPA 3B42V7 resulted in values 54.25%, 95.01%, 76.62%, and 77.18% higher than the observed precipitation average for watersheds 1, 2, 3, and 4, respectively. The relative RMSE for MERGE, in contrast, was 20.15%, 45.71%, 17.98%, and 19.85% higher than the observed precipitation average for watersheds 1, 2, 3, and 4, respectively. As already presented in Figure 3, the bias of TMPA 3B42V7 daily estimates is considerably higher than that of MERGE (Figures 5 and 6). RBias for TMPA 3B42V7 may reach values of up to 28% in the lowland downstream sections, whereas MERGE shows underestimations of around −6% and overestimations of, on average, no more than 5%.
In a study applied to lower-latitude watersheds, Dinku et al. [17] identified that the TMPA 3B42 multi-sensor estimates poorly represent precipitation from warm orographic rain clouds.
According to the aforementioned authors, as the TIR sensors estimate precipitation from the inverse relation between cloud top temperature and the amount of precipitation, rainfall from warm convective clouds, especially in coastal and mountainous regions, is likely underestimated. In addition, as tropical orographic clouds normally produce heavy rainfall without the formation of much ice aloft, even the more accurate PMW SRFE, which are derived from the scattering of PMW by ice particles, are miscalculated. However, the statistical behavior described in Figures 5 and 6 is contrary to that reported by Dinku et al. [17] for equatorial regions. In fact, the patterns observed in the present study are similar to those found by Reis et al. [22] for mountainous sub-tropical watersheds in Minas Gerais State, Brazil. Therefore, one may infer that the rainfall generating mechanisms of the UPSRB are more closely related to those of higher-latitude sub-tropical regions than to those of tropical lower latitudes. Reis et al. [22] found r values of 0.7 and an RMSE of 6.6 mm day⁻¹ for TMPA 3B42V7 in an 860 km² watershed located in Southern Minas Gerais. The results obtained by these authors for the TMPA 3B42V7 are superior to the ones found in the present study. However, the statistics found for MERGE products in the present study outperform the ones found by Reis et al. [22].
Given the superiority of MERGE in all analyses performed, this product was adopted as MGB-IPH model input to evaluate its capability in hydrological applications. The runoff simulations derived from MERGE were compared to those derived from the rain gauge network, as well as to observed streamflows. The results for the objective functions, the simulated hydrograph characteristics, and the plausibility of the calibrated model parameters were used as evaluation criteria for SRFE applicability.

Hydrologic Validation of MERGE Products
As stated in Section 2.3.2, the MGB-IPH was automatically calibrated using the MOCOM-UA algorithm applied to the objective functions NS, NSlog, and ∆V in order to take peak flows, recessions, and long-term bias into account. The watersheds were calibrated individually following the topological order of watersheds 1 to 4, starting with reference parameter values [46] and respecting thresholds used in the literature [51,52]. The MGB-IPH is calibrated based on 7 parameters, of which 4 are HRU dependent (i.e., Wm, b, Kint, and Kbas) and 3 are fixed for the entire watershed (i.e., Cs, Ci, and Cb). It is worth mentioning that the percentage of each HRU in the watershed plays an important role in model calibration: the larger the presence of an HRU, the more influential its parameters potentially are. Figure 7 depicts the calibrated parameters for each HRU in each watershed for both rain gauge and MERGE-based simulations. For simplicity, only HRUs covering more than 0.5% of the total watershed area are considered. Table 2 presents the percentage of each HRU over watersheds 1 to 4.
Wm (mm), the calibrated, HRU-dependent maximum upper soil water storage capacity, is reportedly the most sensitive parameter in the MGB-IPH model [51]. It defines the amount of water available for evapotranspiration and is also related to soil drainage properties. Parameter b, on the other hand, describes the statistical distribution of water storage capacity: b equal to zero corresponds to the whole HRU area having a water storage capacity of Wm, while b > 0 means that parts of the HRU will generate runoff before Wm is reached [41]. Parameters Kint and Kbas (mm day⁻¹) refer to saturated soil sub-surface and groundwater drainage, respectively. Finally, parameters Cs, Ci, and Cb are non-dimensional values for surface, sub-surface, and groundwater flow retention time. A detailed description of the MGB-IPH parameters is found in Collischonn et al. [41,51].
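The role of b described above can be illustrated with a short Python sketch. It assumes an ARNO-type variable contributing area formulation for surface runoff, which is how the authors' description of b is interpreted here. The function name, argument names, and the one-day time step are hypothetical, and the exact equations of the model are those given in [41].

```python
def surface_runoff(p, w, wm, b):
    """Surface runoff (mm) generated in one time step of an HRU.

    Illustrative sketch only, assuming an ARNO-type storage capacity curve.
    p  : precipitation reaching the soil in the time step (mm)
    w  : soil water storage at the start of the time step (mm)
    wm : maximum upper soil water storage capacity, Wm (mm)
    b  : shape parameter of the storage capacity distribution (>= 0)
    """
    if wm <= 0 or b < 0 or p < 0 or not 0 <= w <= wm:
        raise ValueError("expected wm > 0, b >= 0, p >= 0 and 0 <= w <= wm")
    a = (1.0 - w / wm) ** (1.0 / (b + 1.0)) - p / ((b + 1.0) * wm)
    if a <= 0.0:
        # The whole HRU saturates: everything beyond the storage deficit runs off.
        return p - (wm - w)
    # Only part of the HRU is saturated; clamp tiny negative rounding errors.
    return max(0.0, p - (wm - w) + wm * a ** (b + 1.0))
```

With b = 0 the sketch behaves as a single bucket, producing no runoff until the deficit Wm − W is filled, whereas b > 0 produces some runoff for the same rainfall even though the storage is not full. This matches the qualitative behavior of b described in the text.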
Based on Figure 7, the parameters calibrated for the rain gauge and MERGE-based simulations have similar magnitudes, indicating that although adjustments were necessary, the MERGE-based simulation maintained the overall reference calibration characteristics. In addition, consistent values were obtained for parameter Wm, as the maximum upper soil water storage capacity was calibrated higher for well drained soils than for poorly drained ones (Figure 7a,b). Furthermore, higher Wm values are seen for the mountainous watersheds 1 and 2. This characteristic, together with the fact that most of these watersheds are covered by HRUs with expectedly higher Wm (i.e., FWD and PWD), indicates that high soil water storage capacity may in fact occur in the upstream watersheds. However, it should be considered that the large reservoirs in such watersheds (Table 2), which attenuate the propagation of floods, may have had an impact on the high calibrated Wm. Moreover, these cross-sections were calibrated for naturalized streamflow estimates, which by definition contain errors that are not present in direct natural streamflow measurements.
Another important calibration feature that reinforces the consistency of MERGE-based simulations is that, as with the rain gauge-based simulations, higher b values were calibrated for shallower, poorly drained, and anthropic HRUs (e.g., PPD, FPD, U, and APD) (Figure 7c,d, Table 2). According to Collischonn et al. [51], higher b values result in an increase in the number of flood peaks, which is a characteristic not only of urbanized regions but also of compacted agricultural lands, mechanized cultivated forests, and hydromorphic soils. Higher calibrated values for the aforementioned HRUs were also observed for parameters Kint and Kbas for both precipitation data sources (Figure 7e-h). This characteristic suggests that the most vulnerable and anthropogenic soils (e.g., PPD, FPD, U, and APD), which cover vast areas of all watersheds (Figure 2d, Table 2), are the ones that proportionally contribute most to streamflow maintenance in terms of sub-surface and groundwater flows. Finally, the discussion of the Cs, Ci, and Cb parameters (Figure 7i,j) is hampered by the fact that they are HRU independent and are thus influenced by the overall watershed response. However, the fact that they have similar calibrated values for both precipitation data sources may indicate that the calibrations are consistent with the hydrology of the UPSRB.
During preliminary streamflow simulations it was noticed that simulated and observed streamflows began to converge after a period of approximately 2 years. Viola et al. [53] attributed this characteristic to the impossibility of establishing specific initial values for soil water storage, so that a certain period of time is required for the model to overcome the influence of initial conditions. Such behavior may be related to the watershed memory, which is a watershed property derived from a set of climatological forcings and physiographic attributes (e.g., vegetation, slope, soil depth, etc.) that can influence present and future hydrologic responses [54], i.e., the influence of past precipitation (not taken into account) on current streamflows. To avoid both misrepresenting simulated streamflows and shortening the analyzed period (1 January 2001 to 31 December 2010), a 2-year period from 1 January 1999 to 31 December 2000 was added for model warm-up. The Split Sample Test method [55] was used for streamflow validation, in which the analyzed period is divided into calibration and independent validation intervals. Individual calibrations were developed for both the reference simulation based on rain gauge data and the MERGE-derived one, as this is the recommended approach: hydrological performance is always higher when the model is individually calibrated for different precipitation sources [20].
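The warm-up and split-sample arrangement described above can be sketched in Python as follows. The warm-up dates are those stated in the text. Treating 1 January 2001 to 31 December 2010 as the calibration interval and the two following years as the validation interval is an assumption made for illustration only, and the names `PERIODS`, `period_of`, and `n_days` are hypothetical.

```python
from datetime import date

# Periods as (first day, last day), both inclusive.
PERIODS = {
    "warm-up": (date(1999, 1, 1), date(2000, 12, 31)),      # stated in the text
    "calibration": (date(2001, 1, 1), date(2010, 12, 31)),  # assumed interval
    "validation": (date(2011, 1, 1), date(2012, 12, 31)),   # assumed interval
}


def period_of(day):
    """Return the name of the period containing `day`, or None if outside all."""
    for name, (first, last) in PERIODS.items():
        if first <= day <= last:
            return name
    return None


def n_days(name):
    """Number of daily time steps in the named period."""
    first, last = PERIODS[name]
    return (last - first).days + 1
```

In such a setup the model is run continuously from the first warm-up day, but only days labeled as calibration or validation would enter the objective functions.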
It is worth mentioning that a particular routine had to be developed to incorporate watersheds 1 and 2 due to their anthropogenic aspects, i.e., a dammed main river. The simulations for watersheds 1 and 2, which are based on naturalized streamflows, do not reflect the flows that are actually propagated downstream; instead, they are simulations of what would have happened if there were no anthropogenic interventions. Therefore, the downstream propagation of naturalized streamflow-based simulations would result in unrealistic estimates. Thus, naturalized streamflow-based simulations had to be replaced by total daily dam discharges [43] to preserve the anthropogenic hydrological aspects of the UPSRB. Naturalized streamflows have been widely used in hydrological modeling studies worldwide. Foy et al. [56] successfully applied the Soil and Water Assessment Tool (SWAT) model to four snow-dominated, mountainous basins in Colorado-USA based on naturalized streamflows over a 15-year period. Weedon et al. [57] evaluated nine distributed hydrological models from 1963 to 2001 for the Thames basin using naturalized streamflows. The authors identified a variety of issues that could be addressed to improve specific models' performance. Likewise, several studies based on naturalized streamflows have been developed for Brazil. Falck et al. [50] evaluated the propagation of satellite precipitation uncertainties through a hydrological model employing the naturalized streamflows of the Tocantins-Araguaia river basin. Lastly, Oliveira et al. [45] assessed the impacts of climate change on the hydropower potential of the Furnas hydropower system in Minas Gerais-Brazil using the SWAT model coupled with naturalized streamflow estimates.
The simulated streamflows based on the reference rain gauge observations and MERGE products and their respective NS, NSlog, and ∆V, for both the calibration and the 2-year validation periods, are presented in Figure 8.
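For reference, the three objective functions can be sketched in Python. This is a minimal illustration using the standard Nash-Sutcliffe efficiency, the same efficiency computed on log-transformed flows, and a relative volume error expressed in percent. The function names and the percent scaling of ∆V are assumptions, and the exact definitions used in the study are those of its methods section.

```python
import math


def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe efficiency (NS); 1 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def nash_sutcliffe_log(sim, obs):
    """NS computed on log-transformed flows (NSlog); requires flows > 0."""
    return nash_sutcliffe([math.log(s) for s in sim], [math.log(o) for o in obs])


def volume_error(sim, obs):
    """Relative volume error (assumed here in %); positive means overestimation."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)
```

Because squared errors are dominated by large flows, NS is most sensitive to peaks, while the logarithmic transform gives more weight to recessions and low flows.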
In the previous section, the validation analysis for the MERGE product applied to watershed 1 provided the closest cumulative precipitation depths, an overall good representation of precipitation seasonality, as well as high r values and low RMSE and RBias. However, when used in the MGB-IPH model, it resulted in "unsatisfactory" simulations for the calibration period, with "satisfactory" results during validation (Figure 8a). Similarly low NS, NSlog, and ∆V values were also obtained for the rain gauge-based simulations. Therefore, these poor results should not be attributed to precipitation uncertainties, since the MERGE product presented a good fit with rain gauge observations, which are densely available for watershed 1 (Figure 2c). Even though watershed 1 is sparsely populated and mostly covered by pastures and forested areas, it can be considered highly anthropogenic as it encompasses two large reservoirs (i.e., Paraíbuna upstream and Santa Branca downstream) covering more than 3% of the total watershed area. The influence these large reservoirs have on the hydrologic cycle, along with the error propagation and limitations of naturalized streamflows in such a cascade of dams, may have restricted the model's ability to simulate the hydrological outputs. This hypothesis is corroborated by the fact that simulations based on rain gauge precipitation for watershed 2 (Figure 8b), which has a single, smaller reservoir, resulted in "good" to "very good" calibrations. However, a sharp decrease in all objective functions was observed for the validation period of watershed 2, indicating a loss of predictive capacity possibly due to naturalized streamflow uncertainty and a lack of hydrological model representativity.
For the downstream watershed 3, "very good" fits were obtained for all the objective functions, indicating a very good representation of peak flows, hydrograph recessions, and overall water output volume (Figure 8c) from both rain gauge and MERGE-based simulations. A drop to "satisfactory" and "good" representations was observed during validation, implying that the unpredictability of upstream discharges may jeopardize the model's downstream predictive capacity. "Good" estimates of maximum streamflow and ∆V for the validation period are seen for watershed 4; nevertheless, "unsatisfactory" NSlog values are also evident (Figure 8d). The small positive ∆V of watershed 4 contradicts the actual bias of MERGE products. According to Figures 3-6, the SRFE for MERGE in fact underestimates precipitation by about 6.12%. This clear divergence between the actual SRFE characteristics (Section 3.1) and the corresponding simulations (Section 3.2), along with the fact that the modeling results for MERGE and the rain gauge network are relatively similar, reinforces the idea that the large anthropogenic interventions found in watersheds 1 and 2 impose a much larger limitation on proper hydrological modeling in the region than the SRFE source.
The results of the present study corroborate what has been described in recent studies worldwide. Tong et al. [12] evaluated SRFE over the Nanliu River in China using the Xinanjiang hydrological model. These authors obtained "very good" simulations in terms of NS when using rain gauges as model input. The authors then calibrated the model for SRFE and obtained "satisfactory" simulations that, although inferior, preserved the basic features of observed streamflows. Like the present study, Bui et al. [58] also observed a drop in the objective functions during calibration and validation periods using the HBV model with SRFE as input for four watersheds located in Vietnam, South Korea, and Japan. Zhang et al. [23] compared SRFE (e.g., TMPA 3B42V6 and V7 and GPM) using two hydrological models (i.e., Xinanjiang and VIC) in the Ganjiang River-China. The authors observed a significant improvement of TMPA 3B42V7 over its previous version. Moreover, better simulations were obtained when using the VIC model. The results of Zhang et al. [23] are similar to those of watershed 4, where "good" simulations were achieved with MERGE for both calibration and validation periods.
In addition to the daily streamflows, cumulative frequency curves can be used for evaluating a model's performance since they are an important tool for understanding the frequency and magnitude of extreme events. Therefore, a model capable of reproducing such characteristics has the potential to be used in water resources management, hydrological hazard mitigation, and hydrological engineering. Figure 9 presents the cumulative frequency curves for the observed and simulated streamflows. The cumulative frequency curve for watershed 1 (Figure 9a) clearly depicts the influence of the NS, NSlog, and ∆V values presented in Figure 8a. The low ∆V and NSlog observed for watershed 1 result in largely different streamflow magnitudes for almost all exceedance probabilities, i.e., unrealistic streamflow estimates for different return periods. Moreover, the large ∆V values for watershed 1 result in notable streamflow overestimations, as suggested in Figure 9a. Nevertheless, close fits were obtained between the observed and simulated cumulative frequency curves for watersheds 2 and 3 (Figure 9b,c). The ability of the MERGE-based simulations to represent streamflow frequency and magnitude for watersheds 2 and 3 implies that, although there are clear limitations for SRFE in the highly anthropogenic watersheds of the UPSRB, it still has important strengths for water resources management and engineering. Finally, close fits for watershed 4 (Figure 9d) were only possible for low exceedance probabilities, i.e., higher return periods. This characteristic is expected since NS, which is related to maximum flows, is much higher for watershed 4 than NSlog, which is related to minimum flows, thus reflecting a better fit of flood events than of recessions.
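A cumulative frequency (exceedance) curve of the kind discussed above can be built from a daily streamflow series with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical and the Weibull plotting position, rank / (n + 1), is an assumption, since the text does not state which plotting position was used.

```python
def exceedance_curve(flows):
    """Return (exceedance probability in %, flow) pairs, largest flow first.

    Assumes the Weibull plotting position: P = 100 * rank / (n + 1).
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(ranked, start=1)]
```

Applying the function to the observed and simulated series separately and comparing flows at the same exceedance probability reproduces the type of comparison shown in Figure 9, where low exceedance probabilities correspond to flood flows and high ones to low flows.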

Conclusions
In the present study the MERGE satellite rainfall product was hydrologically validated in four contrasting watersheds in the UPSRB for a period from January 2001 to December 2012. This validation was based on the direct comparison of MERGE with TMPA 3B42V7 and rain gauge data at the watershed scale and the hydrological validation of streamflow simulations using the MGB-IPH model.
The MERGE daily product captured the overall temporal and spatial aspects of precipitation. Based on the visual interpretation of data, one may conclude that MERGE is superior to TMPA 3B42V7 for the UPSRB, as it better represents precipitation seasonality and is much more faithful to the observed accumulated precipitation amounts. When statistically evaluated, MERGE also presented superior features. Higher correlations were observed for MERGE when compared to TMPA 3B42V7 products. However, both products resulted in "good" correlations for watersheds 1, 3, and 4, and "satisfactory" correlation for watershed 2. In addition, lower root mean square errors were obtained from MERGE for all watersheds evaluated. Relative RMSE values are classified as "unreliable" for TMPA 3B42V7 when applied to the four watersheds under analysis, while MERGE-based relative RMSE values are classified as "reliable". Lastly, much lower biases were observed for MERGE products. In fact, for TMPA 3B42V7 both watersheds 3 and 4 presented "unacceptable" RBias. MERGE-based precipitation, on the other hand, resulted in "acceptable" relative bias for all watersheds, thus demonstrating the clear superiority of MERGE over TMPA 3B42V7.
The streamflow simulation from MERGE was able to capture the overall hydrological regimes in the watersheds of the UPSRB. Unsatisfactory NS, NSlog, and ∆V results were obtained for watershed 1. However, such poor representations may not be attributed to the precipitation input (i.e., MERGE) since the reference simulations, based on rain gauge data, were also not satisfactory. Thus, one may infer that the large reservoirs found in this watershed, along with the fact that simulations are based on naturalized streamflow estimates, have hampered the ability of the MGB-IPH model to simulate reliable streamflows. On the other hand, "satisfactory" to "very good" results were obtained from MERGE-based streamflow simulations when applied to downstream watersheds, for both calibration and validation periods. Finally, close calibrated parameters were found for the individually calibrated MERGE and rain gauge-based simulations, indicating a certain degree of hydrological representativeness for SRFE-derived simulations.
This study is the first to perform a thorough validation of MERGE for hydrological applications. Several limitations were identified for this product when applied to hydrological modeling in anthropogenic watersheds. However, as with other rain gauge-adjusted SRFE described in the literature, MERGE has proven to be superior to "pure" TMPA 3B42V7 data and is therefore preferable over the UPSRB.
Author Contributions: F.C. and C.D.R. designed the study; J.B.C.d.R. and F.C. performed the precipitation data preparation and analysis; F.C. and B.C.d.S. conducted the hydrological modeling; the manuscript was written by F.C. with significant contributions from all coauthors; the manuscript was revised by C.D.R., B.C.d.S., and J.B.C.d.R. All authors have read and agreed to the published version of the manuscript.