Evaluation of Eight Global Precipitation Datasets in Hydrological Modeling

The number of global precipitation datasets (PPs) is on the rise and they are commonly used for hydrological applications. A comprehensive evaluation of their performance in hydrological modeling is needed to guide their use and improvement. This study comprehensively evaluates the performance of eight widely used PPs in hydrological modeling by comparing them with gauge-observed precipitation for a large number of catchments. These PPs include the Global Precipitation Climatology Centre (GPCC), Climate Hazards Group Infrared Precipitation with Station dataset (CHIRPS) V2.0, Climate Prediction Center Morphing Gauge Blended dataset (CMORPH BLD), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks Climate Data Record (PERSIANN CDR), Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis 3B42RT (TMPA 3B42RT), Multi-Source Weighted-Ensemble Precipitation (MSWEP V2.0), European Centre for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) and WATCH Forcing Data methodology applied to ERA-Interim Data (WFDEI). Specifically, the evaluation is conducted over 1382 catchments in China, Europe and North America for the 1998–2015 period at a daily temporal scale. The reliability of the PPs in hydrological modeling is evaluated with a hydrological model calibrated using rain gauge observations. The effectiveness of PPs-specific calibration and bias correction in improving hydrological modeling performance is also investigated for all PPs. The results show that: (1) compared with the rain gauge observations, GPCC provides the best performance overall, followed by MSWEP V2.0; (2) among the eight PPs, the ones incorporating daily gauge data (MSWEP V2.0 and CMORPH BLD) provide superior hydrological performance, followed by those incorporating 5-day (CHIRPS V2.0) and monthly (TMPA 3B42RT, WFDEI, and PERSIANN CDR) gauge data. MSWEP V2.0 and CMORPH BLD perform better than GPCC, underscoring the effectiveness of merging multiple satellite and reanalysis datasets; (3) regionally, all PPs perform better in temperate regions than in arid or topographically complex mountainous regions; and (4) PPs-specific calibration and bias correction can both improve the streamflow simulations for all eight PPs in terms of the Nash and Sutcliffe efficiency and the absolute bias. This study provides insights into the reliability of PPs in hydrological modeling and the approaches to improve their performance, and is expected to provide a reference for applications of global precipitation datasets.


Introduction
Precipitation is closely related to atmospheric circulation and is a critical component of the hydrological cycle [1–3]. Accurate precipitation records are not only essential for meteorological and climatic analysis but also key to successful water resource management [4,5]. However, acquiring reliable and consistent precipitation series is [...]
Previous studies generally find large uncertainties in PPs and in hydrological applications driven by different PPs, which highlights the importance of evaluating and improving them for research and operational applications alike. However, most of these studies evaluated only a subset of the available PPs, focusing on either satellite [27,34,35] or reanalysis [14,36,37] datasets within limited regions and time periods [21,35,38], which leads to a lack of comprehensive results. Although Beck and Vergopolan [26] evaluated nine global precipitation datasets in hydrological modeling over 9053 catchments worldwide and generated comprehensive results to a certain extent, they did not include the bias correction approach, which has been widely used to improve hydrological performance. Moreover, although many studies have applied PPs-specific calibration or bias correction when using PPs for hydrological modeling, few have compared these two approaches.
Therefore, this study focuses on the evaluation of eight widely used PPs (see Table 1 for an overview) during the 1998-2015 period at a daily temporal scale. The specific objectives of this study are: (1) to evaluate the eight PPs by comparing them with gauge-observed precipitation over 1382 catchments; (2) to evaluate the reliability of the PPs for hydrological modeling with a hydrological model calibrated using rain gauge observations; and (3) to investigate the effectiveness of bias correction and PPs-specific calibration in improving hydrological modeling performance. The ultimate goal of this study is to provide insight into the reliability of PPs as well as their performance in hydrological modeling.
Table 1 presents all datasets used in this study, including the eight PPs for precipitation evaluation and the other meteorological datasets for hydrological modeling, i.e., gauge-observed precipitation, temperature and the gridded potential evaporation data, as well as the daily streamflow data used for hydrological model calibration. In this study, all the gridded datasets were interpolated to catchment-averaged values using the Thiessen Polygon method [39] for hydrological modeling. The interpolation was executed on a daily time step over the 1998-2015 period. Considering the data quality and availability of ground observations, the evaluation was carried out over 1382 catchments in China, Europe and North America.
Note (Table 1): abbreviations in the data source column are defined as follows: G, gauge; S, satellite; R, reanalysis.

Global Precipitation Datasets
There are eight PPs evaluated in this study, including one gauge-based dataset: Global Precipitation Climatology Centre (GPCC); five satellite-related datasets: Climate Hazards Group Infrared Precipitation with Station dataset (CHIRPS) V2.0, Climate Prediction Center Morphing (CMORPH) Gauge Blended dataset (CMORPH BLD), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) Climate Data Record (PERSIANN CDR), TMPA 3B42RT, and MSWEP V2.0; and two reanalysis datasets: ERA5, and WATCH Forcing Data methodology (WFD) applied to ERA-Interim Data (WFDEI). The evaluation was conducted during the common period of the eight PPs, which was 1998-2015.
GPCC is the largest gauge-based dataset, and was developed to collect, perform quality control on, and analyze rain gauge data across the globe.
Satellite-related datasets use passive microwave (PMW) sensors on low-Earth-orbiting (polar-orbiting) satellites and infrared (IR) sensors on geostationary satellites to estimate precipitation [4,6], and usually blend in rain gauge data to offset their limitations [50]. Specifically, PERSIANN CDR mainly uses IR data and TMPA 3B42RT mainly uses PMW data, while the other three integrate both IR and PMW data. MSWEP V2.0 and CMORPH BLD directly incorporate daily gauge data, CHIRPS V2.0 incorporates 5-day gauge data, and TMPA 3B42RT and PERSIANN CDR incorporate monthly gauge data.
Reanalysis-based datasets are designed to generate various meteorological variables with a consistent spatial and temporal resolution by assimilating observations such as weather stations, satellites, ships, and buoys based on different climate models. ERA5 is the fifth generation of atmospheric reanalysis data to replace ERA-Interim produced by ECMWF, assimilating observations from over 200 satellite instruments or types of conventional data and information on rain rate from ground-based radar-gauge composite observations. WFDEI was generated by applying WFD to the ERA-Interim reanalysis data, and used the monthly GPCC in bias correction.

Other Meteorological Datasets
In this study, we used the gauge-observed precipitation to evaluate the PPs, as well as to calibrate the hydrological model together with the observed temperature. More specifically, the gauge-observed precipitation and temperature in China came from the China Ground Rainfall/Temperature Daily Value 0.5° × 0.5° Lattice Dataset (CGRD/CGTD) (http://data.cma.cn, accessed on 16 July 2021). The CGRD/CGTD was generated by interpolating daily observed precipitation/temperature from more than 2000 meteorological stations in China, and its reliability has been demonstrated by Zhao and Zhu [51]. The gauge-observed precipitation and temperature used in Europe were from E-OBS [45], a European high-resolution gridded dataset derived from the European Climate Assessment & Dataset (ECA&D). The ECA&D forms the backbone of E-OBS since it collects 66865 series of observations at 19087 meteorological stations throughout Europe. The observations used in North America were from a combination of Canadian and United States databases. For Canada, hydrometeorological data and boundary data were from the Canadian model parameter experiment (CANOPEX) database [46]. For the US, precipitation and temperature were from the Santa Clara database [47], and streamflow and boundary data were from the United States Geological Survey (USGS) database [49]. These observed precipitation and temperature data were catchment averaged.
The Global Land Evaporation Amsterdam Model (GLEAM) is a set of algorithms estimating different components of land evaporation [52], whose gridded potential evaporation data was used as an input of the hydrological model for hydrological modeling in this study. The reliability of GLEAM has been tested by Martens and Miralles [48].

Observed Streamflow
The observed streamflow and boundary data for Chinese catchments were collected from different streamflow-gauging stations. For Europe, they came from the Global Runoff Data Center (GRDC) dataset (http://grdc.bafg.de, accessed on 16 July 2021), the most complete in-situ discharge dataset freely available to the global scientific community. River discharge is regarded as the most accurately measured component of the water cycle, and the GRDC is dedicated to collecting and archiving river discharge data globally [53]. For North America, the observed streamflow and boundary data were from the CANOPEX database and the USGS database mentioned above.
The following two criteria were used to select suitable catchments for hydrological modeling: (1) The catchment area is >2500 km² and <50,000 km². The lower limit prevents catchments unrepresentative of the 0.5° grid cells (2500 km² at 0° latitude) from confounding the results, and the upper limit reduces the error of catchment-averaged values extracted from gridded datasets. (2) The streamflow time series has to be ≥5 years long (it can be intermittent) during 1998-2015. As a result, 232, 184 and 966 catchments were selected from China, Europe and North America, respectively.
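The two selection criteria above can be expressed as a simple filter. The following is a minimal sketch, not the authors' code: the `Catchment` record, its field names, and the convention that "≥5 years" means at least 5 × 365 days with valid streamflow are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Catchment:
    # Hypothetical record; field names are illustrative only.
    name: str
    area_km2: float
    valid_streamflow_days: int  # days with observed streamflow in 1998-2015


def is_suitable(c, min_area=2500.0, max_area=50000.0, min_years=5.0):
    """Apply the area criterion (strict bounds) and the record-length criterion."""
    has_area = min_area < c.area_km2 < max_area
    # Assumption: intermittent records are counted by total valid days.
    has_record = c.valid_streamflow_days >= min_years * 365
    return has_area and has_record


def select_catchments(catchments):
    """Return only the catchments that satisfy both criteria."""
    return [c for c in catchments if is_suitable(c)]
```

Counting total valid days is only one way to handle intermittent records; a stricter rule (for example, requiring complete hydrological years) would select fewer catchments.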

Methods
The workflow of this study is illustrated below, and more details of the hydrological model (Xin'anjiang (XAJ) model) and the methods applied for evaluation of PPs are described in Sections 3.1-3.3.
(1) The PPs were evaluated by comparison with the rain gauge observations over the selected catchments. In addition, we applied a bias correction method to the PPs and obtained "bias-corrected PPs" (BC-PPs), which were also included in the comparison.
(2) The hydrological model was first calibrated with rain gauge observations, and the calibrated parameter set is referred to as the "Reference Parameter set" (RP). The performance of this calibration served as a benchmark for later hydrological modeling driven by the eight PPs.
(3) The performance of hydrological modeling with the PPs was evaluated in three steps: in step 1, the reliability of the PPs for hydrological modeling was investigated by running the model with RP in the calibration period; in step 2, the hydrological model was calibrated with each PP individually (PPs-specific calibration), and the resulting performance was compared with the benchmark; in step 3, the BC-PPs were used to drive the hydrological model with RP in the calibration period, and the resulting performance was compared with the benchmark.

XAJ Model
The XAJ model [54] is a conceptual rainfall-runoff model which has been widely used in China and many other countries for streamflow simulations [55][56][57]. Figure 1 shows the flowchart of this deterministic lumped model. The calculation process of the XAJ consists of four parts: the evaporation module, the runoff yielding module, the runoff sources partition module, and the runoff concentration module. The evaporation is calculated in three soil layers, including an upper layer, a lower layer and a deep layer, based on the watershed saturation-excess runoff theory. The storage curve calculates the total runoff based on the concept of runoff formation on repletion of storage, which means that runoff is not generated until soil moisture reaches the filled capacity. By using a free water capacity distribution curve, the total runoff is divided into three components including surface runoff, interflow and groundwater runoff. The surface runoff is routed by the unit hydrograph, the interflow and groundwater flow are routed by the linear reservoir method. There are 15 parameters within the XAJ model, four accounting for evaporation, two accounting for runoff generation and nine accounting for runoff routing. More details can be found in Zhao [54]. In addition, the CemaNeige module is added to the XAJ model to simulate the snow accumulation and snowmelt processes since the lack of a snow component in XAJ limits its applicability in snow-dominated watersheds. The CemaNeige module separates precipitation into rainfall and snowfall and calculates the snowmelt based on a degree-day method, which has two parameters to be calibrated [58]. Overall, the XAJ model used in this study contains 17 parameters. XAJ requires catchment averaged precipitation, temperature and potential evaporation as inputs. For each catchment, the XAJ model was calibrated by the first 70% of observed streamflow data (using the first year as warm-up) and validated by the last 30%. 
The calibration was performed using the SCE-UA algorithm [59] to optimize the model parameter sets, with the Nash and Sutcliffe efficiency (NSE) [60] as the objective function.
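The split of each streamflow record into warm-up, calibration and validation parts can be sketched as follows. This is an illustrative sketch only: the function name, the 365-day warm-up length, and the treatment of the record as one continuous daily series (ignoring gaps) are assumptions, not details given in the text.

```python
import numpy as np


def split_calibration_validation(q_obs, calib_fraction=0.7, warmup_days=365):
    """Return index slices for warm-up, calibration scoring and validation."""
    n = np.asarray(q_obs).size
    n_cal = int(round(n * calib_fraction))  # end of the first 70% of the record
    if warmup_days >= n_cal:
        raise ValueError("record too short for the requested warm-up period")
    return {
        "warmup": slice(0, warmup_days),          # model spin-up, not scored
        "calibration": slice(warmup_days, n_cal),  # scored during calibration
        "validation": slice(n_cal, n),             # last 30%, held out
    }
```

For a 10-year daily record of 3650 values, this yields a calibration scoring window of days 365 to 2554 and a validation window of days 2555 to 3649.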

Bias Correction Method
A distribution-based bias correction method called the Daily Bias Correction (DBC) method [61] was applied to correct the catchment-averaged precipitation of each PP in this study. The DBC is a hybrid method combining the Local Intensity Scaling (LOCI) method [62], which corrects the precipitation occurrence, and the Daily Translation (DT) method [63], which corrects the frequency distribution of precipitation. The DBC method was applied in two steps: (1) The LOCI method was used to correct the precipitation occurrence, which ensured that the frequency of precipitation occurrence estimated by the PPs equaled that of the observed data for a specific month. (2) The DT method was then used to correct the empirical distribution of PPs-estimated precipitation magnitudes in terms of 100 quantiles from 0.01 to 1 with an interval of 0.01.
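The two DBC steps described above can be illustrated with a short sketch. It is a simplified reading of the description, not the published DBC implementation: the function name, the 0.1 mm/day observed wet-day threshold, the multiplicative quantile ratio, and the rule that zero-precipitation days are never turned into wet days are all assumptions. The inputs are assumed to be the daily values of one calendar month pooled over all years.

```python
import numpy as np


def dbc_correct_month(pp, obs, wet_threshold=0.1):
    """Correct one calendar month's pooled daily PP values against observations.

    wet_threshold (mm/day) defining an observed wet day is an assumed value.
    """
    pp = np.asarray(pp, dtype=float)
    obs = np.asarray(obs, dtype=float)
    corrected = np.zeros_like(pp)

    # Step 1 (LOCI-style): choose a PP threshold so that the number of PP wet
    # days matches the observed wet-day frequency for this month.
    obs_wet = obs[obs >= wet_threshold]
    n_wet = int(round(obs_wet.size / obs.size * pp.size))
    if n_wet == 0:
        return corrected
    pp_threshold = np.sort(pp)[::-1][n_wet - 1]  # n_wet-th largest PP value
    wet = (pp >= pp_threshold) & (pp > 0.0)      # zeros are never made wet
    if not wet.any():
        return corrected

    # Step 2 (DT-style): scale wet-day amounts by the ratio of observed to PP
    # quantiles at 100 levels (0.01, 0.02, ..., 1.00).
    levels = np.arange(1, 101) / 100.0
    pp_q = np.quantile(pp[wet], levels)
    obs_q = np.quantile(obs_wet, levels)
    ratio = obs_q / pp_q
    idx = np.clip(np.searchsorted(pp_q, pp[wet], side="left"), 0, 99)
    corrected[wet] = pp[wet] * ratio[idx]
    return corrected
```

In practice the correction factors would be estimated on a reference period and then applied to the full series, month by month; the sketch estimates and applies them on the same data for brevity.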

Performance Evaluation Indices
The evaluation of PPs was conducted against the gauge-observed precipitation over the selected catchments using three statistical indices:
(1) The Pearson linear correlation coefficient (R) is used to assess the agreement between the 3-day means of PPs and gauge-observed precipitation:
$$R = \frac{\sum_{j=1}^{m} (o_j - \bar{o})(p_j - \bar{p})}{\sqrt{\sum_{j=1}^{m} (o_j - \bar{o})^2}\,\sqrt{\sum_{j=1}^{m} (p_j - \bar{p})^2}}$$
where $o_j$ and $p_j$ are the 3-day mean time series of gauge-observed precipitation and PPs, respectively, $\bar{o}$ and $\bar{p}$ are the averages of all the 3-day means of gauge-observed precipitation and PPs, respectively, and $m$ is the length of the 3-day mean time series. R ranges from −1 to 1 and a larger R represents a better performance. Note that R is calculated for 3-day means rather than daily precipitation estimates, as in Beck and Vergopolan [26], to reduce the impact of differences in gauge reporting times (i.e., the start and end times of the daily accumulations).
(2) The relative bias ratio (RB) is used to assess the systematic bias of the precipitation estimates of PPs, and it is also used to assess the systematic bias of the simulated discharge:
$$RB = \frac{\sum_{i=1}^{n} (P_i - O_i)}{\sum_{i=1}^{n} O_i} \times 100\% = \frac{\bar{P} - \bar{O}}{\bar{O}} \times 100\%$$
where $O_i$ and $P_i$ are the daily values on the $i$th day for the gauge-observed precipitation and the PPs, respectively, $\bar{O}$ and $\bar{P}$ are the averages of all the daily values for the gauge-observed precipitation and the PPs, respectively, and $n$ is the number of days. The RB ranges from −∞ to ∞ and the best result is 0.
(3) The root mean square error (RMSE) is used to assess the difference between PPs and gauge-observed precipitation:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2}$$
The RMSE ranges from 0 to ∞ and a smaller RMSE represents a better performance.
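The three precipitation indices can be computed as in the following sketch. The function names are illustrative, and the use of consecutive, non-overlapping 3-day blocks for the 3-day means is an assumption, since the text does not state whether the means are block or moving averages. RB is returned in percent.

```python
import numpy as np


def three_day_means(x):
    """Average a daily series over consecutive, non-overlapping 3-day blocks."""
    x = np.asarray(x, dtype=float)
    m = x.size // 3  # trailing days that do not fill a block are dropped
    return x[: m * 3].reshape(m, 3).mean(axis=1)


def pearson_r_3day(pp, obs):
    """Pearson correlation between 3-day means of PP and observed series."""
    return float(np.corrcoef(three_day_means(obs), three_day_means(pp))[0, 1])


def relative_bias(pp, obs):
    """Relative bias in percent; positive values indicate overestimation."""
    pp = np.asarray(pp, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float((pp.sum() - obs.sum()) / obs.sum() * 100.0)


def rmse(pp, obs):
    """Root mean square error between daily PP and observed values."""
    pp = np.asarray(pp, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pp - obs) ** 2)))
```

The same `relative_bias` function applies to simulated versus observed discharge, as described for RB above.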
The hydrological performance of the PPs is evaluated by calculating the NSE between the observed and simulated discharge:
$$NSE = 1 - \frac{\sum_{i=1}^{n} (Q_i^{obs} - Q_i^{sim})^2}{\sum_{i=1}^{n} (Q_i^{obs} - \bar{Q}^{obs})^2}$$
where $Q_i^{obs}$ and $Q_i^{sim}$ are the daily values on the $i$th day for the observed and simulated streamflow, respectively, and $\bar{Q}^{obs}$ is the average of all the daily values of the observed streamflow. The NSE ranges from −∞ to 1 and a larger NSE represents a better performance.
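A minimal sketch of the NSE calculation follows. Skipping days on which the observed streamflow is missing (encoded as NaN) is an assumption introduced here to accommodate intermittent records; the text does not describe how gaps are handled.

```python
import numpy as np


def nse(q_sim, q_obs):
    """Nash and Sutcliffe efficiency between simulated and observed streamflow."""
    q_sim = np.asarray(q_sim, dtype=float)
    q_obs = np.asarray(q_obs, dtype=float)
    valid = ~np.isnan(q_obs)  # assumption: gaps in observations are NaN
    sim, obs = q_sim[valid], q_obs[valid]
    residual = np.sum((obs - sim) ** 2)
    variance = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - residual / variance)
```

A perfect simulation gives an NSE of 1, while a simulation that always equals the observed mean gives 0, which is why NSE is often read as skill relative to the mean-flow benchmark.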

Evaluation of Precipitation Estimates
In this section, the PPs and BC-PPs are compared with the gauge-observed precipitation over 1382 catchments for the 1998-2015 period. Figure 2 presents the performances in terms of the R calculated for 3-day means ($R_{3day}$), RB and RMSE. GPCC is superior to the other PPs in terms of both $R_{3day}$ (median: 0.83) and RMSE (median: 4.54), and its absolute RB (median: 2.40) is larger only than that of MSWEP V2.0 (median: 0.81). The good performance of GPCC is in line with the study of Schneider, Becker [64], and is attributed to the fact that it is the largest gauge-based dataset, with data collected from more than 70,000 different stations worldwide [40]. Note that the use of $R_{3day}$ can only reduce, not completely eliminate, the impact of reporting time issues. It is therefore possible that the good performance of GPCC is also partly due to similar time shifts in the reference observations and GPCC. MSWEP V2.0 performs better than the remaining six PPs in terms of the median value of the three indices ($R_{3day}$ of 0.82, absolute RB of 0.81, RMSE of 5.21), underscoring the effectiveness of merging multiple satellite and reanalysis datasets, which is in agreement with the finding of Beck and Vergopolan [26]. ERA5 performs well in terms of $R_{3day}$ (median: 0.80) and RMSE (median: 4.92) but attains the worst absolute RB (median: 7.89). The larger relative biases in precipitation estimates from ERA5 are inconsistent with the findings of Jiang and Li [65]. MSWEP V2.0, GPCC and CHIRPS V2.0 attain better RB scores, which is attributed to the use of the gauge-based Climate Hazards Center's Precipitation Climatology (CHPclim) dataset [66] or Global Climate Data (WorldClim) [67] to determine their long-term means. The median scores of $R_{3day}$, absolute RB and RMSE for the eight BC-PPs are 0.67~0.83, 0.07~3.12, and 4.84~5.63, respectively, which indicates generally better performance than that of the eight raw PPs.
Figure 3 presents the spatial patterns of $R_{3day}$ between the eight PPs and rain gauge observations (see Appendix A for the spatial patterns of RB and RMSE). GPCC and MSWEP V2.0 exhibit $R_{3day}$ scores higher than 0.8 over most of the catchments. In contrast, PERSIANN CDR exhibits generally poor performance, which is consistent with the finding of previous evaluations [11,68] that IR-based datasets perform worse than PMW-based ones in precipitation estimation. The two reanalysis datasets, ERA5 and WFDEI, exhibit very similar performance. Regionally, PERSIANN CDR, CHIRPS V2.0, and TMPA 3B42RT perform relatively poorly in Europe, with median $R_{3day}$ scores of 0.49, 0.58 and 0.59, respectively. For China and North America, the worst performance is attained in both cases by PERSIANN CDR, with median $R_{3day}$ scores of 0.64 and 0.69, respectively. All these PPs show relatively higher $R_{3day}$ scores over Western Europe, the Eastern US and Southeastern China, where the density of observations is relatively high. Conversely, all exhibit worse performance over topographically complex regions such as the Balkan region, Southwestern China and the Andes. In terms of $R_{3day}$, RB and RMSE, there are generally larger discrepancies among the eight PPs over topographically complex mountainous regions, implying difficulties in estimating precipitation in these regions [11].

Benchmark Performance of Streamflow Simulation with Gauge-Observed Precipitation
In this section, the XAJ model is calibrated using rain gauge observations over 1382 catchments to test its performance and to provide the benchmark for subsequent hydrological modeling with the eight PPs. Figure 4 presents the spatial patterns and Cumulative Distribution Function (CDF) of NSE for both calibration and validation periods. The CDF of NSE shows that the median NSE scores are 0.79 and 0.66 for the calibration and validation periods, respectively. The spatial patterns of NSE show that the XAJ model driven by rain gauge observations achieves good performance over most catchments. Relatively low NSE scores are found over the US Great Plains, which might be due to a rainfall regime that is highly intermittent in space and time combined with a strongly nonlinear rainfall-runoff response, and over the Balkan region, which is presumably due to the low E-OBS rain-gauge density. In general, the results demonstrate satisfactory performance of the XAJ model based on the observations, which can serve as a benchmark for the hydrological performance evaluation of the PPs.

Evaluation of Streamflow Simulations with Eight PPs
In this section, the streamflow simulations with the eight PPs are evaluated in three steps (see Section 3 for details). Table 2 presents the median NSE scores of the eight PPs in reproducing the observed streamflow over 1382 catchments for all three steps. Figure 5 shows the CDF of NSE scores. Note that the CDFs of steps 1~3 (solid lines) are derived for the calibration period, and the CDF of 'validation' (dashed line) refers to the performance for the validation period when applying PPs-specific calibration (in step 2). Step 2' and step 3' are the absolute improvements in NSE obtained by step 2 and step 3, respectively, compared with step 1.
Table 2 shows that the overall performance ranking of the PPs in step 1 from best to worst is MSWEP V2.0, CMORPH BLD, GPCC, CHIRPS V2.0, ERA5, WFDEI, TMPA 3B42RT and PERSIANN CDR. This indicates that the datasets incorporating daily gauge data (i.e., MSWEP V2.0 and CMORPH BLD) overall outperform those incorporating 5-day (i.e., CHIRPS V2.0) or monthly (i.e., TMPA 3B42RT, WFDEI, and PERSIANN CDR) gauge data. In comparison with GPCC, the superior performance of MSWEP V2.0 and CMORPH BLD also underscores the effectiveness of incorporating multiple satellite and reanalysis datasets.
In step 2, the hydrological modeling performance of the eight PPs is improved overall by PPs-specific calibration, with the highest NSE score for MSWEP V2.0 and the lowest for PERSIANN CDR, which is consistent with the best and worst performers in step 1. Nevertheless, the absolute improvement is larger for the PPs that performed poorly in step 1 (i.e., TMPA 3B42RT and PERSIANN CDR) than for those that performed well (i.e., MSWEP V2.0, CMORPH BLD, and GPCC). In addition, the bias correction in step 3 also improves the hydrological modeling performance for all PPs, with large improvements for those with large biases (i.e., TMPA 3B42RT, PERSIANN CDR and ERA5), as can be seen in Figure 2b. However, the effect of bias correction is negligible for the PPs that performed well in step 1 (i.e., MSWEP V2.0, CMORPH BLD, and GPCC). The CDFs in steps 2 and 3 also show consistently higher NSE scores than those in step 1: averaged over the eight PPs, the median NSE scores are 0.50, 0.67, and 0.56 for steps 1, 2 and 3, respectively. This demonstrates that the PPs used in this study yield better hydrological modeling performance after applying PPs-specific calibration or bias correction.
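The step-wise comparison for a single PP can be summarized as in the sketch below. The function name and output keys are hypothetical, and defining the absolute improvement as the difference between median NSE scores (rather than the median of per-catchment differences) is an assumption, since the text does not specify which is used.

```python
import numpy as np


def summarize_steps(nse_step1, nse_step2, nse_step3):
    """Median NSE per step and absolute improvements relative to step 1."""
    medians = {
        "step1": float(np.nanmedian(nse_step1)),  # RP, raw PP
        "step2": float(np.nanmedian(nse_step2)),  # PPs-specific calibration
        "step3": float(np.nanmedian(nse_step3)),  # RP, bias-corrected PP
    }
    return {
        "median": medians,
        # Assumption: improvement is taken as the difference of medians.
        "step2_improvement": medians["step2"] - medians["step1"],
        "step3_improvement": medians["step3"] - medians["step1"],
    }
```

Each input is an array of per-catchment NSE scores for one PP; averaging the resulting medians over the eight PPs would give summary values of the kind quoted above.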
According to the studies of Moriasi and Arnold [69] and Knoben and Freer [70], a streamflow simulation can be considered satisfactory if NSE > 0.5. On this basis, the XAJ model driven by gauge-observed precipitation provides satisfactory performance over 90% of catchments in the calibration period (Figure 4a) and 70% of catchments in the validation period (Figure 4c). As for the hydrological modeling performance of the PPs, in step 1 approximately 20% (PERSIANN CDR, blue line in Figure 5e) to 70% (MSWEP V2.0, blue line in Figure 5d) of catchments exceed this NSE threshold. In step 2, more than 70% (PERSIANN CDR, red line in Figure 5e) to 90% (GPCC, red line in Figure 5a) of catchments exceed the threshold, and in step 3, 40% (PERSIANN CDR, green line in Figure 5e) to 75% (MSWEP V2.0, green line in Figure 5d). Figure 5 also shows the hydrological modeling performance during the validation period when PPs-specific calibration is used: about 40% (PERSIANN CDR, dashed line in Figure 5e) to 70% (MSWEP V2.0, dashed line in Figure 5d) of catchments exceed the threshold. These results indicate that the PPs have good potential for hydrological modeling, which is consistent with recent findings [71,72]. Moreover, the best performance of MSWEP V2.0 among the PPs shows that, to a certain extent, it can be used as an alternative forcing for XAJ where gauge precipitation observations are lacking.
Figure 6 presents the spatial patterns of NSE for the eight PPs obtained by running XAJ with RP (step 1). MSWEP V2.0, GPCC and CMORPH BLD generally perform well in the Eastern US, Southeastern China, and Northern and Western Europe, with MSWEP V2.0 and GPCC even outperforming the gauge-observed precipitation in Europe.
All the PPs provide low NSE scores over the US Great Plains, especially GPCC, PERSIANN CDR, TMPA 3B42RT, and WFDEI (<0.2), which is consistent with previous findings using different hydrological models and precipitation datasets [26,71]. Low NSE scores are also found for CHIRPS V2.0, TMPA 3B42RT and ERA5 in China, and for PERSIANN CDR in both China and Europe. Some PPs perform better than others regionally, but none outperforms the others everywhere. For instance, MSWEP V2.0 generally performs better in most places, but it tends to perform worse than PERSIANN CDR in the northern part of the Rocky Mountains. For each PP, the NSE scores are relatively higher in temperate regions than in arid or topographically complex mountainous regions, due to the sparse rain-gauge networks and the highly non-linear rainfall-runoff response in the latter.
Figure 7 shows the PPs with the highest improvement in NSE from applying PPs-specific calibration (step 2) and bias correction (step 3) relative to step 1. The spatial pattern of the PPs with the largest improvement in NSE from PPs-specific calibration (Figure 7a) is similar to that from bias correction (Figure 7b). The improvements are higher for ERA5, WFDEI, PERSIANN CDR and TMPA 3B42RT over the Southeastern US, Northern Europe, Western Europe and most watersheds in China, respectively. This accords with the observation that these datasets show worse initial performance over these regions.
In addition, RB is used to further explore whether the PPs can be used to estimate the annual streamflow, and the results are shown in Figure 8. The RB derived from gauge-observed precipitation (Obs) is also displayed; it provides a satisfactory estimate of the annual streamflow with a slight underestimation. MSWEP V2.0, which has the highest NSE values (median: 0.63), also has the lowest error in annual streamflow (median: −1.63).
The good performance of MSWEP V2.0 in terms of both NSE and RB indicates that it can be used as an alternative precipitation data source for hydrological modeling where gauge precipitation observations are lacking. For the eight PPs, the median values of absolute RB in step 2 (mean: 4.57) and step 3 (mean: 6.94) are lower than those in step 1 (mean: 9.32). This demonstrates that PPs-specific calibration and bias correction can effectively improve the ability of the PPs to estimate annual streamflow.

Conclusions
This study comprehensively evaluated eight widely used PPs, namely GPCC, CHIRPS V2.0, CMORPH BLD, PERSIANN CDR, TMPA 3B42RT, MSWEP V2.0, ERA5 and WFDEI, in hydrological modeling over 1382 catchments in China, Europe and North America during the 1998-2015 period at a daily temporal scale. PPs-specific calibration and bias correction were also included in the hydrological evaluation and discussion. The following conclusions can be drawn:
(1) Compared with the gauge-observed precipitation, GPCC provides the best performance overall, followed by MSWEP V2.0, which merges multiple satellite and reanalysis datasets.
(2) Among all the PPs, MSWEP V2.0 and CMORPH BLD, which incorporate daily gauge data, provide superior hydrological performance, followed by those incorporating 5-day (CHIRPS V2.0) and monthly (TMPA 3B42RT, WFDEI, and PERSIANN CDR) gauge data. MSWEP V2.0 and CMORPH BLD perform better than GPCC, underscoring the effectiveness of merging multiple satellite and reanalysis datasets.
(3) Regionally, all PPs perform better in temperate regions than in arid or topographically complex mountainous regions, due to the sparse rain-gauge networks and the highly non-linear rainfall-runoff response in the latter. Uncertainty exists in the regional performance of all the PPs.
(4) PPs-specific calibration and bias correction can both improve the streamflow simulations for all eight PPs in terms of the Nash and Sutcliffe efficiency and the absolute bias. The improvements in hydrological modeling performance are larger for the PPs that initially perform poorly.
Overall, this study investigates the reliability of PPs in hydrological applications, as well as the approaches to improve their hydrological modeling performance. Some limitations remain. For example, the catchments are located in China, Europe and North America, which have dense rain-gauge networks, so these conclusions may not generalize to regions with sparse rain-gauge networks. In addition, different results may be obtained when using another hydrological model, calibration objective function, or temperature or evaporation forcing. These issues should therefore be investigated in future studies to generalize the conclusions of this study.
Figure A2. Spatial patterns of RMSE for the eight PPs using gauge-observed precipitation from 1382 catchments as a reference. Each data point represents a catchment centroid.