Evaluation of Five Satellite-Based Precipitation Products in Two Gauge-Scarce Basins on the Tibetan Plateau

Abstract: The sparse rain gauge networks over the Tibetan Plateau (TP) pose challenges for hydrological studies and applications. Satellite-based precipitation datasets have the potential to overcome the data scarcity caused by sparse rain gauges. However, large uncertainties usually exist in these precipitation datasets, particularly in complex orographic areas such as the TP, so the accuracy of these products needs to be evaluated before they are applied in practice. In this study, five (quasi-)global satellite precipitation products were evaluated in two gauge-sparse river basins on the TP during the period 1998–2012: CHIRPS, CMORPH, PERSIANN-CDR, TMPA 3B42, and MSWEP. The five precipitation products were first intercompared to identify their consistency in depicting the spatial–temporal distribution of precipitation. Then, the accuracy of these products was validated against precipitation observations from 21 rain gauges using a point-to-pixel method. We also investigated the streamflow simulation capacity of these products via a distributed hydrological model. The results indicated that these precipitation products have similar spatial patterns but significantly different precipitation estimates. The point-to-pixel validation indicated that none of the products can efficiently reproduce the daily precipitation observations, with the median Kling–Gupta efficiency (KGE) in the range of 0.10–0.26. Among the five products, MSWEP has the best consistency with the gauge observations (median KGE = 0.26) and is thus recommended as the preferred choice for applications. However, as model forcing data, all the precipitation products showed a comparable capacity for streamflow simulation and were all able to accurately reproduce the observed streamflow records. The KGE values obtained from these precipitation products exceed 0.83 in the upper Yangtze River (UYA) basin and 0.84 in the upper Yellow River (UYE) basin. Thus, an evaluation of precipitation products that focuses only on the accuracy of streamflow simulations is less meaningful, because it masks the differences between these products. A further attribution analysis indicated that the influences of the different precipitation inputs on the streamflow simulations were largely offset by the parameter calibration, leading to significantly different evaporation and water storage estimates. Therefore, an efficient hydrological evaluation of precipitation products should focus on both streamflow simulations and the simulations of other hydrological variables, such as evaporation and soil moisture.


Introduction
Precipitation is one of the most important water balance components of the global water cycle and its spatial-temporal variability directly affects the available water resources in a region [1,2].
properties, climatic conditions, elevation, and topography [40]. For example, Mei, et al. [41] evaluated four satellite precipitation products in mountainous basins and found that the two TRMM products were more consistent with gauge observations than the CMORPH and PERSIANN products. Beck, et al. [39] performed a global-scale evaluation of 22 precipitation products using gauge observations and streamflow simulations; the results indicated that the MSWEP product outperformed the others in depicting precipitation temporal variations and in simulating streamflow observations. In addition, previous research on hydrological evaluations of satellite-based precipitation has generally focused only on the accuracy of runoff simulations, with less scrutiny of the simulated hydrological state variables and fluxes, e.g., evaporation and soil moisture.
Evaluating precipitation products is even more important in gauge-sparse regions, such as the Tibetan Plateau (TP), than in gauge-dense regions. The TP, known as the "third pole", has an average elevation of over 4000 m above sea level (a.s.l.) [42]. The TP is also the source of many Asian rivers, such as the Yellow River, the Yangtze River, and the Mekong River, which support hundreds of millions of people living downstream [43]. Owing to the sparse population, the existing rain gauge networks over the TP are extremely sparse (see Figure 1 in Section 2.1), which is challenging for hydrological research and practice. Because of their global or quasi-global coverage, satellite-based precipitation products provide a potential way to address the gauge-scarcity issue on the TP. However, these precipitation products generally suffer from large uncertainties on the TP due to indirect measurements, insufficient gauge adjustment, and complex terrain [30,36,44]. For example, in many parts of the TP, solid precipitation accounts for a large proportion of the total annual precipitation. However, IR-based methods generally fail to capture shallow snowfall over snow-covered surfaces, and MW-based methods also have difficulty detecting solid precipitation, since solid precipitation limits possible MW retrievals to the scattering signal at higher frequencies [45]. Although some evaluations of satellite precipitation products have been conducted over the TP (e.g., Gao and Liu [46]; Tong, et al. [30]; Wang, et al. [47]; and Liu, et al. [10]), many studies have focused on only a single precipitation product (e.g., [10] and [48]), and few studies have combined the two methods (mentioned above) to evaluate the precipitation products. In addition, some promising recently released precipitation products, such as CHIRPS version 2.0, have not yet been evaluated on the TP.
In this paper, a comprehensive evaluation of precipitation products was undertaken in two sparsely gauged river basins on the TP. The purposes of this study are to determine how well each of the products represents precipitation and which products are suitable for hydrological research and applications on the TP. To this end, we evaluated five popular satellite precipitation products (see Section 2.2) with gauge-based observations and a physically based distributed hydrological model (Section 3.1). These precipitation products were first intercompared (Section 4.1) and then validated in 21 grid boxes against the gauge observations over the period 1998–2012 (Section 4.2). After that, we used the distributed hydrological model to compare the hydrological fluxes and states simulated with the five precipitation products and with gauge-based observations (Section 4.3). We also discuss the uncertainties in the evaluation results and the advantages and challenges of the satellite precipitation products for hydrological applications (Section 5). This study can help extend the applications of remote sensing technologies in hydrological practice in sparsely gauged regions.

Study Area
This study focuses on two gauge-scarce river basins on the TP: the source regions of the Yellow River (UYE) and Yangtze River (UYA) basins (Figure 1). The two basins are located on the northern TP and have large spatial heterogeneities in terrain and climate within each basin. The UYE and UYA cover areas of 121,972 km² and 137,704 km², respectively. The total area of the two basins accounts for ~10.4% of the TP. The elevations of the two basins range from 2776 to 6313 m a.s.l. The climate conditions of the two basins are mainly dominated by the Asian monsoon [49]. The annual mean precipitation decreases as the latitude increases, and more than 70% of the precipitation occurs during the wet season (from May to September). The two basins have long-term daily streamflow records, allowing us to evaluate the capacity of different satellite precipitation products for hydrological modeling. The distribution of the meteorological stations in the two basins is very sparse and uneven. There are only four and nine stations located in the UYA and the UYE basins, respectively (Figure 1). Thus, it is difficult to characterize the realistic spatial-temporal variability in precipitation by using precipitation data interpolated from gauge observations. The daily streamflow records were obtained from the Hydrological Bureau of the Ministry of Water Resources of China. The daily meteorological data come from 21 national meteorological stations, which have been subject to strict quality control by the China Meteorological Administration (CMA) (http://data.cma.cn/).

Satellite Precipitation Products
In this study, five satellite precipitation products were employed: version 2.0 of the Climate Hazards group Infrared Precipitation with Stations (CHIRPS v2.0), CMORPH v1.0, PERSIANN-CDR, TMPA 3B42 v7, and version 2.0 of the Multi-Source Weighted-Ensemble Precipitation (MSWEP v2.0) (see Table 1). All the products combine satellite precipitation information with one or more gauge-based datasets and have the same spatial resolution (i.e., 0.25° × 0.25°); the CHIRPS v2.0 and MSWEP v2.0 products also include reanalysis data in the process of product generation [39,50]. It is worth noting that the five selected precipitation products are not completely independent: different satellite products may employ the same sensor as a data source, and one product may be merged into another product. More information on the data sources of the different products is given in Sun et al. (2018). The intercomparison and evaluation of these precipitation products were performed over the period 1998–2012.
CHIRPS v2.0 is a quasi-global (−50° to 50°) precipitation product that combines a pentadal precipitation climatology, geostationary thermal infrared (TIR) observations, atmospheric reanalysis, and rainfall fields and precipitation measurements from more than 20,000 gauges globally [50,51]. This product has been evaluated at a regional (e.g., Poméon, et al. [38]) and a global (e.g., Beck, et al. [39]) scale and showed reliable performance.
CMORPH combines the advantages of passive MW sensors and IR sensors and uses IR-derived motion vectors to propagate the passive MW-derived precipitation estimates in space and time [23]. It has been shown to be superior to other satellite precipitation products in many regions of the world, including China [52,53].
PERSIANN-CDR uses the archive of Gridded Satellite (GridSat-B1) infrared radiation data as the input to the PERSIANN model. The model outputs are then corrected by the monthly Global Precipitation Climatology Project (GPCP) precipitation product to reduce the biases in the estimated precipitation [35,54]. PERSIANN-CDR was reported to be consistent with the ground-based precipitation product in China [35].
TRMM, which aims to provide the "best" estimate of quasi-global precipitation, is equipped with multiple rainfall sensors, including a precipitation radar, a TRMM microwave imager, and a visible and infrared radiometer [22,27]. It was developed originally for rainfall retrievals in the tropics and has been extended to a quasi-global scale [38]. TRMM provides two forms of precipitation products: a near real-time product with a delay of several hours from the observation time and a gauge-adjusted product with a delay of 2–3 months from the observation time. In general, the near real-time product is more useful in hydrological and meteorological operations, while the gauge-adjusted product gives more accurate precipitation estimates and is mainly used for scientific research. TMPA 3B42 v7 is one of the precipitation products obtained from TRMM merged with other satellite estimates and is widely applied at middle and low latitudes [37,55,56].
MSWEP is a newly developed global merged precipitation product that takes advantage of the strengths of gauge-, satellite-, and reanalysis-based data. This product merges five satellite-based, three reanalysis-based, and two gauge-based precipitation products during the data generation process [57]. A notable feature of MSWEP is that it accounts for gauge under-catch and orographic effects on precipitation estimates using a Budyko-based framework and global-coverage runoff observations. MSWEP has been evaluated on a global scale against 21 other precipitation products and exhibited the best performance overall [39].
For comparison purposes, a China-wide gauge-based precipitation dataset (namely, IGSNRR) was also used to test whether the satellite precipitation products outperform gauge-based precipitation in hydrological modeling of gauge-scarce regions. The IGSNRR dataset was interpolated from ~800 national meteorological stations (including the 21 stations shown in Figure 1) operated by the CMA [58].

Hydrological Model
Model choices may influence the results of a hydrological evaluation. Compared with a lumped model, a distributed hydrological model can reflect the influences of spatial variability in precipitation on hydrological simulations and is thus more sensitive to errors in precipitation inputs [59]. Here, the hydrological evaluation was performed via a grid-based distributed hydrological model, namely, the Hydro-Informatic Modeling System (HIMS) [10,60,61]. The HIMS model is a conceptual, process-based hydrological model that includes key hydrological processes in both the vertical and horizontal directions, including snow accumulation and melt, evaporation from soil and plants, infiltration, water exchange between soil layers, groundwater recharge, and baseflow (Figure 2). The model runs on a daily scale and has a variable spatial resolution. The current version of the HIMS incorporates a temperature-based snow-accounting model, namely, CemaNeige [62], to calculate the snow accumulation and melt. The CemaNeige model has been compared with six other snow-accounting models on a large set of catchments and exhibited the best performance [62]. The HIMS model uses the physically based Penman-Monteith (PM) equation to calculate the evaporation from soil and plants. Multiple forms of the PM equation have been developed, and the difference between them lies mainly in how the surface conductance (i.e., the inverse of the surface resistance) is parameterized. In the HIMS model, the surface conductance formula developed by Leuning, et al. [63] was used, which has been reported to be superior to the empirical evaporation equation for hydrological modeling [64,65]. The model includes three runoff components: surface runoff, interflow, and baseflow. The surface runoff is calculated using a power-function infiltration equation. This equation contains both infiltration excess and saturation excess runoff mechanisms and has wide applicability [66].
The interflow and baseflow are computed via the linear reservoir method. The model has two interconnected storage units that contribute to the vertical water transfer balance: an unsaturated layer and a saturated layer. The interactions between the unsaturated and saturated layers are represented by a moving boundary in response to the groundwater storage dynamics. The horizontal water transfer is calculated based on a unit-based routing equation. The HIMS model includes 12 free parameters, which are calibrated using the observed streamflow data based on a Monte-Carlo-based calibration method [67]. The model was calibrated for each precipitation product using 10,000 parameter sets from the Monte Carlo random sampling. The observed streamflow data from 1998 to 2005 were used for the calibration, and the data from 2006 to 2012 were used for validation. The forcing data of the HIMS model mainly include the conventional meteorological variables (precipitation, temperature, relative humidity, wind speed, and sunshine duration) and land surface information (digital elevation model (DEM), land use data, leaf area index (LAI)).
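The calibration procedure described above (random sampling of parameter sets, selection by streamflow KGE, then a split into calibration and validation periods) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: `toy_model` is a hypothetical two-parameter linear reservoir standing in for HIMS (which has 12 free parameters), the parameter ranges are arbitrary, and the forcing and "observed" flow are synthetic.

```python
import math
import random


def toy_model(precip, c, k):
    """Hypothetical stand-in for HIMS: a single linear reservoir.

    A fraction c of each day's precipitation enters storage, and the
    daily outflow is k times the current storage.
    """
    storage, flow = 0.0, []
    for p in precip:
        storage += c * p
        q = k * storage
        storage -= q
        flow.append(q)
    return flow


def kge(sim, obs):
    """Kling-Gupta efficiency; returns -inf when it is undefined."""
    n = len(obs)
    ms, mo = sum(sim) / n, sum(obs) / n
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    if ss == 0.0 or so == 0.0 or mo == 0.0:
        return float("-inf")
    r = sum((s - ms) * (o - mo) for s, o in zip(sim, obs)) / (n * ss * so)
    return 1.0 - math.sqrt((r - 1) ** 2 + (ss / so - 1) ** 2 + (ms / mo - 1) ** 2)


def calibrate(precip, obs_flow, n_sets=10_000, seed=0):
    """Monte Carlo calibration: keep the sampled set with the highest KGE."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_sets):
        # Assumed (illustrative) uniform sampling ranges for c and k.
        params = (rng.uniform(0.0, 1.0), rng.uniform(0.01, 1.0))
        score = kge(toy_model(precip, *params), obs_flow)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score


# Synthetic data: the "observed" flow is generated with known parameters.
_rng = random.Random(42)
precip = [_rng.expovariate(1.0) if _rng.random() < 0.3 else 0.0 for _ in range(400)]
observed = toy_model(precip, c=0.6, k=0.2)

split = 200  # first half for calibration, second half for validation
params, cal_kge = calibrate(precip[:split], observed[:split], n_sets=2000)
val_kge = kge(toy_model(precip, *params)[split:], observed[split:])
print(f"best parameters: {params}, calibration KGE: {cal_kge:.3f}, validation KGE: {val_kge:.3f}")
```

In the study this loop would be repeated once per precipitation product, so that each product receives its own best parameter set. The sketch uses 2000 samples only to keep the run short; the default of 10,000 matches the number of parameter sets reported in the text.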

Evaluation Method and Criteria
The precipitation in mountainous regions usually presents strong spatial variability [2,31]. It is difficult for a precipitation dataset interpolated from a sparse gauge network to represent the realistic spatial pattern of precipitation at the basin scale. Thus, the precipitation products were not validated at the basin scale, owing to the lack of a reliable reference. In this study, the evaluations of the precipitation products were conducted in two steps. First, a point-to-pixel validation was conducted by comparing precipitation estimates and gauge observations on a daily scale. Then, the gauge-based dataset and the five precipitation products were separately used as the precipitation forcing of the hydrological model to further assess their capacity in hydrological modeling. Four statistical metrics were selected, namely, the Pearson correlation coefficient (CC, −1 ≤ CC ≤ 1), percent bias (PBIAS, −∞ < PBIAS < +∞), root-mean-square error (RMSE, 0 ≤ RMSE < ∞), and Kling-Gupta efficiency (KGE, −∞ < KGE ≤ 1) [68]. The CC is a measure of the strength and direction of the linear relationship between simulations and observations. The PBIAS and RMSE were used to quantify the bias and error between the precipitation (streamflow) estimates and the observations. The KGE measures the overall goodness of fit between the observations and simulations. The expressions of these performance statistics are given as:
$$\mathrm{CC} = \frac{\sum_{i=1}^{N} \left(y_i^{sim} - \mu_s\right)\left(y_i^{obs} - \mu_o\right)}{\sqrt{\sum_{i=1}^{N} \left(y_i^{sim} - \mu_s\right)^2}\,\sqrt{\sum_{i=1}^{N} \left(y_i^{obs} - \mu_o\right)^2}}$$
$$\mathrm{PBIAS} = \frac{\sum_{i=1}^{N} \left(y_i^{obs} - y_i^{sim}\right)}{\sum_{i=1}^{N} y_i^{obs}} \times 100\%$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i^{sim} - y_i^{obs}\right)^2}$$
$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + \left(\frac{\sigma_s}{\sigma_o} - 1\right)^2 + \left(\frac{\mu_s}{\mu_o} - 1\right)^2}$$
where $y^{sim}$ and $y^{obs}$ are the simulations and observations, respectively, and $N$ is the sample size. $\mu_s$ and $\sigma_s$ are the mean and standard deviation of the simulations, respectively; $\mu_o$ and $\sigma_o$ are the mean and standard deviation of the observations, respectively; and $r$ is the correlation coefficient between the observations and simulations. The optimal values for the four statistical metrics are CC = 1, RMSE = 0, PBIAS = 0%, and KGE = 1. Note that a negative value of PBIAS indicates that the simulations overestimate the observations, and vice versa [69].
Remote Sens. 2018, 10, 1316
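As a rough illustration of the four evaluation metrics described above, the following sketch computes them for paired series using only the Python standard library. The function names and the sample values are assumptions made for this example. The PBIAS sign follows the convention stated in the text (negative when the simulations overestimate the observations), and the KGE is written in its commonly used three-component form built from r, the ratio of standard deviations, and the ratio of means.

```python
import math


def _mean(x):
    return sum(x) / len(x)


def _std(x):
    """Population standard deviation."""
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def cc(sim, obs):
    """Pearson correlation coefficient between simulations and observations."""
    ms, mo = _mean(sim), _mean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs)) / len(obs)
    return cov / (_std(sim) * _std(obs))


def pbias(sim, obs):
    """Percent bias; negative values mean the simulations overestimate."""
    return 100.0 * sum(o - s for s, o in zip(sim, obs)) / sum(obs)


def rmse(sim, obs):
    """Root-mean-square error, in the units of the input series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def kge(sim, obs):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    r = cc(sim, obs)
    alpha = _std(sim) / _std(obs)
    beta = _mean(sim) / _mean(obs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Made-up daily values (mm/day): the "product" is a uniform 10% overestimate.
gauge = [0.0, 2.0, 5.0, 1.0]
product = [1.1 * g for g in gauge]
print(cc(product, gauge), pbias(product, gauge), rmse(product, gauge), kge(product, gauge))
```

In this made-up case the correlation is perfect and PBIAS is −10%, yet the KGE drops below 1 because both the mean and the variability are inflated by the same factor, which shows how the KGE penalizes bias that the CC alone does not see.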
[Figure 2: schematic of the HIMS model structure (image not reproduced). Surviving label: moving boundary between sub-root soil layer and groundwater layer.]

Inter-Comparison of Precipitation Products
Intercomparing the satellite precipitation products helps to identify where their precipitation estimates agree and where they diverge. Figure 3 shows the spatial-temporal distributions of the five precipitation products. Overall, the spatial patterns of the mean annual precipitation from these products are similar to each other, showing a decreasing trend with increasing latitude and an increasing trend with increasing longitude (Figure 3). However, the mean annual precipitation from the five precipitation products differs considerably over the study area. In the UYA basin, the mean annual precipitation from TMPA 3B42, CMORPH, PERSIANN-CDR, MSWEP, and CHIRPS is 311, 268, 306, 327, and 307 mm/yr, respectively; the values for the UYE basin are 396, 392, 438, 406, and 399 mm/yr, respectively. In addition, the time series of the basin-average annual precipitation derived from these products are also significantly different from each other (Figure 4). In a given year, in the UYA basin, the gauge-based precipitation generally has the largest values, followed by MSWEP, and CMORPH has the lowest values, while in the UYE basin, the PERSIANN precipitation generally has the largest values, and TMPA 3B42 has the lowest values in most years. In many years, the maximum difference in annual precipitation between two products is larger than 100 mm/yr. The seasonal variations in the precipitation from the five products are overall consistent with each other, except for the CMORPH product in the wet season (Figure 5). The CMORPH values are lower than the values of the other products in July and August in the UYA basin, while they are higher than the values of the other products in June in the UYE basin.
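To make the size of the differences among products concrete, the basin-mean values quoted above can be summarized with a short calculation. This is only an illustrative sketch: the numbers are the mean annual values reported in the text, while the helper name `spread` and the choice to express the range relative to the five-product mean are assumptions of this example.

```python
# Mean annual precipitation (mm/yr) per product, as reported in the text.
UYA = {"TMPA 3B42": 311, "CMORPH": 268, "PERSIANN-CDR": 306, "MSWEP": 327, "CHIRPS": 307}
UYE = {"TMPA 3B42": 396, "CMORPH": 392, "PERSIANN-CDR": 438, "MSWEP": 406, "CHIRPS": 399}


def spread(values):
    """Return (wettest product, driest product, range in mm/yr, range in % of the mean)."""
    wettest = max(values, key=values.get)
    driest = min(values, key=values.get)
    value_range = values[wettest] - values[driest]
    mean = sum(values.values()) / len(values)
    return wettest, driest, value_range, 100.0 * value_range / mean


for basin, values in (("UYA", UYA), ("UYE", UYE)):
    wettest, driest, value_range, pct = spread(values)
    print(f"{basin}: {wettest} minus {driest} = {value_range} mm/yr ({pct:.1f}% of the product mean)")
```

For the long-term means, the range between the wettest and driest product is 59 mm/yr in the UYA basin and 46 mm/yr in the UYE basin, which is smaller than the year-to-year differences of more than 100 mm/yr noted above.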

Point-to-Pixel Validation
A point-to-pixel validation was performed for the five precipitation products using the daily precipitation observations from 21 meteorological stations (see Figure 1). The results are shown as boxplots in Figure 6 and summarized in Table 2. The precipitation products and gauge observations show poor linear correlations: the CC values for all products are below 0.50 at the 21 station sites. Overall, MSWEP performs better than the other products, with a median CC = 0.32, while the median CC values for the other products are similar to each other, ranging from 0.17 to 0.18. In terms of the PBIAS, CHIRPS performs the best, followed by TMPA 3B42 and MSWEP. PERSIANN-CDR tends to overestimate precipitation compared to the gauge observations, with a median PBIAS = −11.3%, whereas the CMORPH precipitation shows systematic underestimations, with a median PBIAS = 4.5%. These products show the same performance rankings in the RMSE and KGE metrics: MSWEP has the highest rank, followed by CMORPH, TMPA 3B42, CHIRPS, and PERSIANN-CDR. Overall, none of the products can efficiently reproduce the temporal variability in the gauge precipitation at the daily scale. Among the five products, MSWEP has the best consistency with the gauge observations, achieving the best performance in three out of the four metrics.
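The point-to-pixel pairing used in this validation can be sketched as follows: each gauge is matched to the 0.25° grid cell that contains it, and the two daily series are paired day by day. The grid origin, the station name and coordinates, and the data layout are all assumptions made for illustration; real products differ in their grid conventions, so the origin arguments would need to be set per product.

```python
import math


def cell_index(lat, lon, lat_south=-50.0, lon_west=-180.0, res=0.25):
    """Row and column of the grid cell containing (lat, lon).

    Assumes a regular grid whose first cell has its south-west corner at
    (lat_south, lon_west); these defaults are illustrative only.
    """
    row = math.floor((lat - lat_south) / res)
    col = math.floor((lon - lon_west) / res)
    return row, col


def point_to_pixel_pairs(gauges, grid):
    """Pair each gauge series with the series of its enclosing grid cell.

    gauges: {name: (lat, lon, [daily values or None])}
    grid:   {(row, col): [daily values or None]}
    Days on which either value is missing (None) are skipped.
    """
    pairs = {}
    for name, (lat, lon, obs) in gauges.items():
        pixel = grid.get(cell_index(lat, lon))
        if pixel is None:
            continue  # gauge lies outside the gridded domain
        pairs[name] = [(o, p) for o, p in zip(obs, pixel) if o is not None and p is not None]
    return pairs


# Hypothetical station and product values (mm/day).
gauges = {"station_A": (34.1, 97.6, [0.0, 2.5, None, 1.0])}
grid = {cell_index(34.1, 97.6): [0.2, 1.8, 0.0, 0.0]}
print(point_to_pixel_pairs(gauges, grid))
```

The paired values for each station can then be passed to the evaluation metrics, and the per-station scores summarized by their median across the stations.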

Hydrological Evaluation
To evaluate the streamflow simulation capacity of the precipitation products, the HIMS model was calibrated separately with the gauge-interpolated precipitation and with each satellite precipitation product by maximizing the KGE between the observed and simulated streamflows. Figures 7 and 8 show the simulated and observed daily streamflows in the two basins during the calibration period (1998–2005) and validation period (2006–2012). Given that the model performance during the validation period better reflects the actual predictive ability of the hydrological model, hereafter, we only present the model performance during the validation period. Overall, all products used as the precipitation forcing of the HIMS model reproduce the streamflow observations in the two basins well. The KGE values obtained by the precipitation products are larger than 0.83 in the UYA basin and 0.84 in the UYE basin. Although significant differences in precipitation estimates exist among these products, the model performance statistics obtained with the different products are similar to each other. For example, in the UYA basin, the CC values obtained from TMPA 3B42, CMORPH, PERSIANN-CDR, MSWEP, and CHIRPS are 0.94, 0.90, 0.92, 0.95, and 0.92, respectively; the KGE values from these products are 0.89, 0.83, 0.85, 0.88, and 0.85, respectively. In general, no single product consistently produces the best performance among the five products. Comparatively, TMPA 3B42 and MSWEP perform better than the other products in the UYA basin; each achieves the best performance in two out of the four statistical metrics. MSWEP outperforms the other products in the UYE basin, where it produces the best performance in three out of the four statistical metrics.
In the UYA basin, the streamflow prediction capacity using the precipitation products was similar to or even better than that using the gauge-based precipitation dataset (i.e., IGSNRR). The KGE values obtained from the five products, except for CMORPH, are larger than those obtained from IGSNRR. However, in the UYE basin, the gauge-based precipitation performs better in streamflow simulations than the precipitation products, obtaining the best performance in all statistical metrics. A possible reason for the inconsistent results between the two basins is that the number of stations within the UYE basin is larger than that within the UYA basin (nine versus four); another possible reason is that the topography in the UYA basin is more complex than that in the UYE basin (Figure 1). Together, these two factors may make the interpolated precipitation data in the UYE basin more reliable than those in the UYA basin.

Validation of the Satellite Precipitation Products
In this study, a point-to-pixel validation was performed by directly comparing the precipitation products with gauge observations at 21 grid boxes. The results showed that no single product agrees well with the daily gauge observations, indicating that the precipitation products are subject to large uncertainties in daily precipitation estimates. This result is consistent with the previous studies of Gao and Liu [46], Tong et al. [30], Wang et al. [47], and Miao et al. [35]. Several studies also found that satellite precipitation products have larger uncertainties on the TP than in other regions [36,39,53]. This is partly because few stations on the TP were involved in the calibration of the satellite precipitation products [27,36]. In addition, both IR-based and MW-based retrieval methods are problematic over high elevations and cold surfaces, and both suffer from errors associated with virga, i.e., precipitation that evaporates before reaching the ground, which is common over arid mountainous areas. These limitations make precipitation retrievals on the TP generally less accurate in capturing orographic and solid precipitation [27,36], which may also contribute to the poor agreement between the precipitation products and gauge observations on the TP. Among these products, MSWEP generally provides the best validation results. This is probably because MSWEP directly incorporates global-scale daily gauge data and accounts for gauge under-catch and orographic influences using a large number of runoff observations during data generation [39]. The results also indicated that PERSIANN-CDR tends to overestimate precipitation relative to the gauge observations in the UYE basin, while CMORPH systematically underestimates precipitation in the UYA basin. Similar results for PERSIANN-CDR and CMORPH were reported by Tong et al. [30], Gao and Liu [46], and Miao et al. [35].
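The point-to-pixel comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the original Kling–Gupta formulation, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), where r is the linear correlation, α the ratio of standard deviations, and β the ratio of means; this section does not restate which KGE variant was used. The function names `nearest_pixel_index` and `kge` are hypothetical.

```python
import math


def nearest_pixel_index(grid_lats, grid_lons, gauge_lat, gauge_lon):
    """Return (row, col) of the grid cell whose centre is closest to the gauge.

    grid_lats / grid_lons are 1-D sequences of cell-centre coordinates
    (an assumed regular latitude-longitude grid).
    """
    row = min(range(len(grid_lats)), key=lambda i: abs(grid_lats[i] - gauge_lat))
    col = min(range(len(grid_lons)), key=lambda j: abs(grid_lons[j] - gauge_lon))
    return row, col


def kge(sim, obs):
    """Kling-Gupta efficiency (assumed original formulation) of sim against obs."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("sim and obs must have the same length (>= 2)")
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    var_s = sum((s - mean_s) ** 2 for s in sim) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / math.sqrt(var_s * var_o)   # linear correlation
    alpha = math.sqrt(var_s / var_o)     # variability ratio
    beta = mean_s / mean_o               # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

In use, the daily series of the pixel returned by `nearest_pixel_index` would be passed as `sim` and the gauge record as `obs`. A perfect match gives KGE = 1, and a product that doubles every observed value gives 1 − √2, since both α and β equal 2.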
Note that although the point-to-pixel validation method has commonly been used for precipitation products [9,33,39,42,70,71], it has inevitable deficiencies arising from the scale mismatch between the gridded products and the gauge observations and from wind-induced under-catch in the gauge observations. It is difficult for a single station to represent the grid-wide (0.25° × 0.25°, ~625 km²) average precipitation, particularly in gauge-sparse regions, since precipitation over complex mountainous terrain is highly variable in space. Moreover, conventional gauge observations are subject to the under-catch of solid precipitation under windy conditions. Under-catch errors in precipitation gauge observations can be as large as 20–90%, as reported by Yang et al. [72] for Greenland. These deficiencies in the validation method might introduce some degree of uncertainty into the validation results.
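To give a sense of the scale mismatch, the area of a 0.25° grid cell can be estimated on a sphere. The sketch below is illustrative only: the mean Earth radius of 6371 km, the spherical approximation, and the function name `grid_cell_area_km2` are assumptions, not details taken from the study. The quoted ~625 km² corresponds to a nominal 25 km × 25 km cell.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def grid_cell_area_km2(center_lat_deg, resolution_deg=0.25):
    """Area of a latitude-longitude grid cell centred at center_lat_deg.

    Uses the spherical formula A = R^2 * dlon * (sin(lat_north) - sin(lat_south)).
    """
    half = resolution_deg / 2.0
    lat_south = math.radians(center_lat_deg - half)
    lat_north = math.radians(center_lat_deg + half)
    dlon = math.radians(resolution_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat_north) - math.sin(lat_south))
```

Under these assumptions, a cell at an illustrative latitude of 34°N covers slightly more than the nominal 625 km², and a cell at the equator covers noticeably more. In either case a single gauge samples only a tiny fraction of the area that a pixel value represents.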

Hydrological Evaluations of the Satellite Precipitation Products
For hydrological models, errors in precipitation inputs can lead to large uncertainties in streamflow simulations and predictions [73–75]. Hydrological evaluation of precipitation products rests on the hypothesis that errors in the precipitation inputs propagate into the hydrological simulations [26,59]. Compared with the point-to-pixel validation, evaluating precipitation products by their ability to predict streamflow is attractive because it is performed at the basin scale and avoids the scale discrepancy that arises when gauge observations are used for validation [38]. In this study, we compared the model performances for streamflow simulations driven by a gauge-based dataset and five precipitation products. Although there are considerable differences among these products (see Sections 4.1 and 4.2), all products are able to accurately reproduce the observed streamflow in the study basins. A possible explanation is that the large differences in the precipitation inputs were buffered inside the hydrological model via parameter calibration. Parameter calibration based only on streamflow leads to model parameters that are "falsely adjusted" to maximize the streamflow simulation performance [76], thereby compensating for the efficiency loss in the runoff simulations caused by different precipitation inputs. It should be noted that this compensation may be valid only within a certain threshold of precipitation error. Beyond this threshold, model calibration may have only a limited ability to prevent precipitation errors from propagating into the runoff simulations. Moreover, larger basins tend to be more tolerant of the effect of precipitation error on runoff simulations than smaller basins [16]. Nevertheless, according to the principle of water balance, the large differences in the precipitation inputs should be reflected in the simulated water balance components other than runoff.
To confirm this deduction, we compared the simulated evapotranspiration (ET), runoff (R), and water storage change (Delta_S) driven by the different precipitation (P) inputs at an annual scale (Figures 9 and 10). As shown in Figures 9 and 10, the changes in R, ET, and Delta_S are dominated by the changes in P; a larger P generally produces a larger ET and Delta_S. For example, the mean annual ET of IGSNRR is approximately 1.5-fold higher than that of CMORPH in the UYA basin; the maximum difference in the annual ET between PERSIANN-CDR and TMPA 3B42 reaches 83.6 mm in the UYE basin. The Delta_S values differ less between these products than the ET values do, and the magnitude of change varies from −30 to 30 mm in the two basins. The spatial patterns of ET and relative soil water storage (SWS/SWSC, see the caption of Figure 11) are also similar to those of precipitation (Figures 11 and 12). The above results confirm that the parameter calibration greatly offsets the influences of different precipitation inputs on streamflow simulations by adjusting the ET and Delta_S. Thus, hydrological evaluations of precipitation products that consider only the streamflow simulations are less meaningful. Instead, an efficient hydrological evaluation of precipitation products should focus on both streamflow simulations and the simulations of other hydrological variables, such as evaporation and soil moisture. Similarly, the calibration of hydrological models should include constraints in addition to runoff to improve the physical realism of the models [76].
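The water balance argument above can be made concrete with a short sketch. It assumes the annual balance P = ET + R + Delta_S, so that when calibration forces R to match the same observed streamflow, any difference in P must be absorbed by ET + Delta_S. All numbers and product labels below are hypothetical and chosen only for illustration; they are not results from this study.

```python
def absorbed_by_et_and_storage(p_mm, r_mm):
    """Return ET + Delta_S (mm) implied by the annual balance P = ET + R + Delta_S."""
    return p_mm - r_mm


# Hypothetical annual values (mm): both forcings are calibrated to the same runoff.
calibrated_runoff = 180.0
precip_by_product = {"product_A": 420.0, "product_B": 520.0}

residuals = {
    name: absorbed_by_et_and_storage(p, calibrated_runoff)
    for name, p in precip_by_product.items()
}

# The 100 mm gap in P reappears entirely in ET + Delta_S, not in runoff.
gap = residuals["product_B"] - residuals["product_A"]
```

In this toy case both products would score identically on streamflow, yet their implied ET + Delta_S differs by the full precipitation gap. This is why a streamflow-only evaluation cannot separate them.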
Figure 11. Spatial pattern of the mean annual evaporation (ET, mm/yr) simulated by the HIMS model during the period 1998–2012 using a gauge-based precipitation dataset ((a) IGSNRR) and five satellite precipitation products (b–f) as the forcing. In each panel, ET_UYA and ET_UYE indicate the mean annual evaporation in the UYA and UYE basins, respectively.

Conclusions
The ground-based precipitation observation networks are extremely sparse over the TP, which poses a challenge for hydrological studies and applications in this region. In this study, five satellite precipitation products were evaluated against gauge observations and in terms of the accuracy of streamflow simulations. The main conclusions are summarized as follows:
1. The evaluated precipitation products showed similar spatial patterns but considerable differences in the estimated precipitation amounts, and they suffer from large uncertainties in the daily precipitation estimates compared with the rain gauge observations. Among the five products, MSWEP shows the best consistency with the gauge observations. We thus recommend this product as the preferred choice for applications among the five products.
2. All evaluated precipitation products are able to accurately reproduce the observed streamflow hydrographs after parameter calibration of the hydrological model. However, the differences in precipitation inputs are inevitably reflected in the simulations of hydrological variables other than runoff, e.g., evaporation and water storage, leading to significantly different estimates for these variables.
3. Evaluating precipitation products only on the accuracy of streamflow simulations will mask the differences between these products, since hydrological models can buffer the influences of different precipitation inputs on streamflow simulations through parameter calibration.
Similarly, calibrating hydrological models using streamflow data alone is likely insufficient to simulate hydrological variables other than runoff well. This calibration strategy is like a "double-edged sword": it may cause hydrological variables other than runoff to be incorrectly simulated while improving the accuracy of runoff simulations. In addition, although the satellite precipitation products have limitations in the accuracy of their precipitation estimates, they demonstrated good potential for hydrological studies and applications in our case study on the TP. The future development of satellite precipitation products should combine the advantages of satellite observations, ground-based observations, and reanalysis data.
Author Contributions: P.B. performed the research design, analyzed the data and wrote the draft; X.L. edited the draft and provided constructive suggestions to improve the paper.