Evaluation and Hydrological Application of Four Gridded Precipitation Datasets over a Large Southeastern Tibetan Plateau Basin

Abstract: Reliable precipitation is crucial for hydrological studies over Tibetan Plateau (TP) basins with sparsely distributed rainfall gauges. In this study, four widely used precipitation products, including the Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation of the water resources (APHRODITE), the High Asia Reanalysis (HAR), and the satellite-based precipitation estimates from Global Precipitation Measurement (GPM) and Tropical Rainfall Measuring Mission (TRMM), were comprehensively evaluated by combining statistical analysis and hydrological simulation over the Upper Brahmaputra (UB) River Basin of the TP during 2001–2013. With respect to the statistical assessment, the overall performances of GPM and HAR are comparable to each other, and both are superior to the other two datasets. For the hydrological assessment, both daily and monthly GPM-based streamflow simulations perform the best: they give very good results at the UB outlet and satisfactory results at the Yangcun and Lhasa hydrological stations within the UB. Runoff simulation using HAR performs well only at the UB outlet and shows poor results at both Yangcun and Lhasa stations. The simulated results based on APHRODITE and TRMM show poor performances at UB. Generally, the GPM shows an encouraging potential for hydro-meteorological investigation over UB, although with some bias in flood simulation.

Acceptable flood-events simulation at Nuxia, Yangcun, and Lhasa gauges could indirectly indicate the reliability of PCP_Sun as the benchmark dataset. Of course, it is undeniable that the precipitation dataset PCP_Sun may contain uncertainty. In future work, more studies will be implemented to thoroughly investigate the uncertainties involved in precipitation evaluation, such as the selection of benchmark precipitation datasets, utilization of diverse hydrological models, and application of different parameter calibration methods.


Introduction
With an area of about 2.5 million km², the Tibetan Plateau (TP) is located in Central Asia and has an average elevation of approximately 4000 m above mean sea level (AMSL) [1]. As Asia's 'water tower', the TP is the source region of many large rivers, such as the Brahmaputra, Mekong, Indus, Yangtze, and Yellow Rivers. Under global climate change, the TP has experienced rapid warming in recent decades, with a mean annual temperature rise of 0.46 °C per decade [2], much higher than the global average. The accelerated warming has changed the cryosphere over the TP through glacier retreat, snow-cover reduction, and permafrost degradation, which will cause corresponding changes in local hydrology and water resources.
Meanwhile, precipitation is the most important source of water in the TP and has a decisive role in shaping the basin hydrological cycle [3,4]; thus, accurate and reliable precipitation information is crucial for hydro-meteorological studies, such as hydrological simulation, water resources management, and climate modeling [5]. Currently, rain-gauge observation is the major source of precipitation data. However, measurements from these in situ observational networks are limited in remote mountainous terrain with complex topography, such as the TP and its surrounding areas [6]. Due to the high altitude, harsh environment, and inaccessibility, long-term precipitation observation networks are sparse or nonexistent in many regions of the TP. They are therefore insufficient to depict the precipitation distribution accurately, which hampers reliable hydrological predictions.
As an alternative to precipitation data from traditional ground gauges, many gridded precipitation datasets at the global or regional scale are now publicly available, including satellite observations, reanalysis data, gauge-based products, and regional climate model outputs. During the last few decades, a series of satellite precipitation products have been developed, including the Tropical Rainfall Measuring Mission (TRMM 3B42) [7], Global Precipitation Measurement (GPM) [8], Climate Prediction Center Morphing Algorithm (CMORPH) [9], and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR) [10]. In terms of gauge-based precipitation products, several gridded datasets for the globe or Asia have been constructed and widely utilized, such as the Climatic Research Unit (CRU) [11], Global Precipitation Climatology Centre (GPCC) [12], and Asian Precipitation-Highly-Resolved Observational Data Integration Towards Evaluation of the water resources (APHRODITE) [13] datasets. The APHRODITE dataset is regarded as one of the most realistic gridded precipitation datasets for Asia [14]. With respect to atmospheric model output, using a regional numerical weather prediction model to dynamically downscale global climate data to high spatial resolution over data-sparse mountainous regions is one method for providing sufficiently detailed precipitation information for driving hydrological and water resources models [15]. For example, the High Asia Reanalysis (HAR) dataset was generated by dynamical downscaling of global analysis data using the Weather Research and Forecasting (WRF) model [16]. The HAR dataset provides detailed and process-based precipitation fields at 10 km resolution for the TP and its surrounding area.
Due to differences in spatiotemporal resolution and in the algorithms used to develop them, these gridded precipitation products inevitably contain uncertainties and errors, especially in remote mountain areas such as the TP. Therefore, it is necessary to evaluate these precipitation datasets against reliable observed data. The evaluation work can be generally classified into two categories: (1) direct comparison of the gridded precipitation with the corresponding rain-gauge precipitation data; and (2) evaluation of gridded precipitation products based on their ability to predict streamflow in a hydrological modeling framework. Evaluations of different gridded precipitation datasets have been made over a range of scales, from basin to global extent [17–25]. Meanwhile, some recent efforts have been put into evaluating and validating gridded precipitation over the TP and the basins within it [6,26–32].
The Upper Brahmaputra (UB) River Basin is located in the southeastern TP (Figure 1). The river is the highest in the world, with an average altitude over 4000 m AMSL, and is the fifth longest river in China. Runoff from the UB is not only crucial for water resources management and exploitation in the local region, but also affects the utilization of water resources in downstream basins, such as agricultural production and hydropower development in India. Precipitation is the most important source of runoff in this basin [33]. Some works have been performed to evaluate and validate the suitability of different gridded precipitation products over the UB. For example, Tong et al. [34] found that TMPA and APHRODITE precipitation is 22-25% lower than the corrected CMA (China Meteorological Administration) data in the UB. Moreover, based on a dense rain gauge network in the Southern Tibetan Plateau, primarily including the UB, Xu et al. [35] evaluated GPM IMERG and TRMM 3B42V7 and indicated that the performance of GPM is superior to that of TRMM in this region. Meanwhile, Xuan et al. [36] used TRMM 3B42V7 as the input forcing to the SWAT model over a tributary river basin in the UB, and the results revealed that TRMM data are very useful for hydrological simulation over this cold and high-altitude region. In addition, Ji et al. [14] indicated that the APHRODITE dataset underestimates the precipitation amount in this region.
However, comprehensive evaluations of all gridded precipitation datasets that integrate statistical indices and hydrological validation methods are still lacking in the UB. In this study, based on more precise gauge-based reference precipitation data, four widely used gridded precipitation products (HAR, APHRODITE, TRMM, and GPM) were evaluated over the UB River Basin. The novelty of this research is twofold. First, statistics-based direct comparison and hydrological-simulation-based indirect validation using the Variable Infiltration Capacity glacier (VIC-glacier) model were combined to assess the utility of the four gridded precipitation datasets. Second, both daily and monthly hydrological simulations driven by the four gridded precipitation datasets were evaluated not only at the outlet of the UB, but also in two internal subbasins with available measured flow data within the UB.

Study Area
The Brahmaputra River is an important international river which flows through China, Bhutan, India, and Bangladesh. It originates from the Gyima Yangzoin Glacier of the Tibetan Plateau and has a drainage area of about 520,000 km²; it has the fifth largest runoff in the world [37]. In this study, the focus area was the upstream part of the Brahmaputra River Basin (UB) above the hydrological gauge station Nuxia (Figure 1), with a drainage area of about 201,200 km². In addition, the Yangcun gauge on the main stream of the UB and the Lhasa gauge on its tributary in the Lhasa basin, an important subbasin of the UB, were also selected to validate the hydrological simulation for the four gridded precipitation products (Figure 1 and Table 1). The UB is located in the southeast of the TP, within 81°E-95°E, 27°N-32°N. This basin has a complex topography, with elevation ranging from 3000 to 6000 m AMSL. Dominated by the southeast monsoon, the wet season begins in May and lasts to September, while the dry season, when the westerlies prevail, runs from October to the next April. Annual precipitation decreases from the southeast to the northwest, from approximately 1200 mm to less than 300 mm. Moreover, because the basin lies within the TP, glaciers are broadly distributed over the UB and account for about 2.11% of the basin.
Traditionally, ground rain gauges are utilized to evaluate the performance of gridded precipitation products, but the sparsely distributed rain gauges (Figure 1) in the UB cannot represent the spatial distribution of precipitation in this region; thus, an evaluation based on these scarce precipitation observations may contain large uncertainty. In this study, the reference precipitation is a newly developed daily gridded precipitation dataset with a spatial resolution of 10 × 10 km for 1961-2016 from Sun and Su [4].
For simplicity, we named this gridded precipitation from Sun and Su PCP_Sun in the following text; it was reconstructed for the UB based on China Meteorological Administration (CMA) stations and 262 newly added rain gauges within this area, and it is believed to best represent the real precipitation amount thus far in the UB [4]. Meanwhile, by employing a glaciohydrological model to evaluate the feasibility and reliability of PCP_Sun in reverse, the results show that it gives a good performance regarding hydrological simulation, snow cover, and glacier mass balance modeling in this region. Therefore, in this study, we chose the precipitation dataset PCP_Sun as the reference or benchmark precipitation to evaluate the performance of the four gridded precipitation products over the UB.

Four Gridded Precipitation Products
In this study, four widely used gridded precipitation products were selected for evaluation: HAR, APHRODITE, TRMM 3B42V7, and GPM IMERG V06. These products were chosen because they have not been comprehensively examined together, by using both statistics-based and hydrological modeling methods, over the UB with its complex terrain and high altitude. The features of the four datasets are summarized in Table 2 and are introduced briefly below.
The HAR (High Asia Reanalysis) was generated by dynamical downscaling of global analysis data through the WRF (Weather Research and Forecasting) model over the TP and its surrounding areas [16]. The HAR provides precipitation products at two resolutions, namely 30 km and 10 km. In this study, the precipitation dataset with 10 km resolution for 2001-2013 was selected.
The APHRODITE precipitation product is a daily gridded dataset covering the whole of Asia; it spans from 1951 to 2015 with a spatial resolution of 0.25° × 0.25° [13]. This dataset was produced by collecting data from 5000-12,000 rainfall stations, which represents 2.3-4.5 times the data made available through the Global Telecommunication System network for most precipitation products.
The Tropical Rainfall Measuring Mission (TRMM) [7] Multi-satellite Precipitation Analysis (TMPA) products consist of two versions, i.e., the post-processed research 3B42 product and the real-time 3B42RT product. One important difference between these two products is that the research 3B42 dataset uses monthly rain-gauge data for bias adjustment. Some studies have already implied that the post-processed research product is more suitable for research work than the real-time product [32]. Therefore, the daily precipitation data from version 7 of the post-processed research product, TRMM 3B42V7, with a spatial resolution of 0.25°, were used in this study.
With the advent of the GPM era, some studies have focused on the evaluation of satellite precipitation products from GPM. Among the GPM products, the Integrated Multi-satellite Retrievals for GPM (IMERG) has recently received more attention. The GPM IMERG algorithm provides three levels of products, namely the near-real-time 'Early' and 'Late' run products and the post-real-time 'Final' run product [18]. Compared to the 'Early' and 'Late' run products, the IMERG Final run product is more accurate since it is adjusted with the Global Precipitation Climatology Centre (GPCC) monthly gauge observations [38]. In this study, the GPM IMERG Final run version 6, with a daily timescale and a spatial resolution of 0.1° × 0.1°, was used.
For brevity, the shorter names TRMM and GPM are used from here on instead of the complete names of the satellite-based products (TRMM 3B42V7 and GPM IMERG V06, respectively).

Other Data
In addition to precipitation data, other meteorological data such as air temperature and wind speed were required as inputs into the Variable Infiltration Capacity (VIC) glacier model. In this study, these meteorological data were obtained from China Meteorological Administration (CMA) stations within and around the UB (Figure 1).
Meanwhile, geographical data, including soil texture, topography, and vegetation, were also used to drive the VIC model. The soil data are from the Harmonized World Soil Database with a spatial resolution of 30 arc-seconds (https://www.fao.org/soils-portal/soilsurvey/soil-maps-and-databases/harmonized-world-soil-database-v12/en/, accessed on 9 January 2022). The global Digital Elevation Model (DEM) with a spatial resolution of 1 km was obtained from GTOPO30 (http://eros.usgs.gov/#/Find_Data/Products_and_Data_Available/gtopo30_info, accessed on 9 January 2022). The vegetation data were obtained from the global vegetation classifications provided by the University of Maryland [39].
The glacier distribution over the UB was obtained from the Chinese Glacier Inventory (CGI), which was released by the National Cryosphere Desert Data Center of China (http://www.ncdc.ac.cn, accessed on 9 January 2022).
Moreover, for hydrological evaluation, the daily and monthly streamflow data at Nuxia and Lhasa gauges for years 1990-2013 were obtained from the Tibetan Hydrological Bureau, while only monthly runoff data for the years 1990 to 2013 at Yangcun station have been collected.

Methods
In this study, the 10 km × 10 km gridded precipitation dataset PCP_Sun, which was reconstructed by using the densest rainfall station network available so far in the UB, was utilized as a benchmark to evaluate the four gridded precipitation products. Since the spatial resolutions of the four precipitation datasets differ, they were all resampled to 10 km × 10 km resolution to facilitate comparison with PCP_Sun. The nearest-neighbor method, which has been widely employed in satellite precipitation evaluation studies [16,29,34], was used for the resampling. In addition, the evaluation period was set from 2001 to 2013, which is the overlapping period of the precipitation data and the available observed streamflow data, so that the statistics-based direct comparison and the hydrological-simulation-based indirect validation cover the same time span.
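The nearest-neighbor resampling step can be illustrated with a minimal sketch. It assumes regular latitude and longitude grids given as ascending lists of cell centres; the function names, the tiny 0.25° source grid, and the 0.1° target grid are hypothetical and are not taken from the study's processing chain.

```python
from bisect import bisect_left


def nearest_index(sorted_coords, value):
    """Index of the entry in an ascending coordinate list closest to value."""
    pos = bisect_left(sorted_coords, value)
    if pos == 0:
        return 0
    if pos == len(sorted_coords):
        return len(sorted_coords) - 1
    before, after = sorted_coords[pos - 1], sorted_coords[pos]
    # Ties go to the lower-index cell.
    return pos - 1 if value - before <= after - value else pos


def resample_nearest(field, src_lats, src_lons, dst_lats, dst_lons):
    """Resample a 2-D field[lat][lon] onto target cell centres.

    Each target cell takes the value of the closest source cell centre;
    no interpolation or averaging is done.
    """
    rows = [nearest_index(src_lats, lat) for lat in dst_lats]
    cols = [nearest_index(src_lons, lon) for lon in dst_lons]
    return [[field[r][c] for c in cols] for r in rows]


# Hypothetical 0.25-degree source grid (2 x 2 cells) and 0.1-degree target grid.
src_lats = [29.125, 29.375]
src_lons = [91.125, 91.375]
coarse = [[1.0, 2.0],
          [3.0, 4.0]]
dst_lats = [29.05, 29.15, 29.25, 29.35, 29.45]
dst_lons = [91.05, 91.15, 91.25, 91.35, 91.45]
fine = resample_nearest(coarse, src_lats, src_lons, dst_lats, dst_lons)
```

Because nearest-neighbor resampling only copies values, it does not smooth or otherwise alter the precipitation amounts of the coarser products; it only changes the grid on which they are compared.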

Statistical Metrics
To assess the four gridded precipitation products, several widely used statistical indices were adopted in this study (Table 3). The Pearson correlation coefficient (CC) describes the linear agreement between the gridded precipitation datasets and the observed precipitation. The relative bias (RB) depicts the systematic bias between the gridded precipitation and the observations. The root-mean-squared error (RMSE), i.e., the square root of the average of the squared differences between the gridded datasets and the observed precipitation, was used to measure the average magnitude of the absolute error. In addition, the probability of detection (POD), false-alarm ratio (FAR), and critical success index (CSI) were used to examine the capability of the gridded precipitation products to detect rainfall events. The POD describes the fraction of observed precipitation events that were correctly detected by the precipitation products. FAR measures the fraction of rainfall events detected by the gridded precipitation products for which no rainfall was observed. CSI indicates the overall fraction of rainfall events correctly captured by the gridded precipitation.

Table 3. List of the statistical metrics used for evaluating precipitation products.

Statistical Metric | Formula | Unit | Perfect Value
Pearson correlation coefficient (CC) | $\mathrm{CC} = \frac{\sum_{i=1}^{n}(G_i-\bar{G})(R_i-\bar{R})}{\sqrt{\sum_{i=1}^{n}(G_i-\bar{G})^2}\,\sqrt{\sum_{i=1}^{n}(R_i-\bar{R})^2}}$ | - | 1
Relative bias (RB) | $\mathrm{RB} = \frac{\sum_{i=1}^{n}(G_i-R_i)}{\sum_{i=1}^{n}R_i} \times 100\%$ | % | 0
Root-mean-squared error (RMSE) | $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(G_i-R_i)^2}$ | mm | 0
Probability of detection (POD) | $\mathrm{POD} = \frac{H}{H+M}$ | - | 1
False-alarm ratio (FAR) | $\mathrm{FAR} = \frac{F}{H+F}$ | - | 0
Critical success index (CSI) | $\mathrm{CSI} = \frac{H}{H+M+F}$ | - | 1

Notation: $n$, number of samples; $H$, hit (observed precipitation correctly detected); $M$, miss (observed precipitation not detected); $F$, false alarm (precipitation detected but not observed); $R_i$, reference observed precipitation; $G_i$, gridded precipitation; $\bar{R}$ and $\bar{G}$, the corresponding means.
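As an illustration of how the six metrics can be computed for one grid cell, the following sketch uses the standard definitions of CC, RB, RMSE, POD, FAR, and CSI. The rain/no-rain threshold of 0.1 mm/day, the function names, and the sample values are assumptions made for this example; the threshold used in the study is not stated in the text.

```python
import math

RAIN_THRESHOLD = 0.1  # mm/day; assumed rain/no-rain cutoff, not from the paper


def continuous_metrics(reference, gridded):
    """Return (CC, RB in %, RMSE) for paired daily series of equal length."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_g = sum(gridded) / n
    cov = sum((g - mean_g) * (r - mean_r) for g, r in zip(gridded, reference))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_g = sum((g - mean_g) ** 2 for g in gridded)
    cc = cov / math.sqrt(var_r * var_g)
    rb = 100.0 * sum(g - r for g, r in zip(gridded, reference)) / sum(reference)
    rmse = math.sqrt(sum((g - r) ** 2 for g, r in zip(gridded, reference)) / n)
    return cc, rb, rmse


def contingency_metrics(reference, gridded, threshold=RAIN_THRESHOLD):
    """Return (POD, FAR, CSI) from hit, miss, and false-alarm counts."""
    hits = misses = false_alarms = 0
    for r, g in zip(reference, gridded):
        observed, detected = r >= threshold, g >= threshold
        if observed and detected:
            hits += 1
        elif observed:
            misses += 1
        elif detected:
            false_alarms += 1
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return pod, far, csi


# Made-up daily values (mm/day) for one grid cell.
reference = [0.0, 2.0, 5.0, 0.0, 1.0, 0.0]
gridded = [0.5, 1.0, 6.0, 0.0, 0.0, 0.0]
cc, rb, rmse = continuous_metrics(reference, gridded)
pod, far, csi = contingency_metrics(reference, gridded)
```

The sketch does not guard against division by zero, which would occur for a completely dry reference series or a constant series; a production implementation would need to handle such cells explicitly.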

Hydrological Model
In this study, the Variable Infiltration Capacity glacier (VIC-glacier) model [33] was used, which couples a degree-day glacier algorithm with the original VIC model [40]. In the VIC-glacier model, the total runoff, including the glacier meltwater from each grid, is calculated as follows:

$$R_i = f \times M_i + (1 - f) \times R_{vic}$$

where $R_i$ is the total runoff for grid $i$, $f$ is the fraction of glacier area, $M_i$ is the glacier runoff calculated by using the degree-day model, and $R_{vic}$ is the runoff from the glacier-free area. The parameter $f$ is derived by dividing the glacier area within the grid cell by the grid-cell area.
The VIC-glacier model also includes a two-layer energy balance snow model, frozen soil, and permafrost algorithm to represent the cold land processes. Therefore, the VIC-glacier model has been recently utilized to simulate runoff in a few river and lake basins over the TP [4,[41][42][43].
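The area-weighted combination of glacier melt and VIC runoff described above can be sketched as follows. The degree-day melt function is a generic textbook form with an assumed base temperature of 0 °C; the function names and the sample numbers are hypothetical and do not reproduce the VIC-glacier source code.

```python
def degree_day_melt(temperature_c, ddf, base_temp_c=0.0):
    """Melt depth (mm/day) from a degree-day model.

    ddf is the degree-day factor in mm per degree C per day; melt only occurs
    when the air temperature exceeds base_temp_c (assumed to be 0 degrees C).
    """
    return ddf * max(temperature_c - base_temp_c, 0.0)


def total_grid_runoff(glacier_fraction, glacier_runoff, vic_runoff):
    """Area-weighted runoff of one grid cell: f * M_i + (1 - f) * R_vic."""
    if not 0.0 <= glacier_fraction <= 1.0:
        raise ValueError("glacier_fraction must lie in [0, 1]")
    return glacier_fraction * glacier_runoff + (1.0 - glacier_fraction) * vic_runoff


# Hypothetical cell: 10% glacier cover, 4 degrees C, DDF of 7 mm/(degC day),
# and 2 mm/day of runoff from the glacier-free part of the cell.
melt = degree_day_melt(temperature_c=4.0, ddf=7.0)
runoff = total_grid_runoff(glacier_fraction=0.1, glacier_runoff=melt, vic_runoff=2.0)
```

With a glacier fraction of zero the expression reduces to the plain VIC runoff, so glacier-free cells are unaffected by the coupling.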
In the VIC-glacier model, two categories of model parameters need to be determined: (1) parameters in the degree-day model, which mostly involves the determination of degree-day factors; and (2) some sensitive parameters in the VIC model, including the infiltration parameter ($b_{infilt}$), the thickness of the second soil layer ($d_2$), and the base flow parameters ($W_s$, $D_s$, and $D_{smax}$) (Table 4). The initial values for $DDF_{snow/ice}$ in the degree-day model and for the VIC model parameters were set by referring to previous studies [33,43]. Then manual calibration, i.e., the trial-and-error method, was employed to calibrate the VIC-glacier model by using the daily observed flow data from 1990 to 2000 at the Nuxia gauge, while the validation period was set to 2001-2013. The percent bias (PBIAS), Nash-Sutcliffe efficiency coefficient (NSE), and Kling-Gupta efficiency (KGE) [44,45] were employed to evaluate the streamflow simulation. These indices are defined as follows:

$$\mathrm{PBIAS} = \frac{\sum_{i=1}^{n}\left(Y_i^{sim} - Y_i^{obs}\right)}{\sum_{i=1}^{n} Y_i^{obs}} \times 100\%$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Y_i^{obs} - Y_i^{sim}\right)^2}{\sum_{i=1}^{n}\left(Y_i^{obs} - Y^{obs,mean}\right)^2}$$

$$\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\beta-1)^2 + (\gamma-1)^2}, \quad \beta = \frac{\mu_s}{\mu_o}, \quad \gamma = \frac{CV_s}{CV_o} = \frac{\sigma_s/\mu_s}{\sigma_o/\mu_o}$$

where $n$ is the number of samples; $Y_i^{obs}$ and $Y_i^{sim}$ are the observed and the corresponding simulated values, respectively; $Y^{obs,mean}$ and $Y^{sim,mean}$ are the arithmetic means of the observed and simulated values, respectively; $r$ is the correlation coefficient between simulated and observed runoff; $\beta$ is the bias ratio; $\gamma$ is the variability ratio; $\mu$ is the mean runoff in m³/s; $CV$ is the coefficient of variation; $\sigma$ is the standard deviation of runoff in m³/s; and the subscripts $s$ and $o$ represent simulated and observed runoff values, respectively. Table 4 lists the final determined values of model parameters after model calibration.

Table 4. Summary of model parameters, ranges, and calibrated values used in VIC-glacier model over the UB.

Model Parameter | Unit | Range | Determined Value
Figure 2 shows the spatial patterns of the 2001-2013 annual mean precipitation for the reference precipitation and the four gridded precipitation datasets over the UB. PCP_Sun presents a southeast-to-northwest gradient, ranging from over 1000 mm/year in the southeast to less than 300 mm/year in the northwest. The large amount of precipitation in the southeastern part of the basin arises because the Himalayas intercept much of the water vapor carried by the Indian Ocean monsoon. In contrast, affected by the weak activity of both the Indian monsoon and the westerlies, the Western UB receives less precipitation than the eastern region. On the whole, the spatial patterns of the four precipitation products are similar to that of the reference precipitation, all showing a general increase from northwest to southeast. However, in the eastern part of the basin, there is still a large difference in magnitude between them and the reference precipitation, especially for APHRODITE and TRMM. The reference precipitation exceeds 900 mm/year in the Eastern UB, whereas the APHRODITE and TRMM datasets show less than 750 mm/year. For HAR and GPM, the underestimation is somewhat alleviated in this region. The precipitation differences among the five datasets in the UB are possibly related to the different ways in which they were generated. The HAR product was generated by dynamical downscaling of global analysis data by the WRF model. The APHRODITE dataset is an interpolated product based on observed precipitation gauges. TRMM and GPM are satellite remote-sensing products. On the whole, GPM generally outperformed the other products in terms of spatial consistency with the observed precipitation.
Supplementary Figure S1 exhibits the relative bias of the four precipitation datasets against the reference data during the annual period, rainy season (June to September), and non-rainy season (October to the next May). For the annual period, underestimation can be found over almost the whole UB for the APHRODITE and TRMM datasets, and the magnitude of the negative bias can exceed 50% in the Northeastern UB for both products. For HAR, overestimation dominates over the Northern UB, with positive bias over 50% in the northwest, whereas there is large underestimation in the southern part, where the negative bias can exceed 50% in the southeastern area. For GPM, except for the relatively large negative bias over local areas of the Eastern UB, the negative deviation in the north and the positive deviation in the south are within ±20%. For the rainy season, the spatial distribution of relative bias for the four precipitation datasets is similar to that of the annual period. However, the negative bias for both APHRODITE and TRMM is more widely spread, and the amplitude of the negative bias for TRMM is even larger than that of the annual period. For the non-rainy season, the precipitation products, excluding GPM, exhibit positive bias over most of the UB.

Statistical Evaluation of Gridded Precipitation Products
Meanwhile, Figure 3 shows the mean monthly basin-average precipitation from the reference and the four gridded precipitation products during 2001-2013 over the UB. HAR slightly overestimated the observed precipitation in the dry season, especially from January to March, while it was broadly consistent with the reference precipitation in the remaining months. APHRODITE follows the seasonal variation of the reference precipitation but largely underestimated precipitation in the wet season. TRMM roughly captured the characteristics of monsoon precipitation, that is, more precipitation in summer and less in winter. However, it substantially underestimated the measured summer precipitation, whereas some overestimation can be noticed for the winter precipitation. GPM showed the closest agreement with the monthly reference precipitation among the four products, especially during the dry season. In addition, Table 5 lists several key statistical indices at the annual scale for the reference precipitation and the four precipitation datasets over the UB, which indicate that the statistical features of HAR and GPM are closer to those of PCP_Sun than are those of the other precipitation products.
Furthermore, Figure 4 shows the box plots of grid-scale statistics for the four precipitation datasets at the daily timescale during 2001-2013 over the UB, based on the benchmark precipitation PCP_Sun. In addition, the median values of the six statistical metrics at the daily scale during the rainy season (June to September) and non-rainy season (October to the next May) are listed in Table 6. In terms of the relative bias (RB) (Figure 4a), the median RB is 6.82% and −6.78% for HAR and GPM, respectively, while the median RBs of both APHRODITE and TRMM exhibit a significant negative bias, with a magnitude of about −30%.
For the RMSE and CC indices, it is apparent that APHRODITE performs the best, followed by HAR, GPM, and TRMM in order. Meanwhile, there is only a slight difference between the results from HAR and GPM, and both perform acceptably in terms of these two statistical metrics. For the contingency statistics, APHRODITE has the highest POD, whereas HAR and GPM have similar values for all three indices and perform the best among the four gridded precipitation datasets in terms of the FAR and CSI metrics. With regard to performance during the rainy and non-rainy seasons (Table 6), the RB metric reveals an interesting phenomenon: APHRODITE and TRMM strongly underestimate precipitation in the rainy season, whereas both products overestimate it in the non-rainy season, especially TRMM, which has a considerable positive bias. Meanwhile, the RMSE in the rainy season is higher than that in the non-rainy season for all four precipitation datasets. Other studies over the TP also indicate that the RMSE is larger in summer than in winter [6]. As for the CC, the performance in the rainy season is better than that in the non-rainy season. The scatter plots of the four daily gridded precipitation products against the reference precipitation at the basin scale in the rainy and non-rainy seasons during 2001-2013 are shown in Figure 5. At the basin scale, APHRODITE also displays the best results, with a CC over 0.6 for both periods, whereas TRMM has the worst performance, especially during the non-rainy season, with a low CC of 0.09. HAR and GPM exhibit acceptable outcomes, both with a CC of more than 0.6 and 0.5 in the rainy and non-rainy periods, respectively.
Generally, the CC between the four gridded precipitation datasets and the reference precipitation is relatively small. Based on observed precipitation from TP meteorological stations, Li et al. [38] also found small CC values between satellite precipitation products (TRMM and GPM) and rain gauge measurements. The complex topography; the high-altitude areas covered by snow, glaciers, and permafrost; and the sparsely distributed precipitation gauges over the UB would affect the accuracy of these precipitation products and result in a low correlation with the reference precipitation as compared with other regions in the world. For the contingency statistics, all four products behave significantly better in the rainy season than in the non-rainy season, with a remarkably smaller FAR and a considerably higher POD and CSI. Combining the results of the above six statistical metrics, on the whole, we see that HAR performs comparably to GPM, and both of them outperform the other two products.
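The six statistical metrics discussed above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the formulas are the commonly used definitions, RB is taken as the total estimated minus total reference precipitation relative to the reference, and the 0.1 mm/day rain/no-rain threshold and the sample values are assumptions made only for this example.

```python
import math

def continuous_metrics(est, ref):
    """Return (RB in %, RMSE, CC) for two paired daily precipitation series."""
    n = len(ref)
    # Relative bias: negative values mean the product underestimates the reference
    rb = 100.0 * (sum(est) - sum(ref)) / sum(ref)
    rmse = math.sqrt(sum((e - r) ** 2 for e, r in zip(est, ref)) / n)
    mean_e, mean_r = sum(est) / n, sum(ref) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(est, ref))
    var_e = sum((e - mean_e) ** 2 for e in est)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    cc = cov / math.sqrt(var_e * var_r)  # Pearson correlation coefficient
    return rb, rmse, cc

def contingency_metrics(est, ref, threshold=0.1):
    """Return (POD, FAR, CSI); threshold (mm/day) is an assumed wet-day limit.

    Assumes at least one wet day in each series, so no denominator is zero.
    """
    pairs = list(zip(est, ref))
    hits = sum(1 for e, r in pairs if e >= threshold and r >= threshold)
    misses = sum(1 for e, r in pairs if e < threshold and r >= threshold)
    false_alarms = sum(1 for e, r in pairs if e >= threshold and r < threshold)
    pod = hits / (hits + misses)                 # probability of detection
    far = false_alarms / (hits + false_alarms)   # false alarm ratio
    csi = hits / (hits + misses + false_alarms)  # critical success index
    return pod, far, csi

# Hypothetical daily values (mm/day) for one grid cell
ref = [0.0, 2.0, 5.0, 0.0, 1.0, 8.0]
est = [0.5, 1.0, 4.0, 0.0, 0.0, 6.0]
rb, rmse, cc = continuous_metrics(est, ref)
pod, far, csi = contingency_metrics(est, ref)
```

Applying the same functions separately to the June to September days and to the remaining days would give the seasonal split reported in Table 6.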

Hydrological Evaluation of Gridded Precipitation Products
A hydrological model is an efficient tool for understanding a basin's hydrological regime. For hydrological models, errors in the precipitation input can cause significant uncertainties in flow simulation and other hydrological processes [5]. Therefore, it is important to assess the predictability and reliability of gridded precipitation products in a hydrological modeling scheme. In this section, the capability of the four gridded precipitation datasets in hydrological simulation over the UB is evaluated by driving the VIC-glacier model.

Comparison at Daily Scale
Firstly, the VIC-glacier model was calibrated at the UB outlet, i.e., Nuxia hydrological station for 1990-2000, based on the reference precipitation (PCP_Sun) as driving data. Next, in the validation period (2001-2013), the observed daily flow at Nuxia station was used to evaluate the model's efficiency. In addition, the observed daily flow at Lhasa station and observed monthly flow at Yangcun station were also used to assess the capability of this model in the two gauge stations during the respective calibration and validation periods. Finally, using the same parameters determined in the above calibration, the four gridded precipitation products were utilized to drive the VIC-glacier model, and their hydrological utility was evaluated by making a comparison to the observed streamflow during 2001-2013.
Seven parameters to which the hydrological simulation in the VIC-glacier model is sensitive were chosen as the calibration targets, and their final values were determined by using the trial-and-error technique (listed in Table 4). Figures 6-8 show the daily observed and simulated flow driven by the reference precipitation (PCP_Sun) at Nuxia and Lhasa gauges during the calibration and validation periods, respectively. The NSE, PBIAS, and KGE are also presented in Figure 6 for the calibration period during 1990-2000 and in Table 7. According to the criteria in [46], the performances of the VIC-glacier model driven by the reference precipitation at Nuxia and Lhasa gauges can be ranked as being at the 'very good' and 'good' level, respectively, thus further confirming the reliability of using the reference precipitation as the model input data.
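For readers who want to reproduce the three goodness-of-fit scores, the following Python sketch shows one common way to compute them. It is not taken from the paper: the KGE uses the original correlation, variability-ratio, and mean-ratio form, and PBIAS is defined here as simulated minus observed (so negative values indicate underestimation, which matches how PBIAS is interpreted in this section but is the opposite sign of some other conventions). The flow values are hypothetical.

```python
import math

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, values <= 0 are no better than the mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def pbias(sim, obs):
    """Percent bias, assumed sign convention: negative = simulated flow too low."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)

def kge(sim, obs):
    """Kling-Gupta efficiency from correlation r, variability ratio alpha, mean ratio beta."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mean_s / mean_o
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Hypothetical flows (m3/s): the simulation is uniformly 10% too low
obs = [100.0, 200.0, 400.0, 300.0, 150.0]
sim = [0.9 * q for q in obs]
scores = (nse(sim, obs), pbias(sim, obs), kge(sim, obs))
```

In this toy case the timing is perfect (r = 1), so the whole KGE penalty comes from the 10% deficit in mean and variability, which illustrates how a pure volume bias lowers all three scores.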

After the model was calibrated with the reference precipitation data, the VIC-glacier model was driven by the four gridded precipitation products from 2001 to 2013 over the whole basin, without any further adjustment of the parameters. Keeping the parameters fixed allows differences in model performance to be attributed to the four precipitation datasets used as driving input. The comparisons between observed and simulated daily streamflow at Nuxia and Lhasa gauges during the validation period of 2001-2013 are shown in Figures 7 and 8, respectively. Meanwhile, Table 7 lists the statistical metrics of flow simulation for all hydrological gauges during 2001-2013.
At Nuxia gauge, GPM shows the best performance in terms of streamflow simulation during 2001-2013 among the four precipitation products, with an NSE of 0.8, PBIAS of −7.61%, and KGE of 0.83 (Figure 7e). The model captures the rise and recession of the hydrographs well; the baseflow is especially well reproduced. However, underestimation of high flow during the summer can be noticed in some years. This may be caused by the underestimation of heavy rain in the GPM summer data.
During the rainy season, the median RB for GPM is about −4.5% (Table 6), which could be transformed into a negative bias in the simulated peak flow in the summer season. The HAR takes the second place in terms of flow simulation and performs satisfactorily with an NSE of 0.6, PBIAS of 3.23%, and KGE of 0.78. Figure 7b shows that the HAR-driven simulations can basically follow the seasonal variation of observed flow and simultaneously demonstrates moderate results in respect to the base flow simulation. However, the peak flow simulated by the HAR underestimates the observed data before the year 2007 and overestimates it afterward. For APHRODITE and TRMM (Figure 7c,d), the two modeling runs driven by them produce generally poor simulations in terms of the NSE, PBIAS, and KGE indices (Table 7). This might be ascribed to the considerable underestimation contained in these two precipitation datasets. On the one hand, the large negative errors included in the precipitation input could propagate into the flow simulations. On the other hand, due to the nonlinearities involved in the hydrological processes, any positive or negative bias in precipitation input can be magnified into a larger bias in the simulated streamflow [32,47]. For example, the precipitation estimate from the APHRODITE demonstrates an underestimation of the observed precipitation by 30.42%, causing the simulated discharge to be 54.86% lower than the measured flow. At Lhasa gauge (Figure 8), the GPM-based streamflow simulation also outperforms the other three precipitation products' results, with an NSE of 0.61, PBIAS of −15.81%, and KGE of 0.72, which can be classified as being at the 'satisfactory' level. The negative bias for the hydrological modeling (−15.81%) is due to the tendency of GPM to underestimate the observed precipitation in the Lhasa basin. 
As indicated in Figure 2, compared to PCP_Sun, GPM displays a prevailing underestimation over the Lhasa Basin, which can propagate to the simulated runoff. Meanwhile, Figure 8e shows that the GPM-driven simulation underestimates peak flows, which might also be due to the underestimation of summer rainfall. For the other three precipitation datasets (HAR, APHRODITE, and TRMM), all hydrological simulation results were unsatisfactory according to the NSE, PBIAS, and KGE metrics (Table 7). The flow simulated with HAR largely overestimates the observed runoff, whereas considerable underestimation can be detected in both the APHRODITE- and TRMM-driven simulations (Figure 8b-d); this may be attributed to the positive bias of HAR and the large negative bias of both APHRODITE and TRMM as model inputs.
Figures 9-12 show the comparisons between the observed and simulated monthly streamflow driven by PCP_Sun at the three gauges during the calibration (1990-2000) and validation (2001-2013) periods, respectively. In addition, the NSE, PBIAS, and KGE for the calibration period are listed in Figure 9, while Table 7 gives these metrics for the validation period. On the whole, good agreement between the observed and simulated monthly flow can be seen at the three gauges during both the calibration and validation periods. Furthermore, the high flow in summer and the baseflow in winter are also captured by the VIC-glacier model. The KGE and NSE are more than 0.84, and the PBIAS is within ±15% in both the calibration and validation periods at Nuxia, Yangcun, and Lhasa gauges, indicating that the runoff simulation based on PCP_Sun as input performs well at the three gauges.
Then, the monthly observed flow was compared with the simulated streamflow driven by the four precipitation products (HAR, APHRODITE, TRMM, and GPM) at the three gauges (Figures 10-12). At Nuxia gauge, the GPM-driven discharge performs the best and can basically reproduce the measured runoff, with an NSE of 0.88, PBIAS of −7.82%, and KGE of 0.86, which is even comparable to the results driven by the observed precipitation (PCP_Sun). Meanwhile, the flow simulated with HAR as forcing also shows a generally good result, with an NSE of 0.7, PBIAS of 2.99%, and KGE of 0.82 over the entire period, despite poor performance in some years. For APHRODITE and TRMM, however, the simulated flows significantly underestimate the observed runoff on the whole and give unsatisfactory results because of the large negative bias in the precipitation input. At Yangcun gauge, GPM also exhibits the best hydrological performance in terms of the NSE (0.72) but has a relatively large PBIAS; overall, the simulated result can be categorized as being at the 'satisfactory' level. The HAR-driven simulation overestimates the observed runoff by 42.5% because of the large overestimation of precipitation in the basin above Yangcun gauge.
In contrast, as a result of the large negative bias of both APHRODITE and TRMM (Figure 3), the runoff simulated with them has a low or even negative NSE. Generally, the hydrological simulation outcomes based on the other three precipitation products (HAR, APHRODITE, and TRMM) are all unsatisfactory. For Lhasa gauge, the hydrological simulations of these gridded precipitation datasets are similar to those of Yangcun gauge; that is, only the performance of the GPM-driven simulation is satisfactory, whereas poor performances can be found in the hydrological simulations based on the other three precipitation products.

Comparison at High Flow Simulation
As floods during the rainy season are a major safety concern for local regions and downstream basins, this section evaluates the performance of the gridded precipitation datasets in the rainy season and in high flow simulation. From Section 3.2.1, it can be noticed that only the daily simulation of GPM is satisfactory at both Nuxia and Lhasa gauges; therefore, only the modeling results based on GPM are discussed here. Table 8 lists the statistical metric values for the daily simulated outcomes from the reference precipitation (PCP_Sun) and GPM. The performance of the reference precipitation is satisfactory; meanwhile, GPM exhibits an acceptable result in the rainy seasons during 2001-2013, although with a relatively large negative bias. Figure 13 displays the daily flow duration curves (FDCs) of the observed and simulated flow for 2001-2013, suggesting that the PCP_Sun-based FDC is generally consistent with that of the observed flow, whereas some discrepancies can be found in the simulated high flow from GPM. To further quantitatively investigate the performance of high flow simulations using PCP_Sun and GPM, the flows corresponding to the 1%, 5%, and 10% quantiles of the FDC were compared to those of the observed flow (Table 9). Of the high flow simulations using the two precipitation inputs, the PCP_Sun-based run has an excellent performance, with an RB within ±5% for the three high flow indices at the two gauges. As for GPM, the computed high flow is 12.17-25.45% lower than the observations, suggesting that GPM cannot successfully capture the high flow over the UB; this also implies that there is still large room for GPM developers to further refine the algorithms to improve its flood prediction in the TP.
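The high flow comparison described above can be illustrated with a short Python sketch. It assumes that the 1%, 5%, and 10% quantiles of the FDC denote exceedance probabilities and that flows are read off the sorted series by linear interpolation; the paper does not state its exact procedure, and the flow series below are hypothetical.

```python
def flow_at_exceedance(flows, p):
    """Flow exceeded a fraction p of the time (0 <= p <= 1).

    The series is sorted in descending order and read at fractional
    position p * (n - 1) with linear interpolation (an assumed convention).
    """
    ordered = sorted(flows, reverse=True)
    n = len(ordered)
    pos = p * (n - 1)          # 0 -> largest flow, n - 1 -> smallest flow
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

def high_flow_bias(sim, obs, probabilities=(0.01, 0.05, 0.10)):
    """Relative bias (%) of simulated high flows at each exceedance probability."""
    result = {}
    for p in probabilities:
        q_sim = flow_at_exceedance(sim, p)
        q_obs = flow_at_exceedance(obs, p)
        result[p] = 100.0 * (q_sim - q_obs) / q_obs
    return result

# Hypothetical daily flows: the simulation is 20% too low everywhere
obs = [float(q) for q in range(1, 102)]
sim = [0.8 * q for q in obs]
bias = high_flow_bias(sim, obs)
```

Because the comparison is made on sorted flows, it measures whether the magnitude of high flows is reproduced, independently of whether individual peaks occur on the right day.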

Product-Specific Calibration for the HAR and APHRODITE
In Section 3.2, the hydrological evaluation of the four gridded products was based on parameters calibrated with the reference precipitation data (PCP_Sun). Some studies have indicated that recalibrating the hydrological model with each precipitation product could improve the simulation accuracy [5,6]. GPM has already performed well in hydrological simulation at all three stations (Lhasa, Yangcun, and Nuxia) with the parameters obtained from the PCP_Sun benchmark calibration, and TRMM not only contains a large bias but also fails to reproduce the seasonality of PCP_Sun; therefore, in this part, only HAR and APHRODITE were separately used to recalibrate the VIC-glacier model during 2001-2013. In addition, to be consistent with the evaluation period used in Section 3.2, the whole period of 2001-2013 was chosen as the assessment period.
According to the study of Zhang et al. [33] in the major river basins over the TP, the variable infiltration curve parameter (b_infilt) and the second soil layer depth (d_2) were identified as the most sensitive among the VIC-glacier model parameters for calibration. Meanwhile, their research also indicated that an increase of b_infilt and a decrease of d_2 tend to enhance runoff production and vice versa. Thus, in this study, these two parameters (b_infilt and d_2) were further calibrated for the HAR and APHRODITE datasets. The final calibrated parameters for the respective HAR and APHRODITE datasets are shown in Supplementary Materials Table S1. The simulated daily and monthly streamflows at the Nuxia, Yangcun, and Lhasa stations based on the input-specific calibration of HAR and APHRODITE are shown in Supplementary Figures S2-S5. Moreover, Table S2 exhibits the statistical performance of the discharge simulation under the product-specific calibration method. Meanwhile, to compare the performance differences between the PCP_Sun-benchmarked calibration and the product-specific calibration, the statistical metrics for the model outcomes based on PCP_Sun are also listed in Supplementary Table S2 (numbers in parentheses).
As indicated in Supplementary Table S2, the performances of all the simulations, except for HAR at Nuxia hydrological station, were improved by the product-specific calibration relative to the simulation based on the PCP_Sun-benchmarked calibration, especially in terms of the NSE and PBIAS metrics. This could be because the model parameter recalibration is able to partially compensate for the streamflow bias resulting from inaccurate precipitation inputs. For HAR, in order to reduce runoff production over the UB, the calibrated parameter d_2 increased from 1.1 m in the PCP_Sun-based model run to 2.9 m in the HAR-based recalibration. In contrast, for APHRODITE, to increase runoff generation, d_2 was calibrated to be 0.1 m, which is much lower than that in the PCP_Sun-based run. However, it should be noticed that this compensation might be valid only within a certain range of precipitation bias. For the HAR recalibration, the determined d_2 was 2.9 m, which is close to the physical upper limit of d_2. Although the overall performances at all three hydrological stations (Lhasa, Yangcun, and Nuxia) can be ranked as satisfactory, the PBIAS metric is still more than 15% at both Yangcun and Lhasa gauges; this suggests that the large positive bias of the simulated total runoff could not be totally removed by further calibration and that HAR might need bias correction over the two subbasins before being used as VIC-glacier model input data. With respect to APHRODITE, although the simulated results were improved by the product-specific calibration, the performances at all three hydrological stations are still poor, with a large negative bias and low NSE and KGE.
As indicated in Supplementary Table S2, the d_2 calibrated with APHRODITE is close to its lower limit, also implying that the VIC-glacier model's performance could not be considerably improved by calibrating d_2, because the large negative bias of APHRODITE might be beyond the threshold of precipitation error, and, thus, the parameter calibration could not offset the runoff biases resulting from the large APHRODITE error over the UB. In the UB, using the CMA data as VIC model forcing, Zhang et al. [33] also found that model performance cannot be considerably improved through calibration as a result of the large underestimation of the CMA precipitation input in this region. The unsatisfactory performance of the APHRODITE-specific recalibration demonstrates its limited potential for streamflow modeling in this region; this is basically in line with the conclusion derived by using the benchmark calibration. As the magnitude of the TRMM bias is comparable to that of APHRODITE, we speculate that its hydrological outcome under the product-specific calibration would be similar to that of APHRODITE. Meanwhile, although the modeled streamflow can be improved to some extent by product-specific calibration, this recalibration method should be taken with a grain of salt because it may result in unrealistic parameter values in some cases [6], and the other hydrological components, such as groundwater and evaporation, may not represent the real field conditions, an issue that will be analyzed in our future work.
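The idea that recalibration can compensate for input bias only within a limited range can be illustrated with a deliberately simple Python sketch. Everything in it is hypothetical: `toy_runoff` is a made-up stand-in that only mimics the stated direction of the parameter effects (more runoff for larger b_infilt and smaller d_2) and is not the VIC-glacier model, the search ranges and precipitation values are assumptions, and a grid search replaces the trial-and-error procedure used in the study.

```python
from itertools import product

def nse(sim, obs):
    """Nash-Sutcliffe efficiency of simulated versus observed flow."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def toy_runoff(precip, b_infilt, d2):
    """Hypothetical stand-in model (NOT VIC-glacier).

    The runoff fraction grows with b_infilt and shrinks with d2,
    mirroring only the qualitative behaviour described in the text.
    """
    fraction = b_infilt / (b_infilt + 0.2 * d2)
    return [fraction * p for p in precip]

def grid_search(precip, obs, b_values, d2_values):
    """Return (best NSE, b_infilt, d2) over all parameter combinations."""
    best = None
    for b, d2 in product(b_values, d2_values):
        score = nse(toy_runoff(precip, b, d2), obs)
        if best is None or score > best[0]:
            best = (score, b, d2)
    return best

# Assumed search ranges and synthetic data, for illustration only
b_values = [0.05, 0.1, 0.2, 0.3, 0.4]
d2_values = [0.1, 0.5, 1.1, 2.0, 2.9]
ref_precip = [2.0, 10.0, 30.0, 12.0, 4.0]
obs = toy_runoff(ref_precip, 0.2, 1.1)      # "observed" flow from the reference input
dry_precip = [0.4 * p for p in ref_precip]  # strongly underestimating product

score, b_best, d2_best = grid_search(dry_precip, obs, b_values, d2_values)
hits_bound = d2_best in (d2_values[0], d2_values[-1])
```

In this toy setting the search drives d_2 to the lower end of its range and still cannot reproduce the observed flow, which is the same qualitative signal as a calibrated parameter sitting at its physical limit.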

Strengths and Limitations of the Four Precipitation Products
In respect of the statistical assessment, the overall performances of GPM and HAR are comparable to each other, and both are superior to those of the other two products (APHRODITE and TRMM). However, apart from the relative bias (RB) index, APHRODITE also shows excellent results in the other statistical metrics. For example, the median CC of APHRODITE is the largest, and its POD outperforms that of the other three products as well. However, the median RB for APHRODITE is −30.42%, implying that it severely underestimates the benchmark precipitation over this basin. Some studies also found that the APHRODITE product underestimated precipitation not only in the UB but also over other areas in or around the Tibetan Plateau [34,48,49]. Based on corrected Chinese Meteorological Administration precipitation, the research of Tong et al. [34] indicates that APHRODITE underestimates precipitation by 25% over the UB, while it also exhibits systematic negative bias in the other five river basins of the TP, ranging from −13% to −24%. Moreover, Ji et al. [14] evaluated the APHRODITE data over the whole Brahmaputra River Basin and found that it demonstrates an average RB of −29.69% in this region. The large underestimation of APHRODITE in the UB is possibly related to the way this product is generated. The APHRODITE dataset is an interpolated product based on precipitation gauge observations. The rainfall stations are sparsely distributed over the TP and its surroundings; moreover, most of the stations are located in valleys, and this may cause large uncertainty in this interpolation-based product. Yatagai et al. [13] also pointed out that the precipitation underestimation of APHRODITE over the TP is due to a shortage of rainfall data input for Nepal, Bhutan, and Northern India. In addition, the inadequacies of the interpolation method may also affect the accuracy of this product in the Himalaya mountain areas, including the UB [50].
All of these factors combined together might increase the uncertainty of APHRODITE over these regions.
Meanwhile, the performance of the TRMM satellite estimate is also poor in respect to almost all statistical indices, probably due to its deficiencies in detecting precipitation over the high-altitude areas of the UB covered by snow, glaciers, and permafrost. The precipitation estimate from TRMM shows a large underestimation of the reference precipitation. Some studies have also found that satellite data underestimate precipitation in high mountains, including the TP [5]. Fortunately, as the successor of TRMM, GPM satellite precipitation leads to a new era of remote-sensing precipitation products, providing more opportunities for application in meteorological and hydrological studies. Compared to TRMM, there are several critical improvements in the GPM sensors, such as upgrading the radar to two frequencies and adding high-frequency channels to the Passive Microwave imager, which adds sensitivity to light precipitation and snowfall [51]. These considerable improvements make GPM IMERG products deliver a better performance than that of the TRMM product in the Tibetan Plateau [30,31,35,52-54]. In this study, the statistical metrics of GPM were also superior to those of TRMM in the UB, which corroborates the above conclusions.
Furthermore, the predictive ability of four gridded precipitation datasets in streamflow was evaluated by employing the VIC-glacier model. The assessment based on the hydrological modeling framework is preferred because it is not subject to the scale discrepancy problem which may turn up when using rainfall station data for validation. At Nuxia gauge, the hydrological simulations of the four precipitation products indicate that the performances of GPM and HAR were rated as 'very good' and 'good' at the monthly scale, respectively, whereas unsatisfactory outcomes appeared in both APHRODITE and TRMM. For Yangcun and Lhasa gauges, however, only the simulated result from GPM as model-driven data was satisfactory, while all of the other three precipitation products behaved unsatisfactorily in modeling runoff at both the daily and monthly scales. The poor simulation of streamflow driven by APHRODITE and TRMM can be ascribed to their large negative deviation from the observed rainfall because errors in the precipitation inputs could be propagated into hydrological modeling results. Meanwhile, the low and even a negative NSE for the hydrological simulations of APHRODITE and TRMM at the three gauges could further imply their little potential in runoff modeling in this region. For HAR, its performances in hydrological application are inferior to that of GPM at Nuxia gauge, and they are even poor at Yangcun and Lhasa stations on the whole. The poor hydrological simulations at Yangcun and Lhasa gauges for HAR might be due to its large overestimation of observed rainfall over the two subbasins. As shown in Figure 2, it is clear that HAR has a large positive bias over most of the two subbasins, which can be propagated to the runoff simulation and results in an unsatisfactory outcome. In the upper reach of the Shule River Basin located in the northeast of the Tibetan Plateau, it was also found that the HAR obviously overestimated precipitation in this area [55]. 
The HAR product was generated by dynamical downscaling of global analysis data with the WRF model. As a regional climate model, WRF may exhibit systematic deviations in simulated precipitation over the TP and its surroundings because of the complex terrain. For example, the WRF model can still suffer from a significant wet bias over the North Himalayas [56]. Therefore, to better apply HAR in the hydrological simulation of the UB, it might first need bias correction based on more ground precipitation stations in future studies. Moreover, the case of HAR also reminds us that a good hydrological simulation with one precipitation product at the basin outlet does not mean that it would also behave well in runoff modeling over the subbasins within the basin, owing to the complicated spatial variation of precipitation over this region.
In addition, the above analysis also reveals that the statistical-indices-based direct evaluation or the indirect assessment from the hydrological simulation alone might be insufficient to measure how good a precipitation product is at the basin scale, and it is preferable to combine the two evaluation strategies to comprehensively assess the precipitation datasets, making the results more reasonable and reliable. Generally, the GPM product outperformed the other three gridded precipitation datasets in both the statistical and hydrological evaluations in the UB. The high spatial resolution of these data may be useful for hydrological simulations in the middle and lower reaches of the Brahmaputra River Basin. Meanwhile, the GPM product could also be a valuable reference precipitation dataset for analyzing the temporal and spatial patterns of rainfall over the TP, especially in the Western TP with sparsely distributed rainfall gauges. However, in this study, we also found that the detection capability of the GPM product varies with altitude. As indicated in Figure 14, as the altitude increases, the POD gradually tends to degrade, while the FAR tends to increase and the CSI tends to decrease, although insignificantly. From Figure 14a, it can be noticed that there are two blocks in the scatter plot of the POD versus elevation, and this is possibly due to the weak correlation between these two variables. Some previous studies also found that the detectability of GPM decreases with altitude in the TP [29,38]. With increasing altitude, the percentage of snowfall in precipitation tends to increase in the UB. Meanwhile, there is a relatively large bias between the high flow simulated with GPM and the observations, as shown in Section 3.2.3.
These limitations suggest that the current GPM-era satellite precipitation product still has much room to further develop its algorithms to improve the estimation of solid precipitation and extreme heavy rainfall in the UB or over the TP with complicated precipitation patterns.
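The detection indices discussed above can be computed from a contingency table of rain/no-rain days. The following is a minimal sketch, assuming the standard definitions POD = H/(H+M), FAR = F/(H+F), and CSI = H/(H+M+F), where H, M, and F are hits, misses, and false alarms. The function name `detection_scores`, the rain/no-rain threshold of 0.1 mm/day, and the sample values are illustrative assumptions and are not taken from this study.

```python
import math


def detection_scores(product, reference, threshold=0.1):
    """Return POD, FAR, and CSI for paired daily precipitation series.

    threshold (mm/day) separates wet from dry days; 0.1 is an assumed value.
    """
    hits = misses = false_alarms = 0
    for p, r in zip(product, reference):
        p_wet = p >= threshold
        r_wet = r >= threshold
        if p_wet and r_wet:
            hits += 1            # both product and reference report rain
        elif r_wet:
            misses += 1          # reference rain not detected by the product
        elif p_wet:
            false_alarms += 1    # product rain not confirmed by the reference

    def ratio(num, den):
        # Undefined scores (empty denominator) are returned as NaN.
        return num / den if den else math.nan

    return {
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "CSI": ratio(hits, hits + misses + false_alarms),
    }


# Synthetic example values (mm/day), for illustration only.
reference = [0.0, 2.5, 0.0, 5.1, 1.2, 0.0]
product = [0.3, 1.8, 0.0, 0.0, 2.0, 0.0]
scores = detection_scores(product, reference)
```

Applying such a function per grid cell and plotting each score against cell elevation would give a diagnostic of the same kind as the altitude dependence described above.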

Uncertainty in Statistical Results Based on Different Benchmark Precipitation
In this study, the available Chinese Meteorological Administration (CMA) stations are sparse and unevenly distributed in the UB (Figure 1), so an assessment based on such a scarce precipitation gauge network may carry considerable uncertainty. Some studies also recommend using more than one reference precipitation dataset to evaluate gridded precipitation products [18]. In Sections 3.1 and 3.2, we used the gauge-based gridded dataset PCP_Sun as the reference data in the statistical and hydrological evaluations, as it was reconstructed from a high-density gauge network and may best represent the real precipitation in the UB so far [4]. To investigate the uncertainty caused by using different benchmark precipitation datasets, fifteen CMA stations within the UB (Figure 1) were chosen as another benchmark to derive the corresponding statistical metric values, again using the nearest-neighbor method (Figure 15). The most notable difference between Figures 4 and 15 lies in the RB values for HAR and GPM. With PCP_Sun as the benchmark, the median RB for HAR is a positive bias of 6.82%, whereas a negative bias of −28.26% is found for HAR when the 15 CMA stations are used as the benchmark. Similarly, the conclusion for GPM is reversed depending on whether PCP_Sun or the 15 CMA stations serve as the benchmark. This implies that the choice of benchmark precipitation dataset can severely affect the statistical results. However, we consider the outcomes based on PCP_Sun to be more reliable and reasonable than the results obtained using the 15 CMA stations only.
On the one hand, the CMA stations are sparsely distributed in the UB, and most of them are located in valleys; therefore, the 15 CMA stations used here may not characterize the actual precipitation regimes of the UB, and the statistical results based on this low-density station network may contain large uncertainty. In contrast, the PCP_Sun dataset was generated using a relatively high density of rainfall stations and could more accurately depict the real precipitation distribution over the UB. On the other hand, as precipitation is the major driver of river runoff in this basin, the hydrological performance is closely related to the quality of the precipitation input. Thus, runoff modeling provides an opportunity for independent validation of the precipitation input in the UB. The satisfactory daily hydrological simulation and the acceptable flood-event simulation at the Nuxia, Yangcun, and Lhasa gauges could indirectly indicate the reliability of PCP_Sun as the benchmark dataset. Of course, it is undeniable that PCP_Sun may itself contain uncertainty. In future work, more studies will be carried out to thoroughly investigate the uncertainties involved in precipitation evaluation, such as the selection of benchmark precipitation datasets, the use of diverse hydrological models, and the application of different parameter calibration methods.
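The sensitivity of the relative bias to the benchmark can be illustrated with a short sketch. It assumes the common definition RB = Σ(P − O)/ΣO × 100% and a simplified nearest-neighbor match based on squared latitude/longitude differences. The function names, coordinates, and precipitation values below are hypothetical and synthetic; they are not the procedure or data of this study.

```python
def nearest_cell(station, cells):
    """Index of the grid cell centre closest to a (lat, lon) station.

    Uses squared degree differences, a simplification that ignores
    the convergence of meridians (assumed acceptable for a small domain).
    """
    lat, lon = station
    return min(
        range(len(cells)),
        key=lambda i: (cells[i][0] - lat) ** 2 + (cells[i][1] - lon) ** 2,
    )


def relative_bias(product, benchmark):
    """Relative bias (%) of a product series against a benchmark series."""
    total = sum(benchmark)
    if total == 0:
        raise ValueError("benchmark total is zero; RB is undefined")
    return 100.0 * sum(p - b for p, b in zip(product, benchmark)) / total


# Synthetic grid cell centres (lat, lon) and product series (mm/day).
cells = [(29.25, 91.25), (29.25, 91.75), (29.75, 91.25)]
cell_series = [[1.0, 0.0, 4.0], [2.0, 1.0, 3.0], [0.0, 0.0, 6.0]]

# Two hypothetical benchmarks for the same location.
station = (29.30, 91.70)
station_series = [3.0, 1.0, 4.0]        # point gauge record
gridded_benchmark = [2.0, 0.5, 3.0]     # gridded reference at the matched cell

idx = nearest_cell(station, cells)
rb_station = relative_bias(cell_series[idx], station_series)
rb_gridded = relative_bias(cell_series[idx], gridded_benchmark)
```

In this synthetic case the same product series yields a negative RB against the gauge record and a positive RB against the gridded reference, which mirrors the kind of sign reversal reported above for HAR.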

Conclusions
In this study, based on a newly developed and more reliable reference precipitation dataset, we first assessed the accuracy and detection ability of the HAR, APHRODITE, TRMM, and GPM products over the UB during 2001–2013. Then, the potential utility of the four precipitation datasets for streamflow simulation was evaluated with the VIC-glacier model. The main findings are as follows.
For the statistical assessment, the overall results of GPM and HAR are comparable to each other, and both outperform APHRODITE and TRMM at the daily scale. Except for the relative bias (RB), APHRODITE also shows superior outcomes, with the smallest RMSE and the highest CC and POD. However, both APHRODITE and TRMM significantly underestimate the reference precipitation at the basin scale. In addition, for most statistical indices, the four precipitation datasets generally show better results in the rainy season than in the non-rainy season.
With regard to the hydrological evaluation, the GPM-based simulation shows the best results among the four precipitation products on both daily and monthly scales at all three gauges (Nuxia, Yangcun, and Lhasa). The simulated runoff derived from HAR yields a satisfactory outcome only at the Nuxia gauge, whereas it performs poorly at both the Yangcun and Lhasa gauges due to large overestimations of precipitation in these two subbasins. The poor hydrological prediction skill of both APHRODITE and TRMM at all three hydrological stations also implies that they have little potential for runoff simulation over the UB.
The evaluation of the four gridded precipitation datasets in this study could provide a valuable reference for selecting precipitation inputs for streamflow simulation in the UB or in similar TP basins with sparse ground-based rainfall stations. Generally, GPM performs acceptably in both the statistical assessment and the hydrological evaluation, indicating an encouraging potential for meteorological and hydrological studies in the UB; however, it does exhibit some bias. In the near future, we will carry out bias correction of the GPM product, which could reduce the uncertainty involved in hydrological simulation and benefit water resources assessment and management in these poorly gauged TP basins.
It should also be noted that the evaluation framework used in this study could serve as a reference for choosing suitable precipitation products for hydrological applications in a local region, particularly in basins with scarce rainfall gauges. However, this study focused only on the accuracy of streamflow simulations driven by the four precipitation products and did not investigate their roles in modeling other hydrological components such as groundwater and evaporation. In future work, we will consider both streamflow and other hydrological variables to comprehensively evaluate the validity of different precipitation datasets.
Supplementary Materials: The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/rs14122936/s1. Figure S1. Spatial distributions of relative bias (%) of four precipitation datasets to the PCP_Sun at annual period, rainy season, and non-rainy season. Figure S2. Daily flow simulation at Nuxia (a) and Lhasa hydrological stations (b), with recalibrated parameters using HAR-specific input. Figure S3. Monthly flow simulation at Nuxia (a), Yangcun (b), and Lhasa hydrological stations (c), with recalibrated parameters using HAR-specific input. Figure S4. Daily flow simulation at Nuxia (a) and Lhasa hydrological stations (b), with recalibrated parameters using APHRODITE-specific input. Figure S5. Monthly flow simulation at Nuxia (a), Yangcun (b), and Lhasa hydrological stations (c), with recalibrated parameters using APHRODITE-specific input. Table S1. List of VIC-glacier model parameters, ranges, and recalibrated values of HAR and APHRODITE. Table S2. Statistical indices of the simulated streamflow at the three hydrological stations by using product-specific calibration method for HAR and APHRODITE in the UB.
Author Contributions: All authors contributed significantly to this work. Y.Z. designed the study; Q.J. and L.Z. collected and processed the precipitation data; X.L. performed the numerical simulations of streamflow. The manuscript was prepared by Y.Z. and revised by C.-Y.X. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
The gauge-based precipitation data PCP_Sun are available through [4]. The streamflow data were obtained from the Hydrology and Water Resources Bureau of the Tibet Autonomous Region through a restricted-use agreement per government regulations. Readers interested in using these streamflow data need to follow the same procedure. Other data in this study, such as the four precipitation products, the climatic data, and the glacier data, can be downloaded for free from the website links attached in the main text.