Magnitude Agreement, Occurrence Consistency, and Elevation Dependency of Satellite-Based Precipitation Products over the Tibetan Plateau

Abstract: Satellite remote sensing is a practical technique to estimate global precipitation with adequate spatiotemporal resolution in ungauged regions. However, the performance of satellite-based precipitation products is variable and uncertain for the Tibetan Plateau (TP) because of its complex terrain and climate conditions. In this study, we evaluated the abilities of nine widely used satellite-based precipitation products over the Eastern Tibetan Plateau (ETP) and quantified precipitation dynamics over the entire TP. The evaluation was carried out from three aspects, i.e., magnitude agreement, occurrence consistency, and elevation dependency, from grid-cell to regional scales. The results show that the nine satellite-based products differed in their agreement with gauge-based reference data, with median correlation coefficients ranging from 0.15 to 0.95. Three products (climate hazards group infrared precipitation with stations (CHIRPS), multi-source weighted-ensemble precipitation (MSWEP), and tropical rainfall measuring mission multi-satellite precipitation analysis (TMPA)) generally agreed best with the reference data, even in complex terrain regions, with root mean square errors (RMSE) of less than 25 mm/mon. The climate prediction center merged analysis of precipitation (CMAP) product has relatively coarse spatial resolution, but it also performed well, with a bias of less than 20% at watershed scale. Two other products (precipitation estimation from remotely sensed information using artificial neural networks-cloud classification system (PER-CCS) and climate prediction center morphing technique-raw (CMORPH-RAW)) overestimated precipitation, with median RMSEs of 87 mm/mon and 45 mm/mon, respectively. All the precipitation products generally exhibited better agreement with the reference data for the rainy season and lower-elevation regions. All of the products captured precipitation occurrence well, with hit events over 60% and similar percentages of missed and false events. According to the evaluation, the four products (CHIRPS, MSWEP, TMPA, and CMAP) revealed that the annual precipitation over the TP fluctuated between 333 mm/yr and 488 mm/yr during the period 2003 to 2015. The study indicates the importance of integrating multiple data sources and post-processing (e.g., gauge data fusion and elevation correction) for satellite-based products and has implications for selecting suitable precipitation products for hydrological modeling and water resources assessment for the TP.


Introduction
Precipitation is a critical variable dominating land surface hydrological processes, e.g., runoff generation and soil moisture dynamics, and regulating energy balances associated with latent and sensible heat fluxes [1,2]. Accurate estimation of precipitation is necessary for water resource management, climate change detection, and hydrological modeling, and it is particularly important for the Tibetan Plateau (TP), which is regarded as the Asian water tower for sources of major Asian river basins [3].
Precipitation can generally be estimated using in situ gauge networks, global climate models (GCMs), reanalysis systems, and remote sensing retrievals. In situ measurement with gauge networks is the traditional approach to obtaining point-scale precipitation information [1,4]. Because of cost and environmental constraints, however, gauge networks are sparse in many remote areas [3]. For example, a large portion of the TP, particularly the western TP, has no in situ gauge networks for precipitation measurement because of the severe weather, high elevation, and complex terrain [5].
GCMs, reanalysis systems, and satellite sensors provide attractive options for precipitation estimation and compensate for the sparse distribution of in situ gauge networks. GCMs can provide real-time climatic data and forecasting information, but they usually exhibit considerable uncertainties because of large discrepancies between different general circulation models [6,7]. Reanalysis systems merge background forecast models with data assimilation routines, and their performance may depend on the quality of the assimilated datasets [1]. Remote sensing retrievals are based on observations from sensors onboard satellites and the associated precipitation inversion formulas. Satellite sensors are currently the only instruments that can provide global, homogeneous precipitation estimates [1].
In recent years, satellite-based precipitation products (some assimilated in reanalysis systems) have reached a good level of maturity and have been widely applied to hydrology and water resources-related issues [8]. Various precipitation products with different temporal and spatial resolutions are available [9–17]. However, satellite-based precipitation products are subject to a variety of uncertainties associated with sensor accuracy, revisit time gaps, spatial resolution, relationships between remotely sensed signals and rainfall rate, and atmospheric effects [18].
Many studies have evaluated precipitation products at regional and global scales [19–27]. Among the products evaluated, tropical rainfall measuring mission multi-satellite precipitation analysis (TMPA) shows a wide range of usability. It has been shown to perform well for monthly precipitation in various regions, including mainland China [28], coastal and island sites in China [11], and India [2,23]. However, TMPA has limitations in capturing daily precipitation and tends to overestimate rain event durations and underestimate rain event separations [29,30]. Some other satellite-based precipitation products exhibit similar performance. Examples include applications of climate hazards group infrared precipitation with stations (CHIRPS) in Cyprus [16] and India [31], the global precipitation climatology project (GPCP) in tropical regions [32], and precipitation estimation from remotely sensed information using artificial neural networks-climate data record (PERSIANN-CDR) in the United States [20]. Global satellite mapping of precipitation (GSMaP) and multi-source weighted-ensemble precipitation (MSWEP) have been shown to perform less well than TMPA in India [2], whereas the climate prediction center morphing technique (CMORPH) product was found to perform better than TMPA on a daily scale in Meichuan [33].
Given that the performance of these precipitation products tends to vary widely between regions, a thorough and specific evaluation is particularly necessary for the TP region [18,34,35]. However, very few such studies have concentrated on the TP, a region with very complex topography and high elevation. Two studies revealed that TMPA 3B42 and CMORPH perform better for the TP than TMPA 3B42RT and PERSIANN [18,36]. Bai and Liu [34] indicated that MSWEP agrees better with the gauge observations than the other four satellite-based products (CHIRPS, CMORPH, PERSIANN-CDR, and TMPA 3B42). However, these studies focused only on magnitude agreement (e.g., errors or biases in precipitation amounts) and did not consider occurrence consistency or elevation dependency.
In this study, we evaluated the performance of nine popular satellite-based precipitation products and one gauge-based product against gauge observations from the China Meteorological Administration (CMA). The evaluation was conducted at grid-cell and watershed scales and addressed not only magnitude agreement but also occurrence consistency. The elevation dependency of estimation biases and the spatiotemporal distribution of precipitation over the TP were also examined. The results of this study have useful implications for the application of satellite-based products to water resources assessment and hydrological modeling in the TP and for the improvement of satellite-based precipitation products.

Study Area
The TP, located in central Asia, has an area of approximately 2.6 million km² and a mean elevation of more than 4000 m above sea level (Figure 1). The study area is defined between 26.75-40.0°N and 74.25-105.0°E, covering the entire Tibet Autonomous Region and some portions of provinces in China, including all of Qinghai, northern Yunnan, western Sichuan, southwestern Gansu, and southern Xinjiang.
The average temperature in the TP ranged from −6 °C to 20 °C over the past two decades, increasing along a northwest-southeast gradient. The annual precipitation ranged from 50 mm to 2000 mm, with strong spatiotemporal variability because of the complex effects of multiple monsoons and mountain blockages. Specifically, in the summer, the TP has an abundant moisture supply contributed by mid-latitude westerlies, Indian summer monsoons, and East Asian summer monsoons covering its northwestern, southern, and eastern parts, respectively. These three moisture sources come mainly from the North Atlantic, the Arabian Sea and the Bay of Bengal, and the South China Sea and the western Pacific, respectively [37]. In the winter, the zonal orientation of the Himalayas blocks synoptic-scale exchanges of warm tropical air with cold polar air, the only avenue of air exchange being east of the Himalayas [18]. Thus, the south-eastern monsoons produce heavy precipitation in the summer months, while westerly winds bring winter precipitation [38].
In this work, the eastern TP (ETP) was extracted on the basis of the density of meteorological stations (i.e., CMA stations) in the TP (Figure 1) to better evaluate the precipitation products. The ETP is composed of the source regions of seven watersheds: Hexi, Yellow, Yangtze, Mekong, Salween, Qaidam, and Brahmaputra. The areas of the seven watersheds are between 0.06 million km² and 0.48 million km². The watersheds of the Hexi, Yellow, and Qaidam Rivers are located in the northeastern TP, with a mean elevation of approximately 3600 m, while those of the Yangtze, Mekong, Salween, and Brahmaputra Rivers are located in the southeastern TP, with a mean elevation of approximately 4400 m above sea level.

Data Sources
Satellite remote sensing is an invaluable tool for global measurements of atmospheric parameters at regular intervals [21]. Among the satellite-based data sources, infrared (IR) and microwave (MW) sensors are considered the major instruments for precipitation estimation. IR data can provide excellent spatiotemporal coverage but have an indirect relationship to precipitation. In contrast, MW observations give relatively accurate instantaneous rain rates but poor temporal sampling [1]. Thus, the concept behind retrieving precipitation information from high-resolution satellite data is to take advantage of their complementary strengths, i.e., combining information from the more frequent IR data with the more accurate MW data [21]. Outgoing longwave radiation (OLR) can also provide nearly complete global coverage of large-scale precipitation estimation with high quality [39]. The microwave sounding unit (MSU) primarily focuses on monitoring oceanic precipitation [40]. Based on these satellite remote sensing and other measurement techniques, numerous precipitation products with different spatiotemporal resolutions and coverages are freely available.
In this study, we focused on the 10 precipitation products listed in Table 1, which can be divided into three groups based on their data sources. The first group relies exclusively on satellite data and includes GSMaP-MVK/RNL V6, CMORPH-RAW V1.0, and PERSIANN-cloud classification system. The second group, which relies on satellite and gauge data combined, includes GPCP-1DD, PERSIANN-CDR V1R1, TMPA 3B42, CPC merged analysis of precipitation, CHIRPS V2.0, and MSWEP V2.0. The nine products mentioned above are abbreviated hereinafter as GSMaP, CMORPH, PER-CCS, GPCP, PER-CDR, TMPA, CMAP, CHIRPS, and MSWEP, respectively, and are loosely referred to as satellite-based products or datasets in this study. It is worth noting that GSMaP-RNL V6 (for the period from March 2000 to February 2014) and GSMaP-MVK V6 (for the period from March 2014 to the present) use the same algorithms, but GSMaP-RNL V6 uses the Japanese 55-year reanalysis (JRA-55) as ancillary data [41]. We grouped GSMaP-RNL V6 and GSMaP-MVK V6 together as GSMaP because they share the same algorithms and are temporally continuous. The third group consists of the fully gauge-based dataset, CPC-Global, which is sourced from multiple networks, including the global telecommunication system (GTS), the cooperative observer network (COOP), national meteorological agencies (NMAs), and CMA [42,43]. CPC-Global is not a satellite-based product, and it incorporates approximately 120 CMA stations in the TP. It was selected in this study because, after validation, it can be considered as reference data for evaluating the nine satellite-based products, as described in Sections 3.1-3.3. Given the different time spans covered by the ten products, we focused on the study period between January 2003 and October 2015, which all of the products cover.
To evaluate the above 10 products, we collected precipitation observations from 156 CMA stations. The stations are primarily located in the ETP (Figure 1). Each of the stations has undergone quality control procedures to eliminate erroneous and heterogeneous records, with additional routines to identify potential outliers (e.g., precipitation values less than 0 mm) and stiffness values (identical values persisting over long time series) [18]. Note that the CMA gauge observations and the CPC-Global product also contain measurement and instrument errors arising from wind-induced errors, wetting loss, and trace precipitation [44]. The total measurement error in the TP is up to 50 mm per year [45]. Nevertheless, we take the gauge observations as the standard reference for evaluating the satellite-based products (Table 1).

Evaluation Procedures and Performance Indicators
The performance of the 10 precipitation products was evaluated from three aspects: magnitude agreement, occurrence consistency, and elevation dependency. The evaluation procedure is presented in Figure 2. Magnitude agreement represents the errors in precipitation magnitude and the temporal consistency of a product, and it is calculated against the CMA data for monthly precipitation at the grid-cell and watershed scales. To conduct a reasonable evaluation of the products against the point-scale CMA observations, the CMA station observations were first interpolated to six different spatial resolutions according to the resolutions of the ten precipitation products. The interpolation method uses a linear regression model and considers both the elevation and the inverse squared distance between the stations [58–60]. Then, an effective grid cell (EGC) was defined according to the spatial resolution of a product (Figure 2 (right)). For spatial resolutions of 0.04° or 0.05°, an EGC is a grid cell containing at least one station; otherwise, it is defined as a grid cell with at least five stations around it (within 0.75° of the center). Therefore, as indicated in Figure 2, the number of EGCs varies from 16 (for CMAP, with a resolution of 2.5°) to 134 (for products with a resolution finer than 0.25°) because of the different spatial resolutions of the products. The interpolated CMA datasets with different spatial resolutions are collectively referred to hereinafter as CMA data. We assume that the precipitation estimates within the EGCs of each precipitation product are comparable with the CMA data at the corresponding resolution.
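The EGC rule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `is_effective_grid_cell`, the representation of stations as (latitude, longitude) pairs, and the reading of "around" as a straight-line distance in degrees from the cell center are all hypothetical choices.

```python
import math


def is_effective_grid_cell(center_lat, center_lon, resolution_deg, stations,
                           search_radius_deg=0.75, min_stations=5):
    """Return True if a grid cell qualifies as an effective grid cell (EGC).

    stations: iterable of (lat, lon) pairs in degrees (hypothetical layout).
    Assumption: "around" is measured as Euclidean distance in degrees
    from the cell center.
    """
    if resolution_deg <= 0.05:
        # Fine products (0.04 or 0.05 degree): the cell itself must
        # contain at least one station.
        half = resolution_deg / 2.0
        return any(abs(lat - center_lat) <= half and
                   abs(lon - center_lon) <= half
                   for lat, lon in stations)
    # Coarser products: at least five stations within 0.75 degree of the center.
    nearby = sum(1 for lat, lon in stations
                 if math.hypot(lat - center_lat,
                               lon - center_lon) <= search_radius_deg)
    return nearby >= min_stations
```

Under this reading, a single station inside a 0.05° cell makes that cell an EGC, whereas a 0.25° cell needs five stations within the search radius.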
The performance of the 10 products at grid-cell scale may depend strongly on the density of CMA stations and the definition of the EGC. To remedy this issue, we further evaluated the magnitude agreement at watershed scale. For this evaluation, we calculated the average precipitation of all grid cells within a watershed and compared it with the average precipitation of all stations within the same watershed. The magnitude agreement was represented by three measures, i.e., the correlation coefficient (R), the root mean square error (RMSE), and the long-term percentage bias (PBias). Because the number of EGCs differs among the 10 precipitation products, we used the following statistics of the three measures: their maxima, minima, medians, 75th percentiles (Q75), and 25th percentiles (Q25). The difference between the two percentiles (Q75-Q25) is a measure of the spread over the ETP, quantifying its bias variability. In the definitions of the three measures, $n$ is the number of months; $G_i$ and $\bar{G}$ denote individual monthly and mean values of CMA data, respectively; and $S_i$ and $\bar{S}$ denote individual monthly and mean values of precipitation products, respectively.
Occurrence consistency represents the capability of a product to capture the occurrence of daily precipitation. In this study, precipitation was considered to occur when the daily amount exceeded the threshold of 1.0 mm/day [18]. This aspect of the evaluation was conducted for daily precipitation at grid-cell scale based on the CMA precipitation data. Occurrence consistency can be characterized by contingency table-based categorical statistics consisting of three measures: hit event, missed event, and false event [61]. A hit event is defined as the evaluated data and the CMA data coincidently reporting precipitation occurrence. A missed event is defined as the evaluated data reporting no precipitation occurrence when the CMA data report that precipitation did occur. A false event is defined as the evaluated data reporting that precipitation occurred when the CMA data report no such occurrence. The missed rate reflects how well the products capture true precipitation occurrence, while the false rate represents the extent to which products report precipitation erroneously or do not estimate the precipitation duration well.
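The hit, missed, and false event percentages can be illustrated with a short Python sketch. It is a hedged example rather than the procedure actually used: the function name `occurrence_rates` and the list-based inputs are hypothetical, and, because the hit, missed, and false percentages reported in this study sum to roughly 100%, the sketch assumes that any day on which the product and the gauge data agree (both wet or both dry) counts as a hit.

```python
def occurrence_rates(product_mm, gauge_mm, threshold=1.0):
    """Percentages of hit, missed, and false events for paired daily series.

    product_mm, gauge_mm: equal-length sequences of daily precipitation (mm/day).
    Assumption: agreement on dry days is counted as a hit, so the three
    percentages sum to 100.
    """
    if len(product_mm) != len(gauge_mm) or not gauge_mm:
        raise ValueError("series must be non-empty and of equal length")
    hit = missed = false = 0
    for p, g in zip(product_mm, gauge_mm):
        product_wet = p > threshold
        gauge_wet = g > threshold
        if product_wet == gauge_wet:
            hit += 1        # both datasets agree on occurrence
        elif gauge_wet:
            missed += 1     # gauge is wet, product is dry
        else:
            false += 1      # product is wet, gauge is dry
    n = len(gauge_mm)
    return {"hit": 100.0 * hit / n,
            "missed": 100.0 * missed / n,
            "false": 100.0 * false / n}
```

If hits were instead restricted to days on which both datasets report precipitation, only the first branch would change; the missed and false counts would be unaffected.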
Elevation dependency represents the sensitivity of precipitation estimation to complex terrain conditions, because satellite sensors generally have difficulty in detecting precipitation in areas with complex terrain [36,62]. Elevation dependency can be measured using the correlation coefficient (R) between the bias and the associated elevation [36]. A higher R indicates that a product is less robust and more sensitive to the effect of complex terrain. The bias is calculated between each precipitation dataset and the reference data (the mean value of CPC-Global and the CMA data). As shown in Figure 1, two typical elevation bands in the ETP were selected for analysis based on the elevation variation and the density of CMA stations. The two elevation bands are roughly parallel to latitudinal trends, as this direction exhibits high consistency with the spatial distribution of environmental conditions (e.g., precipitation and atmospheric moisture supply) [37].
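As a rough illustration of this measure, the Python sketch below correlates per-cell bias with elevation. The function names, the list-based inputs, and the definition of bias as product minus reference are assumptions made for the example; only the use of a correlation coefficient between bias and elevation, and the averaging of CPC-Global and CMA data into a reference, come from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def elevation_dependency(product, cpc_global, cma, elevation_m):
    """R between bias and elevation along an elevation band.

    Reference data: mean of CPC-Global and CMA values at each cell.
    Assumption: bias = product - reference (same units as the inputs).
    """
    reference = [(c + g) / 2.0 for c, g in zip(cpc_global, cma)]
    bias = [p - r for p, r in zip(product, reference)]
    return pearson_r(bias, elevation_m)
```

A value of R close to zero would indicate that the bias of a product does not change systematically with elevation along the band.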
Based on the three evaluation aspects and the consistency of the spatiotemporal distribution of precipitation in the ETP, we illustrated the temporal fluctuation of precipitation over the entire TP area. It is worth noting that precipitation is defined as the sum of snow, rain, freezing rain, and hail. In addition, the CPC-Global product was first evaluated with respect to magnitude agreement and occurrence consistency and was then integrated with the CMA data at a resolution of 0.5° to produce a new reference dataset for use in the elevation dependency evaluation (Figure 2). The aim of this integration was to remedy uncertainties associated with the inadequate density of the CMA observations and the interpolation method.

Magnitude Agreement at Grid-Cell Scale
Magnitude agreement at grid-cell scale was evaluated based on the three previously mentioned measures (i.e., R, RMSE, and PBias). The three measures for the nine satellite-based products presented similar ranking patterns, and the products can be roughly divided into three groups (Figure 3). Three products (CHIRPS, MSWEP, and TMPA) exhibited good performance, with high median R values (>0.9), low median RMSE values (<25 mm/mon), and low median PBias values (<20%). Five other products (CMAP, PER-CDR, GPCP, GSMaP, and CMORPH) presented moderate performance, with high median R values (≥0.6) but relatively high median RMSE values (30-45 mm/mon) and relatively high median PBias values (>30%). PER-CCS showed large discrepancies with the CMA data, with a median R value of 0.15, a median RMSE value of 87 mm/mon, and a median PBias value of 100%. It is worth noting that the CMAP product yielded a large bias (a median PBias value of 75%) but performed favorably according to the other two measures (a median R value of 0.9 and a median RMSE value of 40 mm/mon). The gauge-based CPC-Global, however, exhibited the greatest agreement with the CMA data, with the highest median R value (0.9) and the lowest median RMSE (14 mm/mon) and PBias (8%) values.
The assessment of the magnitude agreement of the ten products revealed different spatial variabilities according to the three measures (Figure 3). CPC-Global presented the lowest variability, as indicated by the small spreads (Q75-Q25) of the three measures. In contrast, PER-CCS exhibited the highest variability. We plotted the spatial distributions of the three measures for the 10 products (Figures S1-S3) and selected three products (MSWEP, PER-CDR, and PER-CCS) as examples, shown in Figure 4. The performance of each product varied from region to region. In general, monthly precipitation was more easily detected in the northeastern ETP than in the southern ETP. For instance, MSWEP performed better in the northeastern ETP, as indicated by small RMSEs (~11.3 mm/mon) and PBiases (~10.3%), than in the southern ETP, with higher RMSEs (~35.0 mm/mon) and PBiases (~40.1%). PER-CDR showed patterns similar to those of MSWEP. However, PER-CCS exhibited better performance for the southern ETP than for the other areas.
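To make the grid-cell statistics concrete, the following Python sketch computes R, RMSE, and PBias for one monthly series and then summarizes a set of per-cell values by their median, quartiles, and spread. It assumes the conventional forms of the three measures, including a signed relative form for PBias (total product minus total gauge, divided by total gauge); the exact definitions used in this study may differ. The function names and the inclusive quartile method are likewise illustrative choices.

```python
import math
import statistics


def magnitude_measures(product, gauge):
    """Return (R, RMSE, PBias in %) for paired monthly series.

    Assumed conventional forms:
      R     = Pearson correlation between product and gauge
      RMSE  = sqrt(mean((S_i - G_i)^2))
      PBias = 100 * sum(S_i - G_i) / sum(G_i)
    """
    n = len(gauge)
    mean_s = sum(product) / n
    mean_g = sum(gauge) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(product, gauge))
    var_s = sum((s - mean_s) ** 2 for s in product)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    r = cov / math.sqrt(var_s * var_g)
    rmse = math.sqrt(sum((s - g) ** 2 for s, g in zip(product, gauge)) / n)
    pbias = 100.0 * sum(s - g for s, g in zip(product, gauge)) / sum(gauge)
    return r, rmse, pbias


def summarize(values):
    """Min, Q25, median, Q75, max, and spread (Q75 - Q25) of per-cell values."""
    q25, median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q25": q25, "median": median,
            "q75": q75, "max": max(values), "spread": q75 - q25}
```

Applying `magnitude_measures` to every EGC of a product and passing the resulting lists to `summarize` would yield the kind of box-plot statistics compared across products here.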

Magnitude Agreement at Watershed Scale
We employed only the PBias measure to represent the performance of the 10 products at watershed scale, as it exhibited a pattern similar to those of R and RMSE. Figure 5 shows the mean annual cycle of monthly precipitation bias for the 10 products over the entire ETP and the seven watersheds. For the entire ETP, the CMAP, CHIRPS, TMPA, and MSWEP products presented good performance, with mean monthly biases of 18.9%, 19.7%, 22.5%, and 25.7%, respectively. The GPCP, PER-CDR, and GSMaP products exhibited moderate performance, with mean monthly bias values between 30% and 50%, while the CMORPH and PER-CCS products differed greatly from the CMA data, with bias values in excess of 100%. CPC-Global, however, showed the best performance, with a mean monthly bias of 8.5%.
The performance of the 10 products over the entire ETP was similar to that over the seven watersheds, but the products generally performed better in watersheds located in the northeastern ETP. For instance, MSWEP presented a smaller bias (~18.6%) over four watersheds (i.e., Hexi, Yellow, Qaidam, and Yangtze), while its biases over the other three watersheds (i.e., Mekong, Salween, and Brahmaputra) were over 45%. GPCP exhibited a mean bias below 25% over three northeastern watersheds (Hexi, Yellow, and Qaidam), but biases in excess of 40% over the other four. Among the seven watersheds, the precipitation products performed best in Yellow; in particular, CHIRPS, MSWEP, and TMPA achieved their lowest bias there (~12.4%). In contrast, precipitation in Brahmaputra was not well captured by the nine satellite-based products, with the highest bias reaching 218.5%.
The performance of the products varied with the season. We calculated the ratio of the mean rainy season error (May-September) to the mean total error. This ratio was used to quantify the error contribution from the rainy season (Table S1). For the ETP, the ratios for PER-CCS (5.1%), CMORPH (7.6%), CPC-Global (18.9%), CHIRPS (47.5%), GSMaP (61.4%), and CMAP (76.6%) were below 100%, indicating that their errors were contributed primarily by the non-rainy season (October-April). Note that the ratio for PER-CCS was the smallest among the 10 products, but this does not indicate that it performed well for the rainy season, because its mean total error was quite high at 729.2%. The ratios for the other four products were approximately 100%.
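The rainy-season ratio can be sketched as follows. This is an illustrative reading only: it assumes that the input is a 12-value mean annual cycle of monthly error (January to December, in %) and that the "mean total error" is the mean over all 12 months; the function name `rainy_season_error_ratio` is hypothetical.

```python
RAINY_MONTHS = (5, 6, 7, 8, 9)  # May-September, as defined in the text


def rainy_season_error_ratio(monthly_error_pct):
    """Ratio (%) of the mean rainy-season error to the mean total error.

    monthly_error_pct: 12 values, January..December (assumed layout).
    Assumption: the mean total error is the mean over all 12 months.
    """
    if len(monthly_error_pct) != 12:
        raise ValueError("expected 12 monthly values")
    rainy = [monthly_error_pct[m - 1] for m in RAINY_MONTHS]
    mean_rainy = sum(rainy) / len(rainy)
    mean_total = sum(monthly_error_pct) / 12.0
    if mean_total == 0:
        raise ValueError("mean total error is zero; ratio is undefined")
    return 100.0 * mean_rainy / mean_total
```

Under this reading, a ratio of 100% means that the rainy-season months carry the same average error as the year as a whole, and values below 100% point to larger errors in the non-rainy months.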


Hit-Missed-False Events
Occurrence consistency was evaluated for daily precipitation, and the statistics for the three measures of hit-missed-false events were calculated at EGC scale. Figure 6 shows the statistics for each precipitation product. For hit rate, the median values of all products were higher than 60%, MSWEP exhibited the highest percentage of hit event, with a median value of 82.3%, PER-CCS presented the lowest value of 60.7% (Figure 6a). For missed rate, CMAP showed the lowest median value of 3.7%, followed by MSWEP (6.8%) and PER-CDR (9.1%) products. Five products (PER-CCS, TMPA, GPCP, GSMaP, and CMORPH) also exhibited flaws in capturing precipitation occurrence, with median values between 10 and 15%, while CHIRPS presented the highest median value of 17.6% (Figure 6b). As to the false rate, CHIRPS product exhibited the lowest value of 7.0%, followed by MSWEP, TMPA, and GSMaP (10 to 11%). CMORPH, CMAP, PER-CDR, and GPCP also showed poor performance in this aspect, with percentages of false event between 15% and 22%. PER-CCS presented the highest percentage of false event (25.0%) (Figure 6c). The gauge-based CPC-Global exhibited excellent performance, with median percentages of hit, missed, and false events of 82.5%, 7.4%, and 9.8%, respectively.
According to the spreads (Q75-Q25), the 10 products showed similar variabilities with respect to hit events, but different degrees of variability with respect to missed and false events. For missed events, CHIRPS presented the largest variability, and MSWEP the lowest. For false events, CMORPH exhibited the largest variability, and CHIRPS the lowest. Three products (i.e., MSWEP, TMPA, and GSMaP) can therefore be assumed to perform well in capturing precipitation occurrence, while two products (CMORPH and PER-CCS) presented higher degrees of variability, lower percentages of hit events, and higher percentages of missed and false events.
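The spread used here is the interquartile range of a measure across grid cells. The following is a minimal sketch using Python's standard `statistics.quantiles`; the quartile interpolation method ("inclusive") is an assumption, since the text does not say how the quartiles were computed, and the sample hit rates are invented.

```python
import statistics

def spread_q75_q25(values):
    """Return Q75 - Q25 of a list of per-cell percentages.

    The 'inclusive' quartile method is an assumed choice; other
    interpolation rules give slightly different quartiles.
    """
    q25, _median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return q75 - q25

# Hypothetical hit rates (%) at five grid cells for one product.
hit_rates = [78.0, 80.0, 82.0, 84.0, 86.0]
spread = spread_q75_q25(hit_rates)
print(spread)
```

A larger spread means the measure differs more from one grid cell to another, which is the sense in which the products are compared above.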
The spatial distribution of the three measures for the 10 products is illustrated in Figures S4-S6, and MSWEP, PER-CDR, and PER-CCS were selected as examples, as in Section 3.1 (Figure 7). The performance of the precipitation products varied among regions. For instance, MSWEP exhibited better consistency with the CMA data in the southern ETP, with a hit event percentage of 85.0%, which dropped to 82.1% in the northeastern ETP. PER-CCS behaved similarly, with the percentage of hit events dropping from 72.9% to 60.1%, mainly because of the increase in the percentage of false events (from 14.7% to 27.5%). PER-CDR showed an even distribution over the ETP for hit events, but the percentage of missed events dropped from 10.2% (northeastern ETP) to 6.0% (southern ETP), while the percentage of false events increased from 15.0% to 22.5%.

Elevation Dependency of Bias
Elevation dependency was evaluated based on precipitation bias and elevation along the two elevation bands shown in Figure 1. The bias of each satellite-based product was calculated against the new reference data (i.e., the mean value of CPC-Global and the interpolated CMA observations). Figure 8 presents the variation of the satellite-based products' biases and the corresponding elevation over the two bands, as well as the R value between the two. In band 1, the elevation is between 2000 m and 4000 m and generally showed a decreasing trend with longitude. The biases of most products exhibited an inverse trend with the elevation. However, two products (PER-CCS and GSMaP) presented high biases in high-altitude areas (Figure 8a). In band 2, the elevation and biases presented less significant trends; the elevation varied around approximately 4000 m, and the biases of the products exhibited a variation of approximately 50% (Figure 8b).
All nine satellite-based products exhibited varying degrees of correlation between elevation and precipitation biases (Figure 8c). The biases of five products (GPCP, PER-CDR, PER-CCS, CHIRPS, and GSMaP) showed obvious dependence on elevation (R value > 0.6) in band 1. The bias of GPCP had the highest R value (0.78) with respect to elevation, followed by PER-CDR with a value of 0.75. In band 2, the biases of GPCP, PER-CDR, and PER-CCS also presented dependence on elevation, with R values between 0.4 and 0.5. However, CHIRPS and GSMaP did not present an apparent correlation between precipitation bias and elevation in band 2, with R values of approximately 0.15. The biases of the other four satellite-based products (MSWEP, TMPA, CMAP, and CMORPH) showed a weak dependence on elevation in both bands.
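The R value between elevation and bias along a band can be sketched as a Pearson correlation coefficient. This is an assumption, since the section does not name the correlation measure or say whether signed or absolute values are reported; the function name `pearson_r` and the sample profile are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical band profile: elevation (m) and product bias (%) per cell.
elevation_m = [2000, 2500, 3000, 3500, 4000]
bias_pct = [40, 30, 25, 10, 5]
r = pearson_r(elevation_m, bias_pct)
print(round(r, 3))
```

In this invented profile the bias falls as elevation rises, so R is strongly negative; a magnitude near zero would indicate little linear dependence of bias on elevation.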


Spatiotemporal Distribution
Figure 9 shows annual precipitation information derived from interpolated CMA data (0.25°) over the ETP and from the ten precipitation products over the entire TP. The reference precipitation data (CPC-Global and CMA data) exhibited a southeast-northwest decreasing gradient over the ETP, ranging from 1200 mm/yr in the southeast to 100 mm/yr in the northwest. Among the nine satellite-based products, the distribution from three products (CHIRPS, MSWEP, and TMPA) agreed well with that of the reference data in terms of the annual spatial patterns. Four products (CMAP, PER-CDR, GPCP, and GSMaP) were able to approximate the large-scale spatial patterns of annual precipitation but showed differences in certain regions. CMAP captured only the approximate spatial distribution of precipitation because of its low spatial resolution; PER-CDR and GPCP seemed to overestimate precipitation relative to the reference data for the southern ETP, and GSMaP underestimated it for the eastern ETP.
However, CMORPH showed obvious overestimation over the eastern ETP, as well as massive areas of discrepancy over the entire TP that were not observed for the other products. PER-CCS presented a very different spatial pattern from that of the reference data, with the highest precipitation in the central TP (Figure 9). It is also worth noting that CPC-Global presented the highest precipitation in the southern TP, but only three products (CHIRPS, MSWEP, and TMPA) exhibited a similar precipitation pattern.

Figure 10a shows the annual precipitation fluctuations according to the reference data (CPC-Global and interpolated CMA data) and the nine satellite-based products over the ETP. It should be noted that the annual fluctuations in the ETP are very small according to the reference data, and the two sets of reference data presented slight differences with mostly similar fluctuation patterns. Among the nine satellite-based products, six products (CHIRPS, CMAP, TMPA, GPCP, MSWEP, and PER-CDR) were able to capture the fluctuation of precipitation in the reference data but exhibited different degrees of underestimation or overestimation. The other three products (PER-CCS, CMORPH, and GSMaP) exhibited fluctuations that differed from those of the reference data.
To provide a general view of the performance of the nine satellite-based products, we classified their performance into three groups marked as Good, Moderate, and Poor. Table 2 summarizes the performance classification according to the three evaluation aspects (i.e., magnitude agreement, occurrence consistency, elevation dependency) and related standards. CHIRPS, MSWEP, and TMPA were marked as Good because of their good magnitude agreement. CMAP has Moderate performance at the grid-cell scale but Good performance at the watershed scale. The other products present Moderate or Poor performance.
A similar performance ranking is shown for occurrence consistency, where CHIRPS is in the Moderate group due to its relatively high missed rate (Figure 6b). For elevation dependency, PER-CDR, GPCP, and PER-CCS are in the Poor group because they are sensitive to the effects of complex terrain (Figure 8), while the other products were not classified into a specific group because there may be unknown elevation dependency despite their low correlations. It should be noted that the classification is based only on the judgement standards listed in Table 2, and Poor performance does not mean the associated product is not applicable to hydrology and climate related studies.
Table 2. Summary of evaluation outcomes of nine satellite-based precipitation products from different aspects. The letters MA, OC, and ED are acronyms of magnitude agreement, occurrence consistency, and elevation dependency. The letters G, M, and P are acronyms of Good, Moderate, Poor. Columns: MA-EGCs, MA-Watersheds, OC, ED.
According to the performance classification, four products (CHIRPS, MSWEP, TMPA, and CMAP) in the Good group were used to explore the annual precipitation distribution over the entire TP (Figure 10b). The results show that, during the period of 2003-2015, the mean annual precipitation over the TP was approximately 440 mm/yr, with the highest precipitation in 2010 (~472 mm) and the lowest in 2006 (~392 mm).

Performance of the Satellite-Based Products
This study extensively evaluated satellite-based precipitation products with respect to magnitude agreement, occurrence consistency, elevation dependency, and spatiotemporal distribution. All nine satellite-based precipitation products exhibited differences in magnitude agreement against the reference data, but three products (CHIRPS, MSWEP, and TMPA) presented higher R values and lower RMSE and PBias values. Most products showed higher biases in the non-rainy season (October-April), which has complex types of precipitation (including rainfall, snowfall, and freezing rain), indicating the limitations of satellite-based products in solid precipitation or light rainfall situations. These products also achieved more favorable performance in the northeastern ETP (including Hexi, Yellow, and Qaidam) than in the southern ETP (including Mekong, Salween, and Brahmaputra), where the terrain is more complex and generally of higher elevation.
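The magnitude-agreement measures mentioned above can be sketched as follows, assuming the conventional definitions of RMSE and percent bias (PBias); the exact formulas used in the study are not given in this section, and the function names and sample values are hypothetical.

```python
import math

def rmse(product, reference):
    """Root mean square error between product and reference (same units)."""
    n = len(reference)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(product, reference)) / n)

def pbias(product, reference):
    """Percent bias, assumed here as 100 * sum(product - reference) / sum(reference).

    Positive values indicate overestimation relative to the reference.
    """
    return 100.0 * sum(p - o for p, o in zip(product, reference)) / sum(reference)

# Hypothetical monthly precipitation (mm/mon) at one grid cell.
reference_mm = [10.0, 20.0, 30.0]
product_mm = [12.0, 18.0, 33.0]
print(round(rmse(product_mm, reference_mm), 3), pbias(product_mm, reference_mm))
```

Under these assumed definitions, a product with a small RMSE and a PBias close to zero agrees well in magnitude with the reference data.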
With respect to occurrence consistency, all products presented good performance, with median hit rates higher than 60% and relatively uniform values for missed and false rates. However, CHIRPS showed the highest percentage of missed events (17.6%), reflecting a deficiency in its ability to correct for undetected (missed) precipitation events [18]. Three products (CMAP, CMORPH, and PER-CCS) had high percentages of false events, which may be due to their low spatial resolution or lack of gauge correction. Furthermore, the capability of precipitation products to capture precipitation occurrence may also be affected by the revisit cycle or orbit type of the sensors. For elevation dependency, the GPCP, PER-CDR, and PER-CCS products showed high elevation dependency, whereas, according to Figure 9, the CHIRPS, MSWEP, and TMPA products captured the elevation-affected precipitation in the southern TP well. The biases of the other three products (CMAP, GSMaP, and CMORPH) have little correlation with topography; they may instead be attributed to the absence of gauge or elevation correction, or to low spatial resolution.
Based on the above discussion, three of the nine satellite-based products (CHIRPS, MSWEP, and TMPA) performed the most consistently with respect to the reference data in all three evaluation aspects. CMAP also exhibited good performance in the ETP, but it is limited by its lower spatial resolution. Their favorable performance can be attributed to their effective integration of multiple data sources and gauge correction during the estimation process. CHIRPS estimates precipitation based on infrared Cold Cloud Duration observations and merges multiple data sets, such as TMPA, CMORPH, and global geosynchronous TIR archives [48]. The design philosophy of MSWEP is to optimally merge the precipitation data sources available as a function of timescale and location; the data sources include ERA-Interim, JRA-55, CMORPH, TMPA, etc. [10]. TMPA includes data sources from two different types of satellite sensors (microwave and IR) and employs several additional inputs [51]. CMAP merges large-scale precipitation data, which can be divided into seven categories (i.e., gauge observations, infrared, outgoing longwave radiation, microwave sounding unit, microwave scattering and emission, and precipitation models) [52]. Other studies have confirmed the good performance of these three products. CHIRPS has shown good correlation with recorded precipitation in Cyprus [16], East Africa [47], and India [31]. MSWEP has been found to perform well for daily rainfall in India [63]. TMPA has presented good correlation with precipitation data at multiple temporal scales in India [23], China [26], and Africa [64]. Beck et al. [21] and Yin et al. [65] strongly recommended MSWEP and CHIRPS for hydrological research because of their excellent performance.
Two products (PER-CCS and CMORPH) generally failed to capture both the magnitude variation and the occurrence of precipitation. PER-CCS is a solely IR-based product without gauge or elevation correction; its main aim is to retrieve high-resolution precipitation in near-real time. Therefore, PER-CCS does not perform as well as the other products [54]. Kai et al. [18] also noted the poor agreement of PER-CCS with gauge observations over the TP. Similar behavior for PER-CCS has been noted on a global scale [21]. Moreover, PER-CCS has been found to exhibit strong elevation dependency in other regions, such as northwestern Mexico [13]. CMORPH is a primarily MW-based product, and it retrieves precipitation mainly based on scattering by ice aloft, with poor temporal sampling [18]. Beck et al. [10] demonstrated the weak temporal correlation between CMORPH and gauge data over the TP.

Implications for Precipitation Product Application
Accurate precipitation information plays an important role in hydrological simulations and assessment of water resources. For satellite-based precipitation products, in addition to the product performance, the spatiotemporal resolution and coverage achieved also need to be taken into account. Several products (e.g., CHIRPS, MSWEP, PER-CDR) have been shown to have comparable abilities in terms of streamflow simulations in various regions [21,34,66]. Based on our assessments of multiple products in characterizing precipitation conditions over the TP region, the CHIRPS product is preferable for hydrological simulations and water resource assessment. This product provides the best overall correlation with measured data and has a long temporal record (1981-present) with a comparatively high spatial resolution (0.05°). For long-term climate change and water resource assessment, we also recommend the MSWEP and CMAP products, as both present good performance and long temporal records (~40 years). Furthermore, we confirmed that the performance of the satellite-based products varies with season, i.e., most errors are concentrated in the non-rainy season (October-April) with solid precipitation. Thus, the ability of satellite products to capture different types of precipitation (including snowfall and freezing rain) also merits attention, as considerable uncertainty may arise during the non-rainy season in a hydrological simulation.

Limitations
In this study, we used CMA precipitation data to evaluate the performance of the satellite retrievals. Uncertainties in the CMA data may arise from the station density, the spatial scale mismatch between satellite and gauges, and the interpolation method used to transform gauge data into gridded data [67]. In addition, our study covered only nine satellite-based precipitation products and conducted the evaluations without considering precipitation intensities.
However, we used another gauge-based product (i.e., CPC-Global) to mitigate these uncertainties, as the CPC-Global product is sourced from multiple sets of gauge networks including CMA stations. Our main purposes are to assess the capabilities and limitations of satellite-based products and to characterize the temporal change of precipitation over the TP. Our results show that precipitation estimated by the CHIRPS, MSWEP, TMPA, and CMAP products captures the reference observations well and that the mean annual precipitation over the TP was approximately 440 mm/yr during the period of 2003-2015. Furthermore, our work reveals the limitations of satellite-based products in complex terrain and light rainfall situations, and the importance of multiple data sources and of elevation correction. Therefore, uncertainties caused by reference data or other factors should not appreciably diminish the validity of the conclusions drawn from the results of this study.

Conclusions
In this study, 10 precipitation products were evaluated using CMA precipitation data: nine satellite-based products (CHIRPS V2.0, MSWEP V2.0, TMPA 3B42, CMAP, PERSIANN-CDR V1R1, GPCP-1DD, GSMaP-MVK/RNL V6, CMORPH-RAW V1.0, and PERSIANN-CCS), and one gauge-based product (CPC-Global). The products were evaluated with respect to magnitude agreement, occurrence consistency, and elevation dependency. According to the evaluation results, annual precipitation fluctuations over the TP are reflected well by four high-quality products. Our results are summarized as follows:
The ten precipitation products exhibited different degrees of magnitude agreement with the CMA data. Precipitation biases were mainly concentrated in the non-rainy season, and all precipitation products generally achieved more favorable performance in the northeastern watersheds (including Hexi, Yellow, and Qaidam) than in the southern watersheds (including Mekong, Salween, and Brahmaputra) of the ETP. The precipitation products performed best in the Yellow watershed among the seven watersheds and exhibited large discrepancies against the reference data in the Brahmaputra watershed.
With respect to occurrence consistency, all products presented good performance, with hit rates higher than 60% and relatively uniform rates of missed and false events. According to the results of the elevation dependency evaluation, three of the nine products (GPCP-1DD, PERSIANN-CDR, and PERSIANN-CCS) had considerable room for improvement in terms of elevation correction because of their high elevation dependency.
Among the nine satellite-based products, CHIRPS, MSWEP, and TMPA 3B42 generally presented the best performance regarding the three aspects, even in regions with complex topography, followed by CMAP product. However, two products (PERSIANN-CCS and CMORPH-RAW) showed large biases against the reference data (median RMSE ≥ 45 mm/mon).
The reference data show only slight annual fluctuations in the ETP. According to the four products (CHIRPS, MSWEP, TMPA 3B42, and CMAP), the annual precipitation over the TP was approximately 440 mm/yr during the period from January 2003 to October 2015, with the highest precipitation in 2010 (~472 mm) and the lowest in 2006 (~392 mm).
We conclude that it is difficult to generate reliable precipitation products by relying solely on sensor retrieval, and that it is important to integrate gauge station data and to consider elevation correction. Nevertheless, satellite-based precipitation products inevitably carry considerable uncertainties. So, which precipitation product is the most suitable for hydrological modeling and water resource assessment for the TP? Based on the results of our study, CHIRPS is highly recommended for this purpose, followed by MSWEP. TMPA 3B42 and CMAP are also good alternatives for relatively coarse-resolution applications. In addition, PERSIANN-CCS may have potential for short-term hydrological forecasting, as it provides near real-time precipitation information. Besides the performance indicated above, the spatiotemporal resolution and the coverage of the products should be considered for various applications.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/11/1750/s1. Figure S1. Spatial distribution of R value from 10 precipitation products against CMA data; Figure S2. Spatial distribution of RMSE (mm/mon) value from 10 precipitation products against CMA data; Figure S3. Spatial distribution of PBias (%) value from 10 precipitation products against CMA data; Figure S4. Spatial distribution of hit event (%) from 10 precipitation products against CMA data; Figure S5. Spatial distribution of missed event (%) from 10 precipitation products against CMA data; Figure S6. Spatial distribution of false event (%) from 10 precipitation products against CMA data; Table S1. The ratio (%) of mean rainy season error to the total mean error for 10 precipitation products over the seven watersheds and entire ETP. Brahmaputra is abbreviated as Brahma. Color grid categories: green, mainly non-rainy season error; blue, mainly rainy season error; gray, uniform error.