Capacity of Satellite-Based and Reanalysis Precipitation Products in Detecting Long-Term Trends across Mainland China

Abstract: Despite numerous assessments of satellite-based and reanalysis precipitation across the globe, few studies have been conducted based on the precipitation linear trend (LT), particularly during daytime and nighttime, when there are different precipitation mechanisms. Herein, we first examine LTs for the whole day (LTwd), daytime (LTd), and nighttime (LTn) over mainland China (MC) in 2003–2017, with sub-daily observations from a dense rain gauge network. For MC and ten Water Resources Regions (WRRs), annual and seasonal LTwd, LTd, and LTn were generally positive but with evident regional differences. Subsequently, annual and seasonal LTs derived from twelve popular precipitation products (six satellite-based and six reanalysis) were evaluated using metrics of correlation coefficient (CC), bias, root-mean-square error (RMSE), and sign accuracy. Finally, metric-based optimal products (OPs) were identified for MC and each WRR. Values of each metric for annual and seasonal LTwd, LTd, or LTn differed among products; meanwhile, for any single product, performance varied by season and time of day. Correspondingly, the metric-based OPs varied among regions and seasons, and between daytime and nighttime, but were mainly characterized by OPs of Tropical Rainfall Measuring Mission (TRMM) 3B42, ECMWF Reanalysis (ERA)-Interim, and Modern Era Reanalysis for Research and Applications (MERRA)-2. In particular, the CC-based (RMSE-based) OPs in southern and northern WRRs were generally TRMM3B42 and MERRA-2, respectively. These findings imply that to investigate precipitation change and obtain robust related conclusions using precipitation products, comprehensive evaluations are necessary, because the performance of different products varies within one year, within one day, and among regions.
Additionally, our study provides a valuable reference for product users seeking reliable precipitation estimates to examine precipitation change across MC, and offers developers an insight (i.e., capacity in detecting LTs, including daytime and nighttime) for improving algorithms. For potential users who focus on long-term precipitation changes across MC, this study provides necessary and detailed information about the performance of existing popular precipitation products in detecting linear trends, which is fundamental to obtaining robust conclusions.

Before using these products, it is of paramount importance to determine the reliability of the precipitation products using dependable reference datasets, because the inherent uncertainties within these products would likely affect final results, adversely impacting confidence levels [42,50–56]. Depending on a study's specific needs and goals, the satellite-based and reanalysis precipitation datasets have been widely evaluated at different spatio-temporal scales with a series of validation metrics (e.g., [6,50,56–68]). For instance, Sun et al. [6] selected several continuous and categorical validation statistics combined with bias and error decomposition techniques to assess the performance of the PERSIANN-Climate Data Record (CDR) precipitation product in the Huai River Basin, China, and pointed out that the daily, monthly and annual performance of this product varied in accordance with obvious intra-annual cycles. Huang et al. [62] systematically assessed five satellite-based precipitation products (CMORPH, PERSIANN, TRMM3B41RT, TRMM3B42RT, and TRMM3B42) with observations at 2400 weather sites across China, and found that the estimates generally captured the overall spatial-temporal variation of precipitation, especially for warm seasons and humid regions. Beck et al. [40] compared 22 gridded daily precipitation datasets across the globe during 2000-2016 with daily observations at 76086 gauges and hydrological modeling, and highlighted that there were large differences in the accuracy of precipitation estimates and that more attention should be paid to precipitation dataset selection in both research and operational applications. de Leeuw et al. [65] used daily precipitation observations from England and Wales to evaluate the ERA-Interim products, and found that this dataset underestimated the observations on a daily scale, while it could capture the statistics of extreme precipitation events.
Lorenz and Kunstmann [67] analyzed the hydrological cycle with three state-of-the-art reanalyses (ERA-Interim, MERRA-2, and CFSR), and demonstrated that large differences existed between the reanalyses and the observations. The previous evaluations have provided valuable information for the theoretical understanding and improvement of satellite retrieval algorithms and reanalysis systems. Nonetheless, most were conducted using daily, monthly and annual reference precipitation data; thus, information about the capacity of satellite-based and reanalysis precipitation is scarce on a sub-daily scale, especially for China. In fact, there are evident differences in the mechanisms of precipitation within one day, which are closely related to thermodynamic and dynamic processes of water and energy fluxes [3,69–74]. For example, results from Yu et al. [74] indicated that long-duration stratiform precipitation frequently occurred in the early morning during the warm season over central-eastern China, while the late afternoon experienced a higher frequency of short-duration convective precipitation. Therefore, evaluating the multi-source precipitation products with sub-daily observations (daytime and nighttime datasets at least) could provide more detailed information, e.g., the flexibility of a precipitation product on a sub-daily scale. This is very useful for further improving satellite-based algorithms and models/reanalysis systems from the perspective of sub-daily precipitation mechanisms, and even for correcting the precipitation products using sub-daily rather than daily measurements. Additionally, sub-daily precipitation changes have become a hot topic in current research, and numerous studies have been conducted (e.g., [71,75–82]). Cheng et al.
[71] pointed out that on annual and seasonal scales (except during spring), the majority of meteorological station records of Southwest China displayed downward trends for total, daytime, and nighttime precipitation. Lin et al. [80] analyzed characteristics of summer precipitation diurnal variations during 2001-2014 in the Hubei Province of China, and suggested that the diurnal variations showed obvious regional differences. Based on observational day and night precipitation during 1961-2005 across Xinjiang, China, Han et al. [81] concluded that the annual increasing trends of precipitation in the daytime and nighttime respectively accounted for 49% and 51% of the total increasing trend in annual precipitation. Using the CMORPH dataset during 2008-2014, Liu et al. [82] found that both daytime and nighttime precipitation increased in summer over the Qilian Mountains, China. Lenderink et al. [77] reported that hourly precipitation extremes have substantially increased in the last century over De Bilt, Netherlands, and Hong Kong, China. Thus, an issue arises: can the existing precipitation products capture the linear trends on a sub-daily scale based on different validation metrics? This question has received little attention (e.g., [83]), despite being the basis for examining precipitation trends with these datasets. Thus, assessments regarding precipitation trends can provide fundamental information for selecting reliable products for exploring precipitation changes, particularly for regions with limited or even no observations (e.g., West China in Figure 1).
Considering the gaps in previous precipitation evaluations, we used China as an example to examine the multi-source precipitation products' capacity to detect precipitation linear trends during daytime and nighttime. Thus, the main objectives of this work were to (1) investigate the spatial distribution of precipitation changes using daily, daytime, and nighttime records from 2393 weather sites across China; (2) quantify the performance of selected products (i.e., six satellite-based and six reanalysis datasets) in detecting precipitation trends on a sub-daily scale with different validation metrics (correlation coefficient, bias, root mean square error, and sign accuracy) through a comparison with gauge observations; and (3) identify the metric-based optimal products at a sub-daily scale.

Observed Precipitation
To evaluate the capabilities of various products in capturing precipitation changes, sub-daily (i.e., daytime, Pd, 0000-1200 UTC; and nighttime, Pn, 1200-2400 UTC) accumulated precipitation data observed from 2003 to 2017 at 2481 weather sites across China (Figure 1), including basic, benchmark, and general meteorological stations, were collected from the China Meteorological Administration (CMA). Although both datasets had undergone a series of quality control measures and homogenization, e.g., outlier identification, internal consistency checks, and spatio-temporal consistency checks [84], there were still missing values within the records. Therefore, to maximize the observational information, we processed the datasets as follows. First, the number of daytime and nighttime values was computed for each year at each site. If the number of days with missing daytime or nighttime observations exceeded 50 at a site, the site was removed. Second, for the remaining sites, the bilinear interpolation method was employed to fill the missing values with the observations at the two closest sites. There were 2393 sites remaining after this process (Figure 1), and the accumulated precipitation for a whole day (abbreviated as Pwd) was then obtained as the sum of Pd and Pn.
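The screening rule and the whole-day total described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that missing values are stored as `None`, that the 50-day threshold is applied to each year separately, and that a site is dropped if any year exceeds it. The function names are hypothetical, and the gap-filling step is not shown.

```python
MAX_MISSING_DAYS = 50  # threshold stated in the text


def count_missing_days(pd_vals, pn_vals):
    """Count days on which the daytime or the nighttime value is missing (None)."""
    return sum(1 for d, n in zip(pd_vals, pn_vals) if d is None or n is None)


def keep_site(yearly_records, max_missing=MAX_MISSING_DAYS):
    """yearly_records: {year: (daily Pd list, daily Pn list)}.

    Assumption: the site is removed if ANY year has more than
    `max_missing` days with a missing daytime or nighttime value.
    """
    return all(
        count_missing_days(pd_vals, pn_vals) <= max_missing
        for pd_vals, pn_vals in yearly_records.values()
    )


def whole_day_total(pd_vals, pn_vals):
    """Pwd = Pd + Pn for each day (inputs must already be gap-filled)."""
    return [d + n for d, n in zip(pd_vals, pn_vals)]
```

Applying the threshold per year is only one reading of the rule; a single threshold over the whole 2003-2017 record would require changing `keep_site` alone.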
Figure 1. [...] [85]. Crosses and triangles correspond to 1 and more than 2 sites within a given grid, respectively, followed by the percentage of grids shown in brackets.
China is located in a typical monsoon region (i.e., the East Asian monsoon region), with evident spatio-temporal variability of precipitation and the related mechanisms [86]. Mainland China (MC) is divided into ten Water Resources Regions (WRRs, Figure 1), which is beneficial for examining regional differences in the performance of each product in detecting precipitation trends. We conducted annual and seasonal evaluations of Pwd, Pd, and Pn trends during 2003-2017 on national and regional (i.e., MC and WRR, respectively) scales. Here, spring, summer, autumn, and winter were specified as March-May, June-August, September-November, and December-February, respectively.
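The seasonal definition above can be turned into seasonal totals as in the sketch below. The text does not say how December is assigned to a year, so grouping December with the following January-February is an assumption made here, and the function names are illustrative.

```python
SEASON_OF_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}


def season_key(year, month):
    """Return (season_year, season).

    Assumption: December is counted with the winter of the following year,
    so that December-February form one contiguous season.
    """
    season_year = year + 1 if month == 12 else year
    return season_year, SEASON_OF_MONTH[month]


def seasonal_totals(monthly):
    """monthly: {(year, month): precipitation in mm} -> {(season_year, season): total in mm}."""
    totals = {}
    for (year, month), value in monthly.items():
        key = season_key(year, month)
        totals[key] = totals.get(key, 0.0) + value
    return totals
```

With this convention the first and last winters of a record are incomplete and may need to be excluded before a trend is fitted.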

Satellite-Based and Reanalysis Precipitation Datasets
In this study, considering dataset availability and time span (study period of 2003-2017), we collected twelve sets of gridded precipitation data, including six satellite-based and six reanalysis products, for evaluation. Detailed information on these datasets is shown in Table 1. Of the selected satellite-based precipitation products, both the TRMM (i.e., TRMM3B42RT and TRMM3B42 adjusted with gauge observations) and the GSMaP (i.e., GSMaP-RNL and GSMaP-RNLG adjusted with gauge observations) precipitation datasets are produced by merging VIS/IR and MW information but are based on different algorithms [32,36]. In contrast, PERSIANN and PERSIANN-CCS belong to the VIS/IR family of satellite-based precipitation products [13,33–35,87]. The main difference between the two PERSIANN products is that the PERSIANN-CCS system enables the categorization of cloud-patch features based on cloud height, areal extent, and variability of texture estimated from satellite imagery, which is optimized for observing extreme precipitation, particularly at a very high spatial resolution. The six reanalysis precipitation products include JRA-55, ERA-Interim, ERA-5, NCEP1, NCEP2, and MERRA-2. These reanalysis products are produced with different forecasting systems by assimilating many of the basic surface and upper-atmospheric fields from multiple sources, e.g., surface humidity, radiosonde-based specific humidity, wind fields, and satellite-derived radiance. Different data assimilation techniques are employed among them. For example, ERA-Interim, ERA-5, and JRA-55 adopt four-dimensional variational (4D-VAR) data assimilation systems, whereas MERRA-2, NCEP1, and NCEP2 utilize 3D-VAR assimilation systems. For more details about these datasets, the reader can refer to the product-specific user guides and the related literature.
As shown in Table 1, the datasets had different temporal and spatial resolutions, so it was necessary to process them before evaluation. First, the satellite-based and reanalysis Pd and Pn were summed from the 1-hourly, 3-hourly, or 6-hourly accumulated precipitation at product-specified grids. Then, based on the bilinear interpolation method, the Pd and Pn for all products (except for TRMM-3B42RT, TRMM-3B42, PERSIANN, and ERA-5) were resampled to the spatial resolution of 0.25°. This was mainly because most products correspond to a spatial resolution of 0.25° or higher, so the resampling-induced uncertainties could be reduced to some extent. For Pwd, its values were obtained using the sum of Pd and Pn from the resampled maps. The grids with at least one site were extracted to conduct performance evaluations. If any grid included more than one site, the average precipitation value at these sites was calculated to represent the final reference value of that grid.
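Two of the steps above, splitting sub-daily accumulations into Pd and Pn and averaging the gauges that fall in one 0.25° grid, can be sketched as follows. The sketch assumes that each record is labelled with the UTC start hour of its accumulation window and that grid cells are aligned to multiples of the resolution. Both are assumptions rather than details from the text, and the bilinear resampling itself is not shown.

```python
import math


def split_day_night(records):
    """records: iterable of (start_hour_utc, amount_mm) for one day.

    Daytime is 0000-1200 UTC and nighttime is 1200-2400 UTC, as in the text.
    Assumption: each record is labelled by the start hour of its window.
    """
    p_d = sum(amount for hour, amount in records if 0 <= hour < 12)
    p_n = sum(amount for hour, amount in records if 12 <= hour < 24)
    return p_d, p_n


def grid_cell(lon, lat, res=0.25):
    """Index of the res-degree cell containing (lon, lat); cell edges at multiples of res (assumed)."""
    return math.floor(lon / res), math.floor(lat / res)


def grid_reference(sites, res=0.25):
    """sites: iterable of (lon, lat, value) -> {cell: mean value of the sites in that cell}."""
    sums = {}
    for lon, lat, value in sites:
        cell = grid_cell(lon, lat, res)
        total, count = sums.get(cell, (0.0, 0))
        sums[cell] = (total + value, count + 1)
    return {cell: total / count for cell, (total, count) in sums.items()}
```

Only cells that appear in the returned dictionary contain at least one site, which matches the rule that evaluation is restricted to grids with gauge data.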

Methodology
The precipitation trends were calculated using

$$y = a\,t + b$$

where $y$ is annual or seasonal accumulative precipitation; $t$ refers to time; $a$ represents the slope coefficient, namely, the linear trend; and $b$ is the constant. Pearson's correlation and the two-tailed Student's t test (i.e., p < 0.05) were applied to check for statistically significant relationships. Satellite-based and reanalysis precipitation trends were quantitatively assessed with the metrics of bias (B), which measured the trend differences between the products and the gauge observations; root mean square error (RMSE), which represented the overall accuracy of the trends derived from the products; the correlation coefficient (CC), which quantified the spatial consistency of the trends derived from the products; and accuracy of sign (AS), which examined the degree of agreement between the positive or negative sign of precipitation trends from the products and the observed data. These metrics were calculated using the following equations:

$$B = \frac{1}{N}\sum_{i=1}^{N}\left(a_{P,i} - a_{O,i}\right)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(a_{P,i} - a_{O,i}\right)^{2}}$$

$$\mathrm{CC} = \frac{\sum_{i=1}^{N}\left(a_{P,i} - \bar{a}_{P}\right)\left(a_{O,i} - \bar{a}_{O}\right)}{\sqrt{\sum_{i=1}^{N}\left(a_{P,i} - \bar{a}_{P}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(a_{O,i} - \bar{a}_{O}\right)^{2}}}$$

$$\mathrm{AS} = \frac{n_{P}}{n_{G}}$$

where $a_{P,i}$ and $a_{O,i}$ represent the linear trends from a certain precipitation product and the gauge observation at the $i$th grid, respectively; $N$ is the number of grids used for evaluation across MC or each WRR; $\bar{a}_{P}$ and $\bar{a}_{O}$ represent the product and observed trends averaged over the grids within MC or a certain WRR, respectively; $n_{P}$ is the number of grids where the examined product shows the same sign of precipitation (e.g., Pwd, Pd or Pn) change as the observed within a given region; and $n_{G}$ is the total number of grids in the region. Considering the co-variation of Pd and Pn, we defined a joint AS (JAS), which represented the capacity of a given product to correctly detect the signs of both Pd and Pn changes relative to the observed data. JAS can be calculated by

$$\mathrm{JAS} = \frac{n_{P,co}}{n_{G}}$$

where $n_{P,co}$ is the number of grids in which the signs of both Pd and Pn changes derived from the product are the same as those observed in a given region.
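The trend fit and the validation metrics defined above can be sketched in plain Python as follows. This illustrates the definitions rather than reproducing the authors' implementation: time is indexed as 0, 1, 2, ... (one step per year), AS and JAS are returned as fractions, and a trend of exactly zero is treated as a sign mismatch, all of which are assumptions.

```python
import math


def linear_trend(y):
    """Least-squares slope a and intercept b of y = a*t + b, with t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    a = sxy / sxx
    return a, y_mean - a * t_mean


def bias(a_p, a_o):
    """Mean difference between product and observed trends over the grids."""
    return sum(p - o for p, o in zip(a_p, a_o)) / len(a_p)


def rmse(a_p, a_o):
    """Root mean square error of product trends against observed trends."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(a_p, a_o)) / len(a_p))


def cc(a_p, a_o):
    """Pearson correlation between product and observed trends across grids."""
    mp, mo = sum(a_p) / len(a_p), sum(a_o) / len(a_o)
    cov = sum((p - mp) * (o - mo) for p, o in zip(a_p, a_o))
    var_p = sum((p - mp) ** 2 for p in a_p)
    var_o = sum((o - mo) ** 2 for o in a_o)
    return cov / math.sqrt(var_p * var_o)


def same_sign(p, o):
    """True if both trends are strictly positive or strictly negative (zero counts as mismatch)."""
    return p * o > 0


def sign_accuracy(a_p, a_o):
    """AS: fraction of grids where the product trend has the observed sign."""
    return sum(same_sign(p, o) for p, o in zip(a_p, a_o)) / len(a_p)


def joint_sign_accuracy(ap_d, ao_d, ap_n, ao_n):
    """JAS: fraction of grids where both daytime and nighttime signs match the observations."""
    hits = sum(
        same_sign(pd, od) and same_sign(pn, on)
        for pd, od, pn, on in zip(ap_d, ao_d, ap_n, ao_n)
    )
    return hits / len(ap_d)
```

Each list holds one trend value per evaluated grid, so the same functions apply to MC as a whole or to the subset of grids inside a single WRR.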
Figure 2(a1) depicts observed annual LTs for MC and ten WRRs during 2003-2017. For MC, annual LTwd and LTd were 8.42 mm/yr (p < 0.05) and 4.96 mm/yr (p < 0.05), respectively, followed by an insignificant LTn of 3.46 mm/yr. Comparing LTwd, LTd, or LTn (i.e., signs and magnitudes) among WRRs, there were evident regional differences: significant (p < 0.05) and larger increases (>13 mm/yr, >8 mm/yr and >7 mm/yr for LTwd, LTd and LTn, respectively) were found in YZRB, SERB, and PRB, whereas the largest reductions (−13.95 mm/yr for LTwd, −4.68 mm/yr for LTd and −9.27 mm/yr for LTn) occurred in HuRB. In spring (Figure 2(b1)), LTs for WRRs and MC were between −4 mm/yr and 4 mm/yr, with the exceptions of SERB and PRB, which showed LTwd > 8 mm/yr, and LTd and LTn > 4 mm/yr. During summer, MC LTwd and LTd (LTn) were positive (negative) with a rate <2 mm/yr (Figure 2(c1)). Among ten WRRs, most exhibited smaller LTwd (LTd and LTn) in summer, generally corresponding to between −3 mm/yr and 4 mm/yr (between −1.50 mm/yr and 2.50 mm/yr); however, significant (p < 0.05) decreasing and increasing LTs were detected over HuRB and YZRB (excluding LTn) and SERB, and the LTwd, LTd, and LTn were >6 mm/yr and >4 mm/yr, respectively. As shown in Figure 2(d1), autumn MC LTwd and LTd (LTn) were 5.60 mm/yr (p < 0.05) and 2.80 mm/yr (p < 0.05), respectively. Except for two WRRs (i.e., HaRB and YRB), autumn LTwd, LTd, and LTn were consistently positive from 2003 to 2017. However, magnitudes of autumn LTs differed among these WRRs, for which significant (p < 0.05) and larger increases (>8 mm/yr for LTwd and 3.20 mm/yr for LTd and LTn) occurred in YZRB, SERB, and PRB. Regarding winter precipitation (Figure 2(e1)), SERB and PRB exhibited the highest LTwd (>2.80 mm/yr) and LTd and LTn > 0.90 mm/yr, followed by the remaining WRRs and MC with an LTwd < 1.50 mm/yr (LTd and LTn < 0.70 mm/yr).
Additionally, comparing signs and magnitudes of LTd and LTn (Figure 2(a1-e1)), 10 and 15 of 55 cases (i.e., 11 (MC + 10 WRRs) × 5 (annual + seasonal scales)) showed opposite signs and larger differences, with ratios between LTd and LTn > 2.00 and < 0.50, respectively. These findings imply that LTd and LTn values were not consistent, possibly due to the different precipitating mechanisms during daytime and nighttime, and thus further confirm the necessity to evaluate various precipitation products at a sub-daily scale.
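The day-night comparison above can be expressed as a small classification rule. The sketch below reads the criterion as "opposite signs" first and, for same-sign pairs, a "larger difference" when the ratio LTd/LTn is above 2.00 or below 0.50. This is one interpretation of the text, and the handling of a zero nighttime trend is an added assumption.

```python
def compare_day_night(lt_d, lt_n):
    """Classify one (LTd, LTn) pair.

    Returns "opposite_sign", "large_difference" or "similar".
    Assumption: the ratio test is applied only when both trends share a sign.
    """
    if lt_d * lt_n < 0:
        return "opposite_sign"
    if lt_n == 0:
        # Ratio undefined; treated as a large difference unless both are zero (assumption).
        return "similar" if lt_d == 0 else "large_difference"
    ratio = lt_d / lt_n
    if ratio > 2.0 or ratio < 0.5:
        return "large_difference"
    return "similar"
```

For the annual MC values quoted earlier (LTd = 4.96 mm/yr, LTn = 3.46 mm/yr) the ratio is about 1.43, so that case would be classified as "similar" under this rule.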

Gauge Precipitation Changes across MC
As shown in Table 2 and Figure 2(a2), 33% of grids had decreasing annual LTwd across MC, generally in east LRB, HuRB, the YRB-YZRB border, and most of SWRB and NWRB. Moreover, 3% of grids in north-central HuRB showed significant (p < 0.05) negative annual LTwd with a rate of −12 mm/yr. In contrast, 11% of grids had significantly (p < 0.05) increasing LTwd, mainly situated in east SHRB, central YRB, northeast YZRB, north SERB, and middle PRB, for which LTwd over the three latter regions exceeded 20 mm/yr. For both annual LTd and LTn (Table 2), negative values were found in >30% of grids, followed by <4% of grids with significant (p < 0.05) values. Moreover, in spite of smaller magnitudes of difference compared to annual LTwd, similar spatial distributions for LTd and LTn were detected (Figure 2(a3,a4)). Figure 2(b2-b4,c2-c4,d2-d4,e2-e4) illustrate the spatial distribution of seasonal LTwd, LTd, and LTn during 2003-2017. In broad terms, LTwd, LTd, or LTn differed spatially among seasons, while in a given season, a generally similar spatial pattern was observed among LTwd, LTd, and LTn, including for locations with significant (p < 0.05) LTs. For example, spring LTwd, LTd, and LTn were negative at 30% of grids, primarily in NWRB, SWRB, west YZRB, north HuRB, and HaRB (Figure 2(b2-b4) and Table 2); moreover, 2% of grids with significant (p < 0.05) changes were sporadically distributed, and larger reductions (−6 mm/yr for LTwd, but −2 mm/yr for LTd and LTn) were in south SWRB. At the remaining grids, 60% of grids with larger increases for LTwd (12 mm/yr), LTd and LTn (4 mm/yr) in spring were mainly located in east YZRB, PRB, and SERB, and 5% of grids with significant (p < 0.05) changes were generally in SHRB-LRB, YRB-YZRB borderlands, and east PRB.
As shown in Figure 2(c2-c4) and Table 2, 44% of grids with a negative summer LTwd, LTd, and LTn were generally situated in central SHRB, LRB, HuRB, YRB-YZRB borderlands, west YZRB, PRB, and north SWRB, and the largest and significant (p < 0.05) reductions (−10 mm/yr) in 5% of grids were concentrated in HuRB. Of the remaining grids (>50%), the largest (10 mm/yr) and significant (p < 0.05) summer LTs were detected in 4% of grids mainly in northeast YZRB and north SERB. In autumn (Figure 2(d2-d4) and Table 2), LTwd, LTd, and LTn were at least −6 mm/yr at 30% of grids in south LRB, YRB-HaRB-HuRB and YRB-YZRB borderlands, central NWRB, and central SWRB. Of the grids with increasing LTs, 14% of grids with large (10 mm/yr) and significant (p < 0.05) values were situated in central SHRB, central PRB, east YZRB, parts of middle YZRB (i.e., Sichuan basin), and SERB. During winter (Figure 2(e2-e4) and Table 2), there was an approximately equal balance of grids with negative and positive LTwd, LTd, or LTn, which was generally 4 mm/yr or −4 mm/yr at most grids; moreover, increasing LTs were widely distributed across east coastal WRRs, south SWRB, and central YRB. Furthermore, 2% of grids with significant (p < 0.05) increases in winter precipitation were patchily distributed across MC.
Figure 2. [...] (a1) and (b1-e1), respectively, in which stars represent significant changes with p < 0.05. (a2) and (b2-e2) show spatial distributions of annual and seasonal Pwd trends across MC, respectively, with the green cross representing significant changes with p < 0.05. (a3-e3) and (a4-e4) are the same as (a2-e2), but for Pd and Pn trends, respectively.

Evaluation Using Correlation Coefficient Metric
The CCs of LTs for the products and the observed values are depicted in Figure 3(a1-a5). For annual LTs, the corresponding CCs for TRMM3B42RT, TRMM3B42, PERSIANN, PERSIANN-CCS, and MERRA-2 were generally >0.40, suggesting that spatial distributions of annual LTs across MC can be derived from these products (Figure 3(a1)), especially for TRMM3B42 and MERRA-2 with CCs around 0.80. In addition, ERA-Interim, with an annual CC < 0.40, exhibited limited capacity in detecting annual LTs in space. However, annual CCs for the remaining six products were all below 0.10 and some were even negative, which indicates that these products are not able to capture the spatial distribution of LTs across MC. Comparing CCs of annual LTd and LTn, CC-based performance for each precipitation product differed between daytime and nighttime, especially for PERSIANN-CCS and ERA-Interim, followed by TRMM3B42RT and PERSIANN. In spring (Figure 3(a2)), GSMaP-RNL, GSMaP-RNLG, JRA-55, ERA-5, NCEP1, and NCEP2 had negative CCs and therefore no ability to reflect the spatial distribution of LTs; however, the other products, with CCs > 0.40, performed well, of which TRMM3B42 showed the best performance (CCs around 0.80), followed by TRMM3B42RT, PERSIANN, and MERRA-2 (CCs around 0.70). Furthermore, the spring CC-based performance of PERSIANN-CCS exhibited differences >0.10 between daytime and nighttime. During summer (Figure 3(a3)), TRMM3B42 with CCs around 0.80 showed the best performance, followed by TRMM3B42RT and MERRA-2 (CCs around 0.70), ERA-Interim (CCs around 0.60), and PERSIANN and PERSIANN-CCS (CCs around 0.50). JRA-55, ERA-5, NCEP1, and NCEP2 with CCs < 0 indicated poor performance. Relative to spring, the capacity of GSMaP-RNL and GSMaP-RNLG to reproduce LTs in space increased in summer but was still limited, with CCs < 0.20. PERSIANN-CCS, GSMaP-RNL, and GSMaP-RNLG showed the greatest differences (>0.10) in summer CCs between daytime and nighttime.
In autumn (Figure 3(a4)), the largest CCs (>0.80) were obtained by TRMM3B42 and MERRA-2, while TRMM3B42RT, PERSIANN, and ERA-Interim had CCs ranging from 0.60 to 0.80. PERSIANN-CCS, JRA-55, and ERA-5 had CCs around 0.40 and could capture autumn LTs spatially, while the remaining four products showed limited CC-based performance (CCs generally < 0.10). Comparisons of CCs for autumn LTd and LTn indicated that larger differences (>0.10) existed in PERSIANN, PERSIANN-CCS, JRA-55, and ERA-5, especially for the former three products with differences exceeding 0.20. Regarding winter CCs (Figure 3(a5)), eight of the products had values below 0.20 or 0, indicating that they had limited or no ability to capture winter LTs in space. Of the remaining products, the best product based on CC in winter was MERRA-2 (CCs around 0.90), followed by TRMM3B42 (CCs around 0.70), TRMM3B42RT (CCs around 0.50), and ERA-Interim (CCs < 0.40); no significant differences in CCs for LTd and LTn existed among these products.
To identify the CC-based optimal products (OPs) of LTwd, LTd, and LTn, we compared CCs from the 12 examined products. The results are depicted in Figure 3(b1-b5). For MC, the annual, spring, summer, and autumn (excluding LTd) CC-based OP for the three LTs was TRMM3B42, and the winter OP was MERRA-2. For annual cases (including the three LTs and ten WRRs), the CC-based OP for 17 of the 30 cases was MERRA-2, generally in northern WRRs, while 11 cases, including LTs for southern WRRs (excluding YZRB) and LTd for LRB, HaRB, and YRB had an OP of TRMM3B42. In spring, the OP for more than ten cases was TRMM3B42, generally in southern WRRs, while 15 cases with the OP of MERRA-2 were in northern WRRs. With several exceptions (e.g., SHRB, HuRB, and NWRB) showing the summer OP of MERRA-2, TRMM3B42 was the OP in 16 cases. In winter, the OP for the overwhelming majority (27) of cases was MERRA-2, followed by three cases with ERA-Interim in SERB. Notably, some cases had CCs below 0.40 for the identified OPs, e.g., for LTwd, LTd, and LTn in LRB and NWRB; this indicates that using the so-called CC-based OPs to represent the spatial distribution of precipitation trends requires more caution in certain regions.
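Selecting a metric-based OP amounts to ranking the products by one metric for a given region, season, and LT. A minimal sketch is given below. The product names and scores are placeholders rather than values from this study, and the choice of "largest CC", "smallest RMSE", and "smallest absolute bias" as the ranking rules is an assumption about how the comparison is made.

```python
def optimal_product(scores, higher_is_better=True):
    """Return the product name with the best score.

    scores: {product_name: metric value} for one region, season and LT.
    """
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)


def optimal_by_abs_bias(biases):
    """Return the product whose bias is closest to zero (assumed ranking rule for bias)."""
    return min(biases, key=lambda name: abs(biases[name]))


# Placeholder values for illustration only (not results from the paper).
example_cc = {"product_A": 0.80, "product_B": 0.40, "product_C": -0.10}
example_rmse = {"product_A": 3.5, "product_B": 2.1, "product_C": 6.0}
example_bias = {"product_A": -1.2, "product_B": 0.4, "product_C": 2.5}

best_cc = optimal_product(example_cc)                              # highest CC
best_rmse = optimal_product(example_rmse, higher_is_better=False)  # lowest RMSE
best_bias = optimal_by_abs_bias(example_bias)                      # smallest |B|
```

As the text notes, the product that wins this ranking can still have a low CC in absolute terms, so the winning score should be reported alongside the name.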
Figure 3. Correlation coefficients (CCs) for LTs from the selected 12 precipitation products (a1-a5), CC-based optimal products (OPs) for MC and ten WRRs (b1-b5), and number of cases corresponding to OPs for an annual or seasonal scale in ten WRRs (c1-c5). In figures (b1-b5), the number of each box represents the CC of the identified OP, which has been labelled with different colors. The number of figures (c1-c5) indicates the amount of a certain OP.

Evaluation Using Bias Metric
Figure 4(a1-a5,b1-b5) depict the percentage of grids with negative and positive Bs of annual and seasonal LTs across MC, respectively. For simplicity, we focus on negative Bs in this paragraph. More than 50% of grids had negative annual Bs for PERSIANN, PERSIANN-CCS, GSMaP-RNL, and GSMaP-RNLG products (Figure 4(a1)). In particular, PERSIANN LTwd and LTd had negative Bs in >65% of grids. With several exceptions (i.e., TRMM3B42, JRA-55, and MERRA-2 for LTd; and NCEP1 for LTn), annual Bs for the remaining products were negative in <50% of grids, and TRMM3B42RT, ERA-Interim, and NCEP2 even showed negative Bs in <35% of grids. As shown in Figure 4(a2), most products had negative spring Bs for LTwd and LTd in >50% of grids, especially for PERSIANN, ERA-Interim, and MERRA-2 with >65% of grids. However, six of the 12 products underestimated LTn in around 50% of grids in spring, followed by the other six products with overestimations in >50% of grids. Similar to spring, >50% of grids with negative Bs for summer LTwd and LTd were detected by most of the products, of which PERSIANN, GSMaP-RNL, ERA-Interim, and NCEP2 corresponded to >65% of grids (Figure 4(a3)). Except for PERSIANN, summer LTn was underestimated in <50% of grids by the products, particularly for JRA-55, ERA-Interim, and ERA-5 with a grid percentage <25%. In autumn (Figure 4(a4)), despite several exceptions, >50% of grids had negative Bs for the three LTs, and the PERSIANN product had negative Bs in >70% of grids. Relative to autumn cases, the opposite happened during winter (Figure 4(a5)), i.e., percentages of grids with negative Bs for LTwd, LTd, and LTn were generally <50%, in particular for JRA-55 and ERA-Interim.
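The grid percentages discussed above follow from the sign of the grid-level bias, i.e., the product trend minus the observed trend at each grid. The sketch below shows that computation; the function names are illustrative, and a bias of exactly zero is counted as neither negative nor positive, which is an assumption.

```python
def grid_biases(a_p, a_o):
    """Grid-level bias: product trend minus observed trend at each grid."""
    return [p - o for p, o in zip(a_p, a_o)]


def percent_negative(a_p, a_o):
    """Percentage of grids where the product underestimates the observed trend (B < 0)."""
    diffs = grid_biases(a_p, a_o)
    return 100.0 * sum(1 for d in diffs if d < 0) / len(diffs)


def percent_positive(a_p, a_o):
    """Percentage of grids where the product overestimates the observed trend (B > 0)."""
    diffs = grid_biases(a_p, a_o)
    return 100.0 * sum(1 for d in diffs if d > 0) / len(diffs)


def day_night_gap(ap_d, ao_d, ap_n, ao_n):
    """Absolute difference, in percentage points, between the daytime and nighttime shares of negative bias."""
    return abs(percent_negative(ap_d, ao_d) - percent_negative(ap_n, ao_n))
```

Running `day_night_gap` per product and season gives the daytime-nighttime contrast in underestimation that the comparison of LTd and LTn relies on.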
In addition, based on percentages of grids with negative Bs for LTd and LTn (Figure 4(a1-a5)), differences generally exceeding 10% were identified on both annual and seasonal scales for most products, particularly MERRA-2 and ERA-Interim, with annual and summer differences of around 40%, respectively. This suggests that, in terms of the grid percentages corresponding to underestimated and overestimated precipitation LTs, the products' performance differs between daytime and nighttime.
Taking MC as a whole, regional mean Bs for LTwd, LTd, and LTn derived from each product were calculated and are shown in Figure 5(a1-a5).
At the annual scale, five products exhibited positive Bs for LTwd, ranging from 0.39 mm/yr for TRMM3B42 to 10.10 mm/yr for NCEP2, while four products exhibited positive Bs for LTd, ranging from 1.44 mm/yr for TRMM3B42RT to 5.01 mm/yr for NCEP2. Negative Bs were found for the remaining products, of which the lowest values of −7.88 mm/yr and −4.98 mm/yr for LTwd and LTd, respectively, were recorded for PERSIANN (Figure 5(a1)). In contrast, seven products overestimated annual LTn, particularly ERA-Interim, NCEP2, and MERRA-2 with Bs > 5 mm/yr, while the other products' Bs were all negative and generally < −3 mm/yr. Regarding the spring LTs (Figure 5(a2)), TRMM3B42RT, TRMM3B42, and MERRA-2 had positive Bs < 1.40 mm/yr, except for LTd. However, negative spring Bs were found for the remaining products, ranging from −2.60 mm/yr (−1.61 mm/yr) for JRA-55 to −0.88 mm/yr (−0.51 mm/yr) for ERA-Interim for LTwd (LTd), and from −1.52 mm/yr for NCEP1 to −0.31 mm/yr for PERSIANN-CCS for LTn. For summer LTwd and LTn (Figure 5(a3)), most of the products exhibited positive Bs, while LTd was generally underestimated by the products (excluding TRMM3B42RT and NCEP1). Nevertheless, summer Bs for LTs were generally between −2 mm/yr and 2 mm/yr, except for TRMM3B42RT LTwd and LTn and MERRA-2 LTn, with Bs > 2 mm/yr, and PERSIANN-CCS LTwd and LTd, with Bs < −2 mm/yr. In autumn (Figure 5(a4)), absolute values of Bs for LTs from TRMM3B42, TRMM3B42RT, and MERRA-2 were all < 0.60 mm/yr, but Bs were generally < −1 mm/yr for the remaining products, and some were even lower than −4 mm/yr (i.e., PERSIANN and NCEP2 for LTwd). In contrast, the majority of products overestimated winter LTwd, LTd, and LTn, and Bs were generally < 3 mm/yr, with the exceptions of JRA-55, ERA-Interim, and ERA-5, which had Bs > 3 mm/yr (Figure 5(a5)).
In terms of Bs for annual LTd and LTn (Figure 5(a1)), there were differences for some products, i.e., TRMM3B42RT, PERSIANN-CCS, and ERA-Interim showed large differences > 2 mm/yr, and TRMM3B42, JRA-55, ERA-5, NCEP1, and MERRA-2 showed Bs of opposite sign (positive/negative). Evident differences in Bs existed for some products in each season (Figure 5(a2-a5)); in summer, eight products had Bs of different signs and four products had large differences (around ±1 mm/yr) but the same sign.
Figure 5. MC Bs derived from the selected 12 precipitation products (a1-a5), B-based optimal products (OPs) for MC and ten WRRs (b1-b5), and number of cases corresponding to B-based OPs on an annual or seasonal scale for ten WRRs (c1-c5). In figures (b1-b5), the number of each box represents grid percentage (%) of OP, which has been labelled with different colors. The number of figures (c1-c5) indicates the amount of a certain OP.
Considering offset effects of positive and negative Bs within MC and each WRR, we calculated the percentage of grids with the minimum absolute B for each product, and B-based OPs were identified as the product with the largest grid percentage (Figure 5(b1-b5,c1-c5)). Except for annual LTd and LTn and summer LTwd and LTd, for which the OP was TRMM3B42, the OP for all other LTs was MERRA-2 for MC (Figure 5(b1-b5)). For annual cases of the ten WRRs (Figure 5(c1)), the OPs were TRMM3B42 and MERRA-2 in 14 and 12 cases, respectively. Furthermore, the OP for most WRRs was MERRA-2 for annual LTwd, and TRMM3B42 for both annual LTd and LTn (Figure 5(b1)). For B-based OPs of spring and autumn LTwd (Figure 5(b2,b4)), the OP was MERRA-2 in most WRRs, while the OPs for LTd and LTn in southern and northern WRRs differed and were mainly TRMM3B42RT and MERRA-2, respectively. In summer (Figure 5(b3)), there were differences in OP for the three LTs, i.e., most WRRs with OPs of TRMM3B42 and MERRA-2 for LTwd, TRMM3B42, MERRA-2, and ERA-Interim for LTd, and ERA-Interim and TRMM3B42RT for LTn. During winter (Figure 5(b5)), with four exceptions, all cases had an OP of MERRA-2. Overall, more than 20 cases had MERRA-2 as their OP for spring, autumn, and winter, while TRMM3B42 had fewer than six cases. For summer, the OP was TRMM3B42 for nine cases, ERA-Interim for eight cases, and MERRA-2 for seven cases (Figure 5(c2-c5)).
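The selection rule described above (count, for each product, the grids where it has the minimum absolute B, then take the product with the largest grid percentage) can be sketched as follows. This is a minimal illustration under stated assumptions: B is taken as the product LT minus the observed LT at each grid, and the function name `bias_based_op` and the dictionary input are hypothetical rather than the authors' implementation.

```python
import numpy as np

def bias_based_op(product_lt, observed_lt):
    """Return (optimal product, grid percentage per product) for one region.

    product_lt  : dict mapping a product name to a 1-D array of LTs (one per grid)
    observed_lt : 1-D array of gauge-based LTs on the same grids
    """
    names = list(product_lt)
    obs = np.asarray(observed_lt, dtype=float)
    # |B| per product (rows) and grid (columns); B assumed to be product minus observation
    abs_bias = np.abs(
        np.stack([np.asarray(product_lt[n], dtype=float) - obs for n in names])
    )
    # For every grid, index of the product with the smallest |B| (ties go to the first product)
    winners = np.argmin(abs_bias, axis=0)
    counts = np.bincount(winners, minlength=len(names))
    pct = 100.0 * counts / obs.size
    best = names[int(np.argmax(pct))]
    return best, dict(zip(names, pct))
```

Counting grids rather than averaging B avoids the cancellation of positive and negative biases within a region, which is the offset effect mentioned above.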

Evaluation Using Error Metric
The MC RMSEs for LTwd, LTd, and LTn of each product are illustrated in Figure 6(a1-a5). For MC, the RMSEs for annual LTwd, LTd, and LTn were lowest for TRMM3B42, TRMM3B42RT, PERSIANN, PERSIANN-CCS, and MERRA-2 (<20.00 mm/yr for LTwd; <10.00 mm/yr for LTd and LTn); this indicates that these five products, especially TRMM3B42, detect annual LTs more accurately than the others. Except for PERSIANN-CCS and MERRA-2, the relatively small differences between the annual RMSEs for LTd and LTn suggest comparable accuracy at daytime and nighttime for the remaining products. For LTwd, LTd, and LTn, the largest RMSEs for each product occurred in summer and the smallest occurred in winter, because these seasons account for the larger and smaller portions of annual MC precipitation, respectively. In each season, the minimum RMSE of the three LTs generally came from TRMM3B42 and MERRA-2; however, larger RMSEs were frequently found for GSMaP-RNL, GSMaP-RNLG, JRA-55, ERA-5, NCEP1, and NCEP2. Comparing the RMSEs of LTd and LTn for each product in each season, the LTd RMSE for most of the products was the larger of the two in summer and the smaller in the other three seasons; it should be noted that the differences between LTd and LTn were not evident, except for PERSIANN-CCS, which had an absolute difference > 2 mm/yr in winter.
Figure 6. MC root mean square error (RMSE) derived from the selected 12 precipitation products (a1-a5), RMSE-based optimal products (OPs) for MC and ten WRRs (b1-b5), and number of cases corresponding to RMSE-based OPs for annual or seasonal scale in ten WRRs (c1-c5). In figures (b1-b5), the number of each box represents RMSEs (mm/yr) of OP, which are labelled with different colors. The number of figures (c1-c5) indicates the amount of a certain OP.
Figure 6(b1-b5) illustrates the RMSE-based OPs of LTs for MC and WRRs. In general, the MC RMSE-based OPs for the three LTs for annual, spring (excluding LTwd) and summer were TRMM3B42, while MERRA-2 was the autumn OP (excluding LTn) and winter OP. At the annual scale, 13 cases had an RMSE-based OP of TRMM3B42, generally in southern WRRs, YRB, and HuRB; the remaining four WRRs had MERRA-2 as their OP (Figure 6(b1,c1)). In spring, the RMSE-based OP for the three LTs in northern WRRs and LTwd in southern WRRs was MERRA-2, corresponding to 21 cases; eight cases had TRMM3B42 and PERSIANN as their OP, and these mainly appeared in LTd and LTn of southern WRRs (Figure 6(b2,c2)).
During summer (Figure 6(b3,c3)), with three exceptions, all cases had TRMM3B42 (19 cases) or MERRA-2 (eight cases) as their OP. In autumn, TRMM3B42 was the OP in seven cases, mainly in southern WRRs (except for LTwd and LTd in YZRB and SERB), while MERRA-2 was the OP in 21 cases (Figure 6(b4,c4)). With the exceptions of LTwd and LTd in YZRB and LTwd in SWRB, MERRA-2 was the OP in 27 cases in winter (Figure 6(b5,c5)).
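For readers who wish to reproduce an RMSE-based ranking on their own data, a minimal Python sketch is given below. It assumes that the RMSE is computed across the grids of one region between product-derived and gauge-based LTs (in mm/yr) and that the OP is simply the product with the smallest RMSE; the function names `rmse` and `rmse_based_op` and the dictionary input are hypothetical.

```python
import numpy as np

def rmse(product_lt, observed_lt):
    """Root-mean-square error between product and observed LTs over the grids of a region."""
    diff = np.asarray(product_lt, dtype=float) - np.asarray(observed_lt, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def rmse_based_op(product_lt, observed_lt):
    """Return (optimal product, RMSE per product); the OP is assumed to minimize the RMSE."""
    scores = {name: rmse(lt, observed_lt) for name, lt in product_lt.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Because squared errors weight large deviations heavily, a product with a few badly estimated grids can rank poorly here even if it ranks well by grid percentage of minimum absolute B, which may help explain why the OPs differ between metrics.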

Evaluation Using Metric of Sign Accuracy
To examine the degree of agreement between the positive or negative sign of LTs from the products and the observed values, metrics of AS and JAS were computed over MC and are illustrated in Figure 7(a1-a5) and Figure 8(a1-a5), respectively. At the annual scale, MC AS values for LTwd, LTd, and LTn from all products were > 50%. This suggests that the observed signs of LTs can be captured by the products, among which TRMM3B42RT, TRMM3B42, and MERRA-2 showed AS values > 70% for the three LTs, followed by PERSIANN, PERSIANN-CCS, ERA-Interim, and NCEP2 with values > 60% (Figure 7(a1)). During each season (Figure 7(a2-a5)), TRMM3B42RT, TRMM3B42, PERSIANN, PERSIANN-CCS, ERA-Interim, and MERRA-2 showed MC AS values > 60% for the three LTs, and the largest percentage (>70%) was found for TRMM3B42RT (except in winter), TRMM3B42 (except in winter), and MERRA-2. For the remaining six products, their AS-based performances differed among seasons. For example, all of them corresponded to autumn AS values > 50% for the three LTs; however, the values in the other seasons were generally < 50%. As shown in Figure 7(b1-b5), MC annual and summer AS-based OPs were MERRA-2 for LTwd, but TRMM3B42 for LTd and LTn. For MC LTs in the remaining three seasons, the AS-based OP was MERRA-2, except for spring LTd. Of the 30 annual cases in ten WRRs, AS-based OPs were MERRA-2 in 13 cases, TRMM3B42 in six cases and TRMM3B42RT in three cases, and there was more than one OP in five cases (Figure 7(b1-c1)). Among the ten WRRs, there were five or more OPs for each of the three LTs, indicating obvious regional differences for the products in detecting the same signs of LTs. In spring (Figure 7(b2-c2)), the AS-based OPs were MERRA-2 (15 cases), TRMM3B42 (five cases), TRMM3B42RT (four cases), ERA-Interim (four cases), and PERSIANN (two cases). Southern WRRs generally had OPs of TRMM3B42, TRMM3B42RT, and PERSIANN, while the OPs for northern WRRs were MERRA-2 and ERA-Interim.
During summer (Figure 7(b3-c3)), the AS-based OP was TRMM3B42 in 14 cases, mainly in southern WRRs, HaRB, YRB, and HuRB, and MERRA-2 in nine cases, primarily in SHRB, LRB and NWRB. Of the 30 cases in autumn (Figure 7(b4-c4)), MERRA-2 was identified as the AS-based OP in 14 cases, mainly in northern WRRs (excluding YRB); however, in eight cases the OP was TRMM3B42, generally in YRB, YZRB, and SWRB. Each of the six autumn cases in SERB and PRB had more than one OP. Regarding the 30 cases in winter (Figure 7(b5-c5)), 26 cases had AS-based OPs of MERRA-2 (22 cases) and TRMM3B42RT (four cases).
Except for TRMM3B42RT, TRMM3B42, and MERRA-2 with annual JAS values > 55% (Figure 8(a1)), values of this metric were all below 50%, suggesting that the remaining products have limited capacity to detect the co-variations of daytime and nighttime precipitation, in spite of their relatively large AS for LTs (Figure 7(a1)). In spring (Figure 8(a2)), the best JAS-based performance was found in TRMM3B42 and MERRA-2 (with JAS around 60%), followed by TRMM3B42RT and ERA-Interim. Excluding PERSIANN and PERSIANN-CCS, the other six products had spring JAS values < 25% (< 20% for JRA-55 and ERA-5), which indicates that those six products could not capture the co-variations of spring precipitation changes at daytime and nighttime. During summer (Figure 8(a3)), TRMM3B42RT, TRMM3B42, and MERRA-2 performed the best (with JAS values > 53%), followed by PERSIANN, PERSIANN-CCS, and ERA-Interim (with JAS around 45%), and then the remaining products (with JAS around 30%). For autumn (Figure 8(a4)), seven of the products correctly detected the co-variations of daytime and nighttime precipitation changes in >50% of grids (i.e., JAS > 50%), particularly TRMM3B42RT, TRMM3B42, ERA-Interim, and MERRA-2, which had JAS values > 64%. GSMaP-RNL and NCEP1 had JAS values near 40% and performed the worst. Regarding winter JAS (Figure 8(a5)), values > 50% only appeared for MERRA-2, and the minima (around 25%) were found in GSMaP-RNL, GSMaP-RNLG, and NCEP2.
As depicted in Figure 8b, the MC annual and summer JAS-based OP was TRMM3B42, but for the other seasons the OP was MERRA-2. Except for SHRB and NWRB, with MERRA-2 as their annual JAS-based OP, the remaining WRRs generally had TRMM3B42 as the OP (Figure 8b). Most southern WRRs had OPs of TRMM3B42 and PERSIANN in spring and summer but MERRA-2 and PERSIANN in autumn and winter. By contrast, summer JAS-based OPs were MERRA-2 and TRMM3B42 in northern WRRs, while MERRA-2 was the OP in most northern WRRs in the other seasons.
(6), indicating the capacity of a given product to rightly detect the signs of both Pd and Pn changes relative to the observed data. In (b), the number in each box represents JAS values (%) of the OP, which has been labelled with different colors.
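The two sign-accuracy metrics discussed in this subsection can be illustrated with a short Python sketch. It rests on assumptions inferred from the text rather than on the paper's equations, which are not reproduced here: AS is taken as the percentage of grids where the product LT has the same sign as the observed LT, and JAS as the percentage of grids where the signs of both the daytime and the nighttime LTs are reproduced. The function names are hypothetical, and a zero trend is treated as its own sign class.

```python
import numpy as np

def sign_accuracy(product_lt, observed_lt):
    """AS (%): share of grids where the product and observed LTs have the same sign."""
    agree = np.sign(product_lt) == np.sign(observed_lt)
    return float(100.0 * np.mean(agree))

def joint_sign_accuracy(prod_day, prod_night, obs_day, obs_night):
    """JAS (%): share of grids where both daytime and nighttime LT signs are matched."""
    day_ok = np.sign(prod_day) == np.sign(obs_day)
    night_ok = np.sign(prod_night) == np.sign(obs_night)
    return float(100.0 * np.mean(day_ok & night_ok))
```

Under these definitions JAS can never exceed the smaller of the daytime and nighttime AS values, which is consistent with the observation above that several products show AS > 50% but JAS < 50%.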

Possible Causes for Variation in Performance among Precipitation Products
In this study, we explored the reliability of the satellite-based and reanalysis products in capturing precipitation linear trends across MC, and found that the performances of these products exhibited clear differences. In general, TRMM3B42 and MERRA-2 showed the best overall performance. There are several possible explanations for the performance variation, e.g., input data, onboard sensors, and retrieval algorithms for the satellite-based products; and numerical models and their structures, parameterizations (especially schemes for precipitation processes), and assimilation systems for the reanalysis products. Nonetheless, quantitatively identifying the impacts of these factors is difficult and beyond the scope of this study. We therefore discuss the potential causes of the different performance among precipitation products that share the same retrieval algorithm or model structure, i.e., TRMM3B42RT vs. TRMM3B42, PERSIANN vs. PERSIANN-CCS, GSMaP-RNL vs. GSMaP-RNLG, ERA-Interim vs. ERA-5, and NCEP1 vs. NCEP2. It is evident that TRMM3B42 generally outperforms TRMM3B42RT, which could be attributed to the fact that the former incorporates rain gauge data (i.e., monthly GPCP and CAMS data; [32]) to adjust the precipitation estimates. In some WRRs, TRMM3B42RT performed better or was the OP, implying that the TRMM3B42 precipitation trends were occasionally overcorrected due to an inappropriate correction method (e.g., daily TRMM3B42RT adjusted with monthly observations; [32]). The major difference between PERSIANN and PERSIANN-CCS is that the latter includes a cloud classification system based on cloud height, areal extent, and variability of texture estimated from satellite imagery to more accurately describe the relationship between precipitation rate and brightness temperature [35]. Despite that, PERSIANN detected precipitation trends better than PERSIANN-CCS across MC according to most of the validation metrics.
This indicates that the cloud classification system within PERSIANN-CCS has limited effectiveness in improving the estimated precipitation trends, although PERSIANN-CCS has been found to outperform PERSIANN in estimating precipitation amount over MC and some of its sub-regions (e.g., the Tibetan Plateau and Yangtze River Basin [88][89][90][91]). For the bias metric, PERSIANN-CCS performed better, mainly because, within a given region, the functions between precipitation rate and brightness temperature are established for each category of cloud patch, and thus the regional biases are more likely to be offset. Relative to the other satellite-based products, the two GSMaP products had the worst performance in MC and the ten WRRs, indicating that the algorithm employed by GSMaP-RNL and GSMaP-RNLG may be problematic in capturing precipitation trends. Some studies also found that the GSMaP products performed very poorly in capturing precipitation magnitudes and in hydrological modeling over MC [92] and some other Asian regions, such as the VuGia-ThuBon River Basin of Vietnam [93] and the Mekong River Basin [94]. Moreover, the worst performance of GSMaP-RNLG in terms of specific validation metrics suggests that its gauge-based correction processes are not effective in adjusting the precipitation trend. Relative to ERA-Interim, ERA-5 has a more advanced assimilation system and more and newer observational inputs, and thus has been observed to reproduce precipitation better (e.g., lower bias and root-mean-square error, and higher correlation coefficient) in some regions [45,91,95]. However, we found that in this study the ERA-5 precipitation trends matched the observations more poorly than those of ERA-Interim.
These findings are consistent with the global findings of Nogueira [96], who pointed out that the trend of global-mean rainfall in ERA-Interim was closer to GPCP than that in ERA-5, and suggested that the possible causes were associated with the global energy budget [97,98]. Because NCEP2 includes new system components, such as simple precipitation assimilation over land surfaces for improved soil wetness [43], NCEP2 precipitation agreed more closely with gauge measurements than NCEP1 data in China [99], the USA [100], and Central Equatorial Africa [101]; by contrast, our comparison of NCEP1 and NCEP2 in representing precipitation linear trends shows no obvious differences. This may be related to significant time-varying jumps in the late 2000s within NCEP2, mainly due to changes in observing systems, such as the introduction of new data into the assimilation systems [102,103]; this is also a possible cause of the poor performance of JRA-55 [101]. The validation metrics clearly show that, based on precipitation linear trends, MERRA-2 performed better than the other reanalysis products and even the satellite-based precipitation products in MC. Better performance of MERRA-2 in representing precipitation amount was also found in other regions (e.g., Nepal and the Pamir region of Tajikistan [104,105]). Some scholars pointed out that the possible causes are related to the advanced data assimilation technique within MERRA-2 and the bias corrections of MERRA-2 precipitation [45]. We should note that, for a given product, the performance in detecting precipitation trends differs within a day (e.g., daytime and nighttime annual correlation coefficients for ERA-Interim) and among seasons (e.g., smaller winter correlation coefficient values for PERSIANN but larger values in the other three seasons), mainly due to the different physical mechanisms controlling precipitation processes [69][70][71]73,74].
For example, some studies stated that the sea-land breeze is closely associated with the diurnal cycle of precipitation in coastal areas, while topography and the mountain-valley breeze play an important role in the interior [73,74]. Therefore, to increase the accuracy of sub-daily and seasonal precipitation estimates, specific algorithms for the satellite-based products and specific model structures for the reanalysis products should be developed.

Uncertainties from Rain Gauge Data
We employed rain gauge data as a reference to validate the 12 precipitation products in detecting precipitation trends at different time scales across MC. It should be noted that the inherent uncertainties within the gauge data, which are related to flaws in calibration, wind-related under-catch, and wetting and evaporation losses, could cause the gauge measurements to deviate from the real values and weaken the robustness of the validation results (e.g., [106][107][108][109]). For example, Shedekar et al. [109] found that, relative to the actual rainfall depths, the precipitation measurements from three calibrated tipping-bucket rain gauges were underestimations, particularly for heavy rainfall, and they highlighted that the biases were closely associated with the gauge calibrations. When it is windy, gauge observations are often affected by wind-related under-catch through deflection of the flow and the eddies and turbulence induced around the gauges [108][109][110][111]. In general, wind can cause some raindrops, especially smaller ones, to miss the funnel or fall at an inclination, which affects the catch efficiency of the gauges. The extent to which wind influences the accuracy of the gauge measurements depends on ambient wind speed, raindrop size distribution, and gauge design [110]. Sieck et al. [110] reported that, compared to rainfall from collocated buried gauges, wind-exposed aboveground gauges would likely observe about 2-10% less precipitation. Because water adheres to the inside walls of the gauge and then evaporates (wetting loss), the gauge-recorded precipitation is generally lower than the true value, and the biases vary among gauge configurations (e.g., frequency of emptying) and precipitation types [106,112,113]. A Russian study revealed that, for each rainfall measurement record, the average wetting loss was 0.2 mm, but for both snow and mixed precipitation the value was 0.15 mm [112].
Because gauges are exposed to the atmosphere, part of the water collected in them usually evaporates (i.e., evaporation losses; [114][115][116][117]). Evaporation losses for gauged precipitation are reported to generally range from 0.1 to 0.8 mm/day or from 0 to 1%; however, the magnitudes differ among gauge types, climate backgrounds, and seasons [114]. The combined effects of the aforementioned factors are likely to cause rain gauges to underestimate precipitation [118]; e.g., over Siberia, the bias-corrected annual precipitation (with the uncertainties in the raw observations removed) was 30-330 mm or 10-65% higher than the raw observations.
The quality of precipitation observations is usually also affected by a lack of standardization, which is mainly due to changes in gauge instruments, station relocation, changes in the station environment, etc. [119][120][121][122]. These factors degrade data quality, in particular for climate research using long-term time series (e.g., the linear trend evaluation in this study). Before using gauged precipitation measurements, it is necessary to reduce or even remove the associated uncertainties, e.g., by adjusting the raw records using metadata about the gauges, or at least by eliminating sites with non-homogeneous measurements identified by statistical methods. Because metadata were lacking for the selected sites, the Pettitt test was used to remove sites with non-homogeneous measurements, but there is no guarantee that the remaining sites have no issues, which can weaken the confidence level of the results. Besides, mismatches between the representativeness of gauge precipitation (i.e., accumulation in time at a point in space) and that of the selected products (i.e., aggregation in space of a snapshot in time) are likely to affect the accuracy and precision of qualitative and quantitative assessments of various precipitation products [123][124][125]. For instance, the spatial resolution of the 12 products is generally lower than 0.25° × 0.25° (except for PERSIANN-CCS), across which the estimated precipitation was averaged, while the spatial representation of a gauge is much smaller than the coverage of a pixel of the 12 products. Considering the variability of precipitation over a small spatial extent, a sparse gauge network may not identify precipitation associated with meso-/micro-scale weather systems (e.g., convective precipitation; [126][127][128][129]); thus, gauge precipitation measurements may be smaller in magnitude and frequency than the ground-truthed values for a given pixel.
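To make the homogeneity screening mentioned above more concrete, the following Python sketch implements the standard Pettitt change-point statistic for a single station series. It is a minimal illustration and not the exact procedure used in this study: the function name `pettitt_test` is hypothetical, the p-value uses the common approximation p ≈ 2 exp(−6K²/(n³ + n²)), and the significance level at which a site would be rejected (e.g., 0.05) is an assumption, since it is not stated here.

```python
import numpy as np

def pettitt_test(series):
    """Pettitt change-point test for one time series (e.g., annual precipitation at a site).

    Returns (change_point, K, p_value), where change_point is the number of
    values before the most probable shift.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    # sgn[i, j] = sign(x[j] - x[i]) for every pair of time steps
    sgn = np.sign(x[None, :] - x[:, None])
    # U_t sums the signs over all pairs with i < t <= j (0-based), for t = 1 .. n-1
    u = np.array([sgn[:t, t:].sum() for t in range(1, n)])
    k = float(np.abs(u).max())
    change_point = int(np.abs(u).argmax()) + 1
    # Approximate two-sided p-value, capped at 1
    p_value = float(min(1.0, 2.0 * np.exp(-6.0 * k ** 2 / (n ** 3 + n ** 2))))
    return change_point, k, p_value
```

For example, a 20-value series with an abrupt shift after the tenth value yields a small p-value, and under the assumed 0.05 level such a site would be flagged as non-homogeneous. With series as short as the 15-year period analysed here, the test has limited power, which is in line with the caveat above that the remaining sites are not guaranteed to be free of issues.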

Conclusions
As important surrogates for precipitation estimates, satellite-based and reanalysis precipitation products need to be validated from different perspectives. In particular, information about the capacity of satellite-based and reanalysis precipitation products is scarce on a sub-daily scale, especially for China. Moreover, assessments of precipitation trends are fundamental for selecting reliable products to explore precipitation changes, particularly for regions with limited or even no observations. Thus, to examine the capacity of twelve popular precipitation products (i.e., six satellite-based and six reanalysis products) in detecting precipitation linear trends across MC, we collected daytime and nighttime observations from a dense rain gauge network during 2003-2017, and examined LTwd, LTd, and LTn across mainland China. We found that annual and seasonal LTwd, LTd, and LTn for MC and most WRRs were positive but with regional differences. In terms of magnitude and sign (i.e., decreasing and increasing), LTd and LTn in a given region showed evident differences, confirming the necessity of evaluating precipitation products at a sub-daily scale. Then, several statistical metrics (i.e., CC, B, RMSE, AS, and JAS) were employed to identify the differences and agreements between LTs from the twelve precipitation products and those from gauge observations for MC and ten WRRs on a sub-daily scale. In general, values of a given metric for annual and seasonal LTwd, LTd, or LTn differed among products. Meanwhile, the performance of a single product varied among seasons and between daytime and nighttime. Finally, the metric-based OPs were identified for MC and each WRR. The metric-based OPs varied among regions and seasons, and between daytime and nighttime, but the most frequent OPs were TRMM3B42, ERA-Interim, and MERRA-2.
The comparison in this study of the ability of satellite-based and reanalysis products to detect precipitation linear trends provides suggestions for both developers and potential users of these products across mainland China. For a given product, performance that varies among validation metrics, among seasons, and between daytime and nighttime suggests that its developers could design season-specific and sub-daily algorithms/models and correction procedures to improve its capacity to reproduce precipitation trends. For the potential users who focus on long-term precipitation changes across MC, this study provides necessary and detailed information about the existing popular precipitation products' performances in detecting linear trends, which is fundamental to obtaining robust conclusions.