Nine-Year Systematic Evaluation of the GPM and TRMM Precipitation Products in the Shuaishui River Basin in East-Central China

Abstract: Owing to their advantages of wide coverage and high spatiotemporal resolution, satellite precipitation products (SPPs) have been increasingly used as surrogates for traditional ground observations. In this study, we have evaluated the accuracy of the five latest GPM IMERG V6 and TRMM 3B42 V7 precipitation products across the monthly, daily, and hourly scales in the hilly Shuaishui River Basin in East-Central China. For evaluation, a total of four continuous and three categorical metrics have been calculated based on SPP estimates and historical rainfall records at 13 stations over a period of 9 years from 2009 to 2017. One-way analysis of variance (ANOVA) and multiple posterior comparison tests are used to assess the significance of the difference in SPP rainfall estimates. Our evaluation results have revealed a wide-ranging performance among the SPPs in estimating rainfall at different time scales. Firstly, the two post-time SPPs (IMERG_F and 3B42) perform considerably better in estimating monthly rainfall. Secondly, with IMERG_F performing the best, the GPM products generally produce better daily rainfall estimates than the TRMM products. Thirdly, with their correlation coefficients all falling below 0.6, neither GPM nor TRMM products could estimate hourly rainfall satisfactorily. In addition, topography tends to have a similar impact on the performance of SPPs across different time scales, with larger estimation deviations at high altitude. In general, the post-time IMERG_F product may be considered a reliable data source of monthly or daily rainfall in the study region. Effective bias-correction algorithms incorporating ground rainfall observations, however, are needed to further improve the hourly rainfall estimates of the SPPs to ensure the validity of their usage in real-world applications.


Introduction
Precipitation is an important component of the hydrological cycle [1,2]. Accurate and high-resolution precipitation data are crucial in different fields such as weather forecast, disaster preparation and prevention, and water resource management [3,4]. The quality and resolution of precipitation inputs can also significantly affect the performance of various hydrological, climatic, and atmospheric models [5].
However, obtaining suitable precipitation data can be challenging for researchers as well as practitioners. Availability of traditional ground observations has been limited because of the inadequate and uneven distribution of rain gauges, especially in developing countries, mountainous and remote areas, and over oceans [6]. On the other hand, although weather radar products can provide rainfall observations over a wide region [7], they are subject to both random and systematic errors [8][9][10][11]. Random errors can arise from the sub-grid horizontal and vertical variability of rainfall and the noise of the radar hardware system, while systematic errors may originate from sources such as drifts in the radar calibration constant, systematic variations in the reflectivity-rain-rate relationship, and strong gradients in the reflectivity profile [12]. The presence of complex topography may further amplify some of the error sources [13].
In recent years, with the rapid development of remote sensing techniques, satellite precipitation products (SPPs) have been increasingly applied in monitoring precipitation patterns [14][15][16]. Deriving precipitation products through satellite remote sensing has the advantages of wide coverage and high spatiotemporal resolution, which complement traditional ground gauge measurements. For example, the Tropical Rainfall Measuring Mission (TRMM) satellite launched in 1997 has been extensively used in hydrological modelling and climate change studies. Li et al. [17] found overall good linear relationships between TRMM and ground rainfall observations at both daily and monthly time steps in the Xinjiang catchment, China. Bharti and Singh [18] compared TRMM 3B42V7 with the gauge-based measurements at different altitudes in the northwest Himalayan region. They found that the satellite performed satisfactorily in the altitude range of 1000-2000 m, but poorly over higher-altitude regions at a daily time step.
In 2014, the National Aeronautics and Space Administration (NASA) of the U.S. and the Japan Aerospace Exploration Agency (JAXA) jointly developed a new generation of Global Precipitation Measurement (GPM) satellites. In addition to inheriting the advantages of the TRMM satellites in detecting precipitation in the tropics, GPM satellites provide precipitation estimates over a wider quasi-global coverage (60°N-60°S) at a much higher spatiotemporal resolution (0.1° × 0.1° and 30-min interval). Much research has concluded that GPM products have improved in terms of both rainfall observation accuracy and hydrological simulation performance compared to TRMM products. For example, Tan et al. [19] and Sharifi et al. [20] compared the accuracy of rainfall observations between IMERG (integrated multi-satellite retrievals for GPM) and TRMM in Singapore and India, respectively. In both studies, all evaluation indices indicated a better performance of IMERG than TRMM in providing monthly and daily rainfall data. In China, Tang et al. [21] analyzed the errors of IMERG and TRMM products in six sub-regions of Mainland China and found that IMERG had improved the accuracy of precipitation observations in the mid-high latitude as well as arid regions. In addition, they observed that IMERG could better reproduce the probability density function of rainfall, especially in the range of lower rainfall intensity.
In June 2019, the IMERG product was upgraded from Version 5 (V5) to Version 6 (V6) by reducing biases based on the new Global Precipitation Climatology Centre (GPCC) monthly precipitation records. Meanwhile, TRMM data before 2014 have also been reprocessed with the latest algorithm of the IMERG V6. Until now, few studies have been carried out to evaluate the performance of the latest IMERG and TRMM products.
Furthermore, the majority of previous research has evaluated SPP products at the monthly or daily scale, although both GPM and TRMM products contain hourly rainfall products. High quality hourly rainfall data have been found to be valuable to various hydrological applications around the world [22]. For example, Zhou and Wu [23] found that the precipitation intensity and distribution characteristics of typhoons in China could be better analyzed with hourly precipitation than daily observations at automatic weather stations. Yang et al. [24,25] found that the SWAT (Soil and Water Assessment Tool) model built on hourly rainfall could yield much better performance in simulating daily streamflow and monthly nutrient loads than the SWAT model built on daily rainfall in the Upper Huai River basin of China. Boithias et al. [26] found that the SWAT model built on hourly rainfall could better predict discharge over long periods of time than the MARINE model in the Mediterranean coastal Têt River basin (Southwestern France).
Compared to daily rainfall, hourly rainfall data are much more difficult to obtain for several reasons. Firstly, far fewer gauges worldwide record rainfall at an hourly or finer interval. Secondly, hourly rainfall data are generally not free to the public, and purchasing them might be too expensive for researchers or practitioners in some regional studies. Finally, authorities in some regions may consider hourly rainfall records sensitive data and deny public access to them, citing security reasons. In view of the limited access to hourly rainfall data globally, SPPs may provide a much-needed alternative for deriving such products. So far, few studies have been carried out to evaluate the capability of SPPs in providing hourly rainfall estimates.
To fill in the gaps, this study aims to evaluate the accuracy of the latest GPM and TRMM rainfall products across the monthly, daily, and hourly scales based on the ground rain gauge measurements between January 2009 and December 2017 in the Shuaishui River Basin (SRB) of eastern Central China. The Shuaishui River is the headwater tributary to the Qiantangjiang River, the main river flowing across the Zhejiang Province of China. With water quality inferior to the Class III standard along 50.5% of its total river length, the Qiantangjiang River Basin is faced with severe water security concerns [27,28]. As the SRB is the critical ecological barrier to the Qiantangjiang River, its hydrological conditions have a direct impact on the downstream ecological environment.
Essentially a hilly watershed, the SRB is characterized by complex terrain and pronounced differences in elevation. Precipitation in the basin is abundant, but also highly seasonal. Steep slopes combined with ample rainfall in summer have aggravated the risk of natural disasters such as floods and mudslides [29]. The flood of June 2016 in the SRB, for instance, affected 58,000 people and caused a direct economic loss of 168 million RMB. The SRB, therefore, presents an ideal reference region for evaluating the suitability of using SPPs in sub-tropical hilly regions with large inter-annual and intra-annual rainfall variabilities.

Study Area
Approximately 159 km in length, the Shuaishui River originates from the Hutou mountain ranges and flows across the Xiuning County before pouring into the Xinanjiang River at the Tunxi district of Huangshan City. The SRB (117°39′-119°26′E and 29°24′-31°1′N) has a total area of 1522 km² (Figure 1). Dominated by a hilly terrain, more than 70% of the basin is at an altitude of above 500 m. Land use and land cover in the basin is mainly forestland and cultivated land, which respectively account for 78.9% and 14.6% of the total coverage.
Located in the subtropical monsoon climate zone, rainfall in the SRB is usually abundant. Between 2009 and 2017, mean annual rainfall observed by rain gauges in the basin ranges from 1747 mm in 2013 to 2700 mm in 2015 with an overall average of 2278 mm (Figure A1). Within each year, mean monthly rainfall usually increases steadily from January to May and peaks in June. Precipitation in June alone could account for more than one-fifth of annual total rainfall. After June, monthly rainfall falls sharply and exhibits an overall decreasing trend till the end of the year (Figure A2).

Satellite Precipitation Products
The TRMM satellite was launched in 1997 through a joint space mission between the NASA of the U.S. and the National Space Development Agency of Japan [30]. TRMM carries five instruments, including a suite of three rainfall sensors (Precipitation Radar (PR), TRMM Microwave Imager (TMI), Visible and Infrared Sensor (VIRS)) and two related instruments (Lightning Imaging Sensor (LIS) and Clouds and the Earth's Radiant Energy System (CERES)). The TRMM Multi-satellite Precipitation Analysis (TMPA) products combine infrared (IR) data from geostationary satellites, such as GOES-W, GOES-E, GMS, Meteosat-5, Meteosat-7, and NOAA-12, with microwave (MW) data from multiple satellites including TMI/TRMM (TRMM Microwave Imager), SSMI/DMSP (Special Sensor Microwave Imager/Defense Meteorological Satellite Program), AMSU/NOAA (Advanced Microwave Sounding Unit/National Oceanic and Atmospheric Administration), and AMSR-E (Advanced Microwave Scanning Radiometer-EOS) [31]. The TMPA products are produced in the following four stages. Firstly, the MW precipitation estimates are calibrated and combined using algorithms such as sensor-specific versions of the Goddard Profiling Algorithm (GPROF). Secondly, IR precipitation estimates are created using the calibrated MW precipitation. Thirdly, the MW and IR precipitation estimates are combined. Finally, rain gauge data are incorporated. Detailed descriptions of the algorithms and steps for producing the TMPA products can be found in Huffman et al. (2007) [32] and Huffman et al. (2018) [33].
In May 2012, the TMPA was upgraded from version 6 (V6) to version 7 (V7) by implementing the latest version of re-calibration algorithm and using the new GPCC monthly precipitation products for bias correction. The TMPA 3B42 consists of two products: the near-real-time product (3B42RT) and the post-processed product (3B42). The 3B42RT product, which is released approximately 9 h after real-time, spans the latitude belt from 50°N to 50°S. In contrast, with a more extensive coverage from 60°N to 60°S, the 3B42 product is released 10-15 days after each month when the bias correction has been made based on ground gauge records.
As the global successor to TRMM, the GPM mission was launched in 2014 to provide global precipitation observations. The GPM satellite is equipped with an advanced Dual-frequency Precipitation Radar (DPR) that observes the internal structure of storms within and under the clouds, and a GPM Microwave Imager (GMI) that measures the type, size, and intensity of precipitation. The DPR is more sensitive than its TRMM predecessor, especially in the measurement of light rainfall and snowfall in high latitude regions.
In March 2014, NASA released its first GPM-era global precipitation product, IMERG (Integrated Multi-satellitE Retrievals for GPM). The IMERG algorithm is designed to inter-calibrate, interpolate, and merge all available satellite MW precipitation estimates, MW-calibrated IR satellite estimates, gauge measurements, and other potential precipitation estimates at fine spatial and temporal resolution worldwide. Its inter-calibration of available MW data is similar to that of TMPA, but the estimates are further interpolated and re-calibrated using the Climate Prediction Center (CPC) morphing Kalman Filter technique and the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System (PERSIANN-CCS) [34,35]. IMERG includes three products with different latencies: the near-real-time 'Early' run (IMERG_E, latency of 6 h), the reprocessed near-real-time 'Late' run (IMERG_L, latency of 18 h), and the post-real-time, gauge-adjusted 'Final' run (IMERG_F, latency of four months). In June 2019, the IMERG algorithm was upgraded from Version 5 (V5) to Version 6 (V6) to reduce bias and improve consistency among the different IMERG runs. For example, the 'displacement vectors' in V6 are computed using the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) and Goddard Earth Observing System (GEOS) model Forward Processing (FP) data instead of the previously used infrared data, which helps ensure consistency in the vectors between the Final Run and the Early and Late Runs.
In this study, we aim to evaluate the performance of a total of five SPPs in the SRB, including the Early, Late, and Final runs of the IMERG V6 products (0.1° × 0.1° and 30-min interval), and the near-real-time and post-processed runs of the TMPA V7 products (0.25° × 0.25° and 3-h interval). The SPP datasets are all downloaded from the NASA website (https://disc.gsfc.nasa.gov/). After being downloaded, the SPP datasets are adjusted to the local time, which is eight hours ahead of Coordinated Universal Time (UTC). Since the IMERG V6 and TMPA V7 products respectively contain 30-min and 3-h rainfall estimates, they need to be processed before being evaluated at different temporal scales. At the daily and monthly scales, both IMERG and TMPA data are directly aggregated to the corresponding levels for comparison with ground measurements. For evaluation at the hourly scale, the TMPA hourly rainfall estimates are obtained by assuming a constant rainfall intensity over each 3-h period.
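The time shift, aggregation, and constant-intensity disaggregation described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the processing chain used in the study: each record is assumed to be a pair of interval start time and accumulated depth in mm, the function names are hypothetical, and the sample values are invented. If the downloaded files store rain rates rather than depths, a conversion step would be needed first.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Local time in the study region is UTC + 8 h (stated in the text).
LOCAL_OFFSET = timedelta(hours=8)


def to_local(records):
    """Shift (utc_start_time, depth_mm) records to local time."""
    return [(t + LOCAL_OFFSET, mm) for t, mm in records]


def aggregate(records, key):
    """Sum depths (mm) over the calendar period returned by key(timestamp)."""
    totals = defaultdict(float)
    for t, mm in records:
        totals[key(t)] += mm
    return dict(totals)


def day_key(t):
    return (t.year, t.month, t.day)


def month_key(t):
    return (t.year, t.month)


def split_3h_to_hourly(records_3h):
    """Spread each 3-h depth evenly over its three hours (constant intensity)."""
    return [(start + timedelta(hours=k), mm / 3.0)
            for start, mm in records_3h
            for k in range(3)]


# Hypothetical half-hourly depths (mm) stamped in UTC.
imerg_utc = [(datetime(2016, 6, 30, 15, 30), 2.0),
             (datetime(2016, 6, 30, 16, 0), 3.0)]
imerg_local = to_local(imerg_utc)            # 23:30 on 30 June, 00:00 on 1 July
daily = aggregate(imerg_local, day_key)
monthly = aggregate(imerg_local, month_key)

# Hypothetical 3-h depth (mm), already expressed in local time.
tmpa_hourly = split_3h_to_hourly([(datetime(2016, 7, 1, 0, 0), 6.0)])
```

In this toy case the two half-hourly records fall on different local days and months even though they share a UTC date, which is why the time shift has to precede the aggregation.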

Ground Rainfall Measurements
Hourly and daily precipitation records from 2009 to 2017 at a total of 13 rainfall stations in the SRB (Figure 1) are obtained from the hydrological yearbook series published by the Ministry of Water Resources of China. The rain gauges used at the rainfall stations are tipping buckets. All of the rainfall data have gone through strict quality control following relevant Chinese industry standards such as QX/T 118-2010 (quality control of surface meteorological observation data) before being published. There are no rainfall data missing at the 13 stations. In the SRB, daily rainfall has been recorded throughout the year, while hourly rainfall is only documented for the relatively wet period from April to October. Correspondingly, the monthly and daily estimates of the SPPs are evaluated throughout the year, while their hourly estimates are only assessed within these seven months.

Evaluation Metrics
A total of four continuous metrics are used to evaluate the quality of satellite precipitation products in the SRB. Correlation coefficient (CC) quantifies the linear correlation between satellite precipitation estimates and ground measurements; it varies between −1 and 1, with a value close to 0 indicating little correlation. Root-mean-square error (RMSE) quantifies the degree of dispersion between satellite precipitation and measured precipitation, which reflects the overall error level and accuracy of SPPs [36]. Mean absolute difference (MAD) evaluates the magnitude of the average difference between satellite precipitation and measured precipitation. Smaller values of RMSE and MAD indicate a better performance of the SPPs. Relative bias (RB) measures the systematic bias of satellite precipitation compared with gauge observations. Positive and negative RB values indicate overestimation and underestimation, respectively. As a rule of thumb, SPPs can be considered reliable when RB falls between −10% and 10% and CC exceeds 0.7 [37].
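As an illustration of the four continuous metrics and the rule of thumb above, a minimal Python sketch is given below. It assumes the usual textbook definitions (Pearson correlation for CC, and RB as the sum of differences divided by the sum of observations, in percent); the function names and sample values are hypothetical.

```python
import math


def continuous_metrics(sim, obs):
    """Return CC, RMSE, MAD, and RB (%) for paired estimates and observations."""
    n = len(sim)
    if n != len(obs) or n < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_s == 0 or var_o == 0 or sum(obs) == 0:
        raise ValueError("CC or RB is undefined for constant or all-zero series")
    return {
        "CC": cov / math.sqrt(var_s * var_o),
        "RMSE": math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / n),
        "MAD": sum(abs(s - o) for s, o in zip(sim, obs)) / n,
        "RB": 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs),
    }


def is_reliable(metrics, cc_min=0.7, rb_limit=10.0):
    """Rule of thumb from the text: CC above 0.7 and RB within +/-10%."""
    return metrics["CC"] > cc_min and -rb_limit <= metrics["RB"] <= rb_limit


# Hypothetical example: the estimate is exactly twice the observation.
m = continuous_metrics([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

In this example CC is perfect while RB is 100%, so the product fails the rule of thumb. This shows why CC alone cannot certify an SPP as reliable.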
The four continuous metrics are calculated as [38][39][40]:

$$\mathrm{CC} = \frac{\sum_{i=1}^{n}\left(X_i^{s}-\bar{X}^{s}\right)\left(X_i^{o}-\bar{X}^{o}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i^{s}-\bar{X}^{s}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(X_i^{o}-\bar{X}^{o}\right)^{2}}}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i^{s}-X_i^{o}\right)^{2}}$$

$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n}\left|X_i^{s}-X_i^{o}\right|$$

$$\mathrm{RB} = \frac{\sum_{i=1}^{n}\left(X_i^{s}-X_i^{o}\right)}{\sum_{i=1}^{n}X_i^{o}}\times 100\%$$

where $n$ is the number of the simulated and observed data pairs; $X_i^{s}$ and $X_i^{o}$ denote the $i$th simulated and observed amount, respectively; and $\bar{X}^{s}$ and $\bar{X}^{o}$ are the means of the simulated and observed data, respectively.
Besides the continuous metrics, three categorical evaluation metrics are used to evaluate the precipitation detection capability of the SPPs: probability of detection (POD), false alarm rate (FAR), and critical success index (CSI). POD represents the ratio of correctly detected precipitation occurrences by the SPPs to the total number of actual precipitation occurrences. With an optimal value of 1, a higher POD indicates that the SPP is more capable of detecting the actual precipitation occurrences. FAR is the ratio of falsely detected precipitation occurrences to the total number of detected precipitation occurrences. With an optimal value of 0, a lower FAR indicates that the SPP is less likely to yield false precipitation occurrences. CSI incorporates both missed events and false detections in its calculation [41]. With an optimal value of 1, a higher CSI indicates a better performance of the SPP with more correct detections as well as fewer false alarms of precipitation occurrences.
Based on the number of hits (H), false alarms (F), and misses (M), the three categorical metrics are calculated as:

$$\mathrm{POD} = \frac{H}{H+M}$$

$$\mathrm{FAR} = \frac{F}{H+F}$$

$$\mathrm{CSI} = \frac{H}{H+F+M}$$

Each data pair is classified by comparing the rain gauge observation (S) and the satellite rainfall estimate (P) against a rainfall threshold: H (hits) represents the number of cases when both the rain gauge and the satellite determine the rainfall to equal or exceed the threshold; F (false alarms) represents the number of cases when the satellite determines the rainfall to equal or exceed the threshold but not the rain gauge; M (misses) represents the number of cases when the rain gauge determines the rainfall to equal or exceed the threshold but not the satellite; and Z (correct negatives) represents the number of cases when both the rain gauge and the satellite determine the rainfall to fall below the threshold.
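The contingency counts and the three categorical metrics can be sketched in Python as below. This assumes the standard contingency-table definitions POD = H/(H+M), FAR = F/(H+F), and CSI = H/(H+F+M). The function names, the threshold of 1.0 mm, and the sample series are hypothetical and chosen only for illustration.

```python
def contingency_counts(gauge, satellite, threshold):
    """Count hits, false alarms, misses, and correct negatives."""
    h = f = m = z = 0
    for s, p in zip(gauge, satellite):
        gauge_wet = s >= threshold      # gauge reaches the threshold
        sat_wet = p >= threshold        # satellite reaches the threshold
        if gauge_wet and sat_wet:
            h += 1
        elif sat_wet:
            f += 1
        elif gauge_wet:
            m += 1
        else:
            z += 1
    return h, f, m, z


def categorical_metrics(h, f, m):
    """Return POD, FAR, and CSI; NaN where a denominator is zero."""
    nan = float("nan")
    pod = h / (h + m) if (h + m) else nan
    far = f / (h + f) if (h + f) else nan
    csi = h / (h + f + m) if (h + f + m) else nan
    return {"POD": pod, "FAR": far, "CSI": csi}


# Hypothetical daily series (mm) and an illustrative threshold of 1.0 mm.
gauge = [0.0, 5.0, 2.0, 0.0, 1.0]
satellite = [0.0, 4.0, 0.0, 3.0, 2.0]
h, f, m, z = contingency_counts(gauge, satellite, threshold=1.0)
scores = categorical_metrics(h, f, m)
```

Correct negatives (Z) are counted but do not enter any of the three scores, so long dry spells do not inflate the apparent detection skill.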

Analysis of Variance (ANOVA)
To evaluate the rainfall estimation performance of the five satellite precipitation products, four continuous metrics (CC, RB, RMSE, and MAD) are respectively calculated at each of the 13 rainfall stations across the monthly, daily, and hourly scales. Previous studies have mostly used the mean values of the metrics over all rainfall stations to compare the rainfall estimation performance among the SPPs. This simple averaging approach, however, does not account for the variability in the metrics among the rainfall stations. Furthermore, it is incapable of determining the significance of the difference between the SPPs.
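The limitation of simple averaging noted above is what a one-way analysis of variance addresses, since it weighs the spread of a metric between products against its spread among stations. The sketch below computes the F statistic and a Bonferroni-adjusted significance level in plain Python. It is an illustration only, not the Origin 2018 workflow used in the study. The function names and sample values are hypothetical, and converting F to a p-value would additionally require an F-distribution routine (for example, `scipy.stats.f_oneway` returns both the statistic and the p-value).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more values than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mu - grand_mean) ** 2
                     for g, mu in zip(groups, means))
    # Variation of the values around their own group mean.
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni_alpha(alpha, n_groups):
    """Per-comparison significance level for all pairwise comparisons."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return alpha / n_pairs


# Hypothetical metric values at three stations for three products.
f_stat, df_b, df_w = one_way_anova([[1.0, 2.0, 3.0],
                                    [2.0, 3.0, 4.0],
                                    [5.0, 6.0, 7.0]])
alpha_pair = bonferroni_alpha(0.05, 5)   # five products give ten pairs
```

With five products and 13 stations each, the degrees of freedom would be 4 between groups and 60 within groups.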
In view of this deficiency of simple averaging, we adopt the one-way analysis of variance (ANOVA) to statistically evaluate the difference in metrics between the SPPs. In the ANOVA, satellite precipitation product type is used to designate the five groups of metrics for comparison. If the ANOVA determines that there is a significant difference in the mean metrics among the SPPs, multiple commonly used posterior comparison tests (including the Bonferroni, Sidak, Tukey, and Scheffe tests built into the Origin 2018 statistical package) are further used to identify the pairs of SPPs whose mean metrics are indeed statistically different.
Figure 2 compares the mean of the observed monthly precipitation in the SRB with that of the five SPPs from January 2009 to December 2017. Figure A3 compares the scatterplots between observed and estimated monthly rainfall among the SPPs. In general, all five SPPs are capable of capturing the overall trend of monthly precipitation variations. The annual CCs of the five SPPs are all above 0.85, while those of IMERG_F and 3B42 even exceed 0.95. The IMERG products all exhibit a tendency to underestimate rainfall, especially in wet months. In particular, for June 2011, when monthly rainfall reached as high as 1109 mm, IMERG_E, IMERG_L, and IMERG_F give low estimates of 608, 608, and 710 mm, respectively.
Table 2 compares the mean values of the four continuous evaluation metrics over the 13 rainfall stations among the five SPPs both annually and seasonally. With the highest RMSE of 101.42 mm and MAD of 65.72 mm, the 3B42RT product deviates the most from historical rainfall observations annually. A closer examination of the seasonal changes in RMSE and MAD, however, shows that its considerably larger deviation in summer is the main cause. In the three seasons other than summer, 3B42RT actually deviates less than the two near-real-time IMERG products (Table 2). Annually, the three IMERG products tend to underestimate monthly precipitation, while the two TMPA products tend to overestimate it. Seasonally, IMERG_F tends to underestimate monthly precipitation throughout the year, while the other two IMERG products behave the same except that they tend to slightly overestimate in winter. In contrast, the RB of 3B42 remains close to zero throughout the year except that it approaches 10% in winter. Meanwhile, the RB of 3B42RT fluctuates much more, ranging from −4.9% in spring to 25.8% in summer (Table 2). In terms of the other three continuous metrics, the five SPPs exhibit somewhat similar seasonal patterns of change. For example, the CCs of the SPPs all reach or approach their peak values in summer, while decreasing to their lowest in fall. Meanwhile, the RMSEs and MADs of the SPPs all peak in summer, fall to intermediate levels in spring, and drop to their lowest in fall and winter (Table 2). The seasonal changes in RMSEs and MADs correspond closely to the changes in the magnitude of seasonal rainfall.

Temporal Analysis
Except for RB, the two post-time products (IMERG_F and 3B42) perform significantly better than the remaining real-time or near-real-time products both annually and seasonally, with a noticeably higher value of CC (e.g., 0.97 and 0.95 annually) as well as considerably lower values of RMSE (e.g., 53.82 and 54.13 mm annually) and MAD (e.g., 35.79 and 37.07 mm annually) (Table 2). The overall better performance of the two post-time SPPs compared to the real-time or near-real-time products at the monthly time scale is not surprising, since both are generated by adjusting the real-time products based on monthly measurements of ground rain gauges [42], although these gauges may not include the 13 rain gauges covered in our study. With their annual CC and RB values satisfying the good-performance thresholds, both IMERG_F and 3B42 can be regarded as reliable sources of monthly precipitation in the SRB. Similar to our study, previous studies have also observed satisfactory performance of IMERG_F and 3B42 in monthly rainfall estimation [43,44].

Spatial Variation
Figure 3 compares the spatial distribution of the four annual continuous evaluation metrics among the SPPs at the monthly scale. The spatial distribution of the CCs varies considerably among the IMERG products, while staying similar between the TMPA products. Topography does not seem to be a significant influencing factor of the CCs, although some stations at higher altitude (stations 12 and 13) do have lower CCs in all five SPPs. In terms of RMSEs and MADs, however, topography plays a more prominent role. For all five SPPs, both metrics tend to get larger at higher altitude. In particular, the RMSEs and MADs of three stations at high altitude (stations 10, 12, and 13) consistently surpass those of the remaining stations. In addition, the RBs exhibit a similar spatial pattern among the IMERG products. They all tend to underestimate monthly rainfall more severely at high altitude (e.g., stations 12 and 13). However, for the TMPA products, no clear pattern in the spatial distribution of RB could be observed.
Similar to our study, Milewski et al. [45] also found that elevation was a key factor affecting the accuracy of the TMPA products in Northern Morocco. The CCs of all four TMPA products at the low elevation class (0-500 m) consistently surpassed those at the medium (500-1000 m) and high (>1000 m) elevation classes. Contrary to the CCs, the normalized RMSEs at the low elevation class were consistently smaller.
Figure A4 compares the scatterplots between observed and estimated daily rainfall among the SPPs. Daily rainfall estimates by IMERG_F cluster the closest around the 1:1 line, while 3B42RT estimates deviate the most, with a strong tendency of overestimation. For the daily scale assessment of SPPs, four continuous (CC, RMSE, RB, and MAD) and three categorical (POD, CSI, FAR) evaluation metrics are calculated using daily rainfall records at the 13 rainfall stations and the corresponding SPP estimates from 1 January 2009 to 31 December 2017.

Continuous Evaluation Metrics
(1) Temporal Variation Table 3 compares the mean values of the continuous evaluation metrics among the SPPs at the daily scale. Annually, all four continuous evaluation metrics except RB indicate a relatively better performance by the IMERG products in estimating daily rainfall. For example, the annual RMSEs of the IMERG products range from 9.66 mm/d (IMERG_F) to 11.30 mm/d (IMERG_E), while those of the TMPA products both exceed 11.50 mm/d. Seasonally, however, while IMERG_F generally remains the best product for estimating daily rainfall, the other two near-real-time IMERG products tend to perform better than the TMPA products in spring and summer, but often worse in fall and winter (except for RB). Within each SPP family, the order of the daily rainfall estimation accuracy is largely consistent, i.e., IMERG_F > IMERG_L > IMERG_E (except for RB) and 3B42 > 3B42RT. Moreover, except for RB, the five SPPs exhibit rather similar seasonal patterns of change in their daily metrics. For example, the CCs of the five SPPs all peak in summer and bottom out in fall. The RMSEs and MADs of the SPPs all tend to peak in summer, decline in spring, and drop to their lowest in fall and winter.
Throughout the year, the Shuaishui River Basin is affected by different climatic systems. Mainly under the influence of the high-altitude trough, precipitation in fall and winter is mostly brought by stratiform clouds; it tends to be stable and is therefore easier to measure. In spring and summer, however, the convective component in the precipitation system increases due to the Meiyu front and shear line system. Both thermal convection precipitation under the control of the Western Pacific Subtropical High (WPSH) and rainstorms caused by the typhoon system are short in duration and spatially heterogeneous, which makes accurate rainfall measurement more difficult. The differences in climatic systems have led to different seasonal rainfall characteristics, with more rainy days and higher rainfall intensity in spring and summer than in fall and winter. Between 2009 and 2017, there were 529 and 589 rainy days in spring and summer, compared to 425 and 452 days in fall and winter. Days with precipitation < 1 mm account for 44.2% and 41.8% in fall and winter, compared to 30.6% and 26.7% in spring and summer. Meanwhile, days with precipitation > 50 mm account for 4.9% and 8.3% in spring and summer, compared to around 1.5% in fall and winter.
The CCs' peaking in summer could be attributed to the season's large variability in daily rainfall, whose general pattern of change is relatively well captured by the SPPs. However, because there are more days with heavy precipitation in summer, the absolute errors of SPP estimates remain the largest in that season. Similarly, the lowest RMSEs and MADs in fall and winter are probably due to the predominance of days with lower precipitation in those seasons.
Similar to our findings, Su et al. [46] concluded that the post-time IMERG-F product, with a CC of 0.79, RMSE of 6.31 mm/d, and RB of 9.04%, was the best IMERG product for estimating daily rainfall in the Upper Huai River Basin of China. Meanwhile, Anjum et al. [43] found that the post-time 3B42 V7 product, with a CC of 0.70 and RB of 14.77%, performed better than the real-time 3B42RT product for estimating daily rainfall in Pakistan.
Compared to those at the monthly scale, the CCs of all five SPPs have decreased considerably at the daily scale. For instance, the annual CC of IMERG_F drops from 0.97 at the monthly scale to 0.81 at the daily scale, while the annual CCs of the other SPPs all drop further to around 0.75. In contrast to the CCs, the RBs of the SPPs at the daily scale are more similar to those at the monthly scale in terms of both their signs and magnitude. Annually, all three IMERG products tend to underestimate daily precipitation with the lowest RB of −9.99%, while both TMPA products tend to overestimate with the lowest RB of 1.21%. Seasonally, the IMERG family products tend to underestimate daily rainfall in all four seasons, except for the two near-real-time products in winter. In contrast, 3B42RT exhibits a strong tendency of overestimation in summer and winter, while 3B42 does so only in winter.
(2) Statistical Performance Comparison among the SPPs One-way ANOVA is used to assess whether the mean values of the four continuous metrics are significantly different among the five SPPs. One critical pre-condition of performing ANOVA is the homogeneity of variance among the compared groups. In this study, we use Levene's test to compare the variance of the metrics among the five SPPs, which confirms that all four metrics meet the requirement of homogeneity of variance. The subsequent one-way ANOVA concludes that the mean values of all four metrics are significantly different among the SPPs at the significance level (α) of 0.05 (Figure 4).
In view of the significant ANOVA results, multiple posterior comparison tests (the Bonferroni, Sidak, Tukey, and Scheffe tests) are further conducted to identify the pairs of SPPs whose mean metrics are significantly different. In Figure 4, two SPPs are connected with a black dotted line if the posterior comparison tests have concluded a non-significant difference between their means at the α level of 0.05. As shown in the figure, the mean values of CC are significantly different between all pairs of SPPs except between IMERG_L and the two TMPA products as well as between the TMPA products themselves; the mean values of RMSE are all significantly different except between 3B42 and the two near-real-time IMERG products as well as between the two near-real-time IMERG products themselves; the mean values of RB are all significantly different except between the three pairs of IMERG products; the mean values of MAD are all significantly different except between IMERG_L and the other two IMERG products as well as between IMERG_E and 3B42. It is worth noting that the posterior comparison tests have shown that IMERG_F is the only IMERG product that is significantly different from the TMPA products in terms of all four metrics.
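The arithmetic behind this procedure can be sketched in a few lines. The example below is an illustration, not the authors' implementation: it computes the one-way ANOVA F statistic from grouped values and the per-comparison significance levels implied by the Bonferroni and Sidak corrections for the 10 possible pairs among five products. The function names and sample numbers are hypothetical, and how the study formed its groups is an assumption here. Obtaining p-values requires the F distribution; statistical libraries such as SciPy provide this through `scipy.stats.f_oneway`, and Levene's test through `scipy.stats.levene`.

```python
from itertools import combinations

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of values around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def pairwise_alpha(names, alpha=0.05):
    """Return all pairs plus the Bonferroni and Sidak per-comparison alphas."""
    pairs = list(combinations(names, 2))
    m = len(pairs)
    bonferroni = alpha / m
    sidak = 1 - (1 - alpha) ** (1 / m)
    return pairs, bonferroni, sidak

# Hypothetical metric values for three groups (not the study's data).
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))

spps = ["IMERG_E", "IMERG_L", "IMERG_F", "3B42RT", "3B42"]
pairs, bonf, sidak = pairwise_alpha(spps)
print(len(pairs), bonf, round(sidak, 6))
```

With five products there are 10 pairwise comparisons, so the Bonferroni correction tests each pair at 0.05/10 = 0.005, and the Sidak correction is slightly less strict at about 0.0051.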
(3) Spatial Variation Figure 5 compares the spatial distribution of the four annual continuous evaluation metrics at the daily scale among the SPPs. The spatial distribution of the CCs varies considerably among the IMERG products, while staying similar between the TMPA products. Topography does not seem to be a significant influencing factor of the CCs, although all five SPPs have lower CCs at some stations of higher altitude (stations 12 and 13). In terms of RMSEs and MADs, however, topography plays a more prominent role. For all five SPPs, both metrics tend to get larger at higher altitude. In particular, the RMSEs and MADs of the three stations at high altitude (stations 10, 12, and 13) consistently surpass those of the remaining stations. In addition, topography seems to affect the RBs of the IMERG products considerably, which tend to underestimate daily rainfall more seriously at high altitude. The impact of topography on the RBs of the TMPA products, however, is rather mixed. The absolute RB of the 3B42RT product is actually smaller at higher altitude. Similar to our study, Wang et al. (2019) [47] also observed more serious underestimation of daily rainfall by the IMERG products at high altitude in the Hexi region deep in the hinterland of the Eurasian continent. The underestimation of rainfall at high altitude by the SPPs could be due to local precipitation enhancement induced by topographical lift.

Categorical Evaluation Metrics
(1) Temporal Variation Besides the continuous metrics, three categorical metrics are used to assess the daily precipitation detection capabilities of the SPPs. Table 4 compares the mean values of the categorical evaluation metrics among the SPPs at the daily scale. A daily rainfall threshold of 1 mm/d is used in calculating the metrics.
In terms of POD, the IMERG products all tend to perform better than the TMPA products both annually and seasonally except in summer, during which IMERG_L gives the poorest performance. Seasonally, the PODs of the two product families exhibit a somewhat different pattern of change. The PODs of the IMERG family products tend to peak (≥ 0.85) in spring, decline slightly to around 0.8 in summer, and further to around 0.67 in fall and winter. The PODs of the TMPA products peak in summer, drop to around 0.75 in spring, and further down to 0.6 in fall and < 0.45 in winter. IMERG_F has the highest POD throughout the year, except that it is slightly lower than the other two IMERG products in spring (Table 4). The higher PODs in spring and summer indicate that the SPPs are poorer at detecting light precipitation, which is more dominant in fall and winter. Meanwhile, the much lower PODs of the TMPA products in winter indicate that they are less capable of detecting solid precipitation than the IMERG products.
Unlike POD, the IMERG products all tend to perform worse than the TMPA products in terms of FAR. Meanwhile, the FARs of the five SPPs show a similar seasonal pattern of change: they peak in summer, decrease slightly in fall, and drop further in spring and winter (Table 4).
Incorporating both correct rainfall detection and false alarm, CSI indicates a mixed performance among the SPPs. Among the five SPPs, IMERG_F performs the best annually, as well as in fall and winter. It performs slightly worse than both TMPA products in summer and IMERG_L in spring. Seasonally, all five SPPs tend to perform the best in spring and then in summer. However, the IMERG products tend to perform slightly better in winter than in fall, while the TMPA products perform considerably worse in winter (Table 4).
Similar to our study, Xu et al. (2019) [48] concluded that IMERG_F performs better than 3B42 in detecting precipitation events in the relatively flat Huang-Huai-Hai Plain of East Coastal China, with an annual POD of 0.83 and CSI of 0.52. The PODs and CSIs of IMERG_F surpass those of 3B42 in all seasons, especially in winter. This indicates that IMERG_F performs better in detecting precipitation events, especially in capturing light or solid precipitation.
(2) Spatial Variation Figure A5 compares the spatial distribution of the three annual categorical evaluation metrics among the SPPs. Unlike the case of continuous metrics, topography does not seem to impose a consistent impact on categorical metrics at the daily scale. For example, while most SPPs have lower correct precipitation detection rates (PODs) at the two stations of high altitude (stations 12 and 13), they also have lower false alarm rates (FARs) at these stations. This leads to no obvious pattern in the spatial distribution of CSIs, with varied performance of stations at similar altitudes.
(3) Variation with Rainfall Thresholds Figure 6 compares the performance of precipitation detection among the five SPPs by different daily rainfall magnitude. Each of the three categorical metrics has been sequentially calculated annually for the days when daily rainfall exceeds 1, 5, 10, 25, 50, 75, 100, and 150 mm/d. Similar daily rainfall thresholds have been used in previous studies, such as Wu et al. [29], Anjum et al. [43], and Tan et al. [49].
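The threshold-based calculation described above can be sketched as follows, assuming the standard contingency-table definitions: POD = hits / (hits + misses), FAR = false alarms / (hits + false alarms), and CSI = hits / (hits + misses + false alarms). The function name, the use of `>=` for the threshold test, and the sample series are assumptions made for illustration only.

```python
import math

def categorical_metrics(gauge, sat, threshold):
    """Return POD, FAR and CSI for rain events at or above `threshold`."""
    hits = misses = false_alarms = 0
    for g, s in zip(gauge, sat):
        gauge_wet = g >= threshold  # assumption: event means >= threshold
        sat_wet = s >= threshold
        if gauge_wet and sat_wet:
            hits += 1
        elif gauge_wet:
            misses += 1
        elif sat_wet:
            false_alarms += 1
    # A ratio is undefined (NaN) when its denominator is zero.
    pod = hits / (hits + misses) if hits + misses else math.nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else math.nan
    total = hits + misses + false_alarms
    csi = hits / total if total else math.nan
    return {"POD": pod, "FAR": far, "CSI": csi}

# Hypothetical daily rainfall (mm/d); thresholds follow those listed in the text.
gauge = [0.0, 1.5, 6.0, 12.0, 30.0, 0.2, 55.0, 0.0]
sat = [0.0, 0.8, 7.5, 9.0, 26.0, 2.0, 40.0, 1.2]
for threshold in (1, 5, 10, 25, 50, 75, 100, 150):
    print(threshold, categorical_metrics(gauge, sat, threshold))
```

At high thresholds few events remain, so a single hit or miss can move the metrics sharply. This sample-size effect is worth keeping in mind when reading results at the largest thresholds.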
As seen from Figure 6a, the PODs of all five SPPs exhibit a largely decreasing trend with the increase of daily rainfall threshold until hitting the bottom at the threshold of 100 mm/d. Afterwards, the PODs of all SPPs bounce back substantially at the threshold of 150 mm/d. Interestingly, the PODs of the two TMPA products have mostly surpassed those of the IMERG products, indicating their better capabilities of correctly detecting daily rainfall occurrences.
However, as shown in Figure 6b, the FAR values of the TMPA products have also surpassed those of the IMERG products, especially IMERG_F, at the majority of daily rainfall thresholds, indicating their higher risk of falsely detecting daily rainfall occurrences. By incorporating the factors of both false alarms and missed events, CSI provides a more comprehensive evaluation of the rainfall detection performance of the SPPs. As shown in Figure 6c, IMERG_F has the highest CSI value at daily rainfall thresholds of less than 100 mm/d, whereas 3B42 catches up with it at the threshold of 100 mm/d and above.

Comparison with Previous Studies
Table 5 summarizes the performance of the SPPs in estimating daily rainfall in previous studies worldwide. Previous studies have mostly assessed SPPs over approximately two years, compared to nine years in this study. It needs to be noted that the table does not serve to rigorously compare the relative performance of the SPPs in various regions, due to the differences in temporal frame, geographical regions, as well as climatic regimes of the studies.
The CCs of both IMERG products and TMPA products in this study have surpassed those in all previous studies except the one by Su et al. [46] conducted in the Upper Huai River Basin of China. Unlike the CC, the values of the other continuous as well as categorical metrics in this study all lie at the medium level among the previous studies. In addition, similar to our findings, many previous studies have concluded a moderately better performance of the IMERG products in estimating daily rainfall than the TMPA products. However, the observed tendency of under-estimating daily rainfall by the IMERG products and over-estimating by the TMPA products in this study is not consistent with the findings of some previous studies.
Figure A6 compares the scatterplots between observed and estimated hourly rainfall among the SPPs. Hourly rainfall estimates by all five SPPs are widely scattered around the 1:1 line. Since hourly rainfall is only recorded from April to October, we evaluate the performance of the SPPs at the hourly scale for these seven months. Correspondingly, seasonal evaluation metrics are only calculated for spring (April to May), summer (June to August), and fall (September to October).
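The seasonal split used for the hourly evaluation can be expressed as a small grouping step. The sketch below assumes records arrive as (month, gauge, satellite) tuples and uses relative bias as the example metric; the record layout, function names, and values are hypothetical.

```python
# Months with hourly gauge records, mapped to the seasons defined in the text.
HOURLY_SEASONS = {
    4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall",
}

def group_by_season(records):
    """Group (month, gauge_mm, sat_mm) records into spring, summer and fall."""
    grouped = {"spring": [], "summer": [], "fall": []}
    for month, gauge, sat in records:
        season = HOURLY_SEASONS.get(month)
        if season is None:
            continue  # outside April to October: no hourly gauge record
        grouped[season].append((gauge, sat))
    return grouped

def relative_bias(pairs):
    """Relative bias in percent: 100 * sum(S - G) / sum(G)."""
    total_gauge = sum(g for g, _ in pairs)
    return 100.0 * sum(s - g for g, s in pairs) / total_gauge

# Hypothetical hourly records (month, gauge mm/h, satellite mm/h).
records = [(3, 1.0, 1.0), (4, 2.0, 1.0), (5, 2.0, 2.0), (7, 10.0, 12.0), (10, 4.0, 3.0)]
for season, pairs in group_by_season(records).items():
    print(season, relative_bias(pairs))
```

Dropping records outside April to October before computing any metric keeps the seasonal and seven-month statistics on the same sample.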

Continuous Evaluation Metrics
(1) Temporal Variation Table 6 compares the mean values of the four continuous metrics (CC, RMSE, RB, and MAD) over the seven months from April to October and seasonally (spring, summer, fall) among the SPPs at the hourly scale. With their seven-month CC values all staying close to 0.5, the SPPs have performed less satisfactorily in estimating hourly rainfall in the SRB. All IMERG products tend to underestimate hourly rainfall throughout the three seasons. The absolute RBs of IMERG_F are all below 13%, while those of the near-real-time products remain above 20%. With the smallest absolute RB among the SPPs, 3B42 tends to slightly overestimate hourly rainfall in spring, but underestimate it in summer and fall. Different from the other SPPs, 3B42RT shows a strong tendency of overestimation in summer.
Except for RB, the five SPPs all exhibit similar seasonal patterns of change. Seasonally, the CCs of the SPPs all peak in spring, followed by a continuous decline in summer and fall. Both the RMSEs and MADs of the SPPs are the highest in summer, followed by spring and then fall. The observed seasonal patterns at the hourly scale are quite similar to those observed at the daily scale.
Figure 7a to Figure 7l further examine the changes in mean continuous metrics over a diurnal cycle in three seasons. Meanwhile, Figure 7m to Figure 7o compare the observed amount of average hourly rainfall with the corresponding SPP estimates in the three seasons. The CCs of all five SPPs show considerable diurnal variations in the three seasons. Despite the differences in amount, the overall diurnal patterns of change in CCs are somewhat similar among the SPPs. In summer, for example, the CCs of all SPPs tend to reach a high plateau between 3:00 a.m. and 12:00 p.m., followed by a steady fall to the bottom at around 4:00 p.m. and a rebound afterwards. As shown in Figure 7n, the mean hourly summer ground measurement peaks at 3:00 p.m. All SPPs, however, exhibit a lag of one or more hours in reaching the peak value, which may have caused their CCs to drop to the lowest in the afternoon.
The diurnal patterns of change in RMSE/MAD are even more similar among the SPPs in all three seasons. Diurnal variations in both metrics are the highest in summer, followed by spring and then fall, which is consistent with the three seasons' relative magnitude of diurnal changes in hourly precipitation (Figure 7m-o). In addition, the RMSEs and MADs of all five SPPs peak at 5:00 a.m., 5:00 p.m., and 10:00-11:00 p.m. in spring, and at 3:00-4:00 p.m. in summer. As seen from Figure 7m,n, hourly ground measurements also peak at these times.
The diurnal patterns of change in RB are more complex. Although differing much in their actual amount, the RBs of the five SPPs seem to follow a somewhat similar trend of change throughout the diurnal cycle, especially in summer. This is probably because rainfall estimates by the SPPs all exhibit a largely similar hourly trend in each season, in spite of the differences in their actual amount. Nevertheless, precisely because of the difference in their actual RB amount, the five SPPs give quite different estimation performance across the diurnal cycle. For example, 3B42RT tends to overestimate hourly rainfall most seriously at night (6:00 p.m. and 9:00-10:00 p.m.), while giving the estimates with the least bias in the morning. In contrast, the IMERG products tend to underestimate rainfall most seriously in the morning (8:00-11:00 a.m.), but give the estimates with the least bias at night (7:00-10:00 p.m.) (Figure 7).
In addition, the relative performance of the SPPs at the hourly scale is somewhat different from that at the monthly and daily scales. In general, there is much less variability in the performance of the SPPs at the hourly scale compared to that at the monthly and daily scales. Except for RB, only two IMERG products (IMERG_F and IMERG_L) have slightly outperformed the TMPA products for most of the time.
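The diurnal comparison discussed above, including the lag between the gauge and SPP rainfall peaks, can be illustrated with a short sketch. It assumes hourly records given as (hour of day, rainfall) pairs with hours numbered 0 to 23; the function names and sample values are hypothetical, and hours without any record are treated as 0.0 purely for simplicity.

```python
def mean_diurnal_cycle(records):
    """Average rainfall for each hour of day (0-23) from (hour, mm) records."""
    totals = [0.0] * 24
    counts = [0] * 24
    for hour, value in records:
        totals[hour] += value
        counts[hour] += 1
    # Assumption: hours with no records are reported as 0.0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

def peak_lag_hours(gauge_cycle, sat_cycle):
    """Return (gauge peak hour, satellite peak hour, lag in hours).

    The lag is wrapped into the range -12..11, so a positive value means
    the satellite peak occurs later than the gauge peak.
    """
    gauge_peak = max(range(24), key=gauge_cycle.__getitem__)
    sat_peak = max(range(24), key=sat_cycle.__getitem__)
    lag = (sat_peak - gauge_peak + 12) % 24 - 12
    return gauge_peak, sat_peak, lag

# Hypothetical summer records: the gauge peaks at 15:00, the satellite at 16:00.
gauge_records = [(14, 1.0), (15, 1.0), (15, 3.0), (16, 1.5)]
sat_records = [(15, 1.0), (16, 1.8), (17, 1.2)]
gauge_cycle = mean_diurnal_cycle(gauge_records)
sat_cycle = mean_diurnal_cycle(sat_records)
print(peak_lag_hours(gauge_cycle, sat_cycle))
```

The same per-hour grouping can feed the continuous metrics, giving one CC, RMSE, RB, and MAD value per hour of day for each season.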
To date, only limited studies have evaluated the quality of the hourly rainfall estimates of the SPPs. Similar to our study, they have mostly found that the performance of SPPs in estimating hourly rainfall was less satisfactory. For example, Caracciolo et al. [53] calculated the CCs to be respectively 0.32 and 0.26 when using the IMERG_F V4 for estimating hourly rainfall in Sardinia and Sicily of Italy. Li et al. [54] evaluated the performance of IMERG_F in estimating hourly rainfall in the Ganjiang River Basin of China, and calculated its CC, RMSE, and RB to be 0.33, 1.72 mm/h, and 0.12%, respectively. Yuan et al. [55] evaluated the 3-hour rainfall estimates by the three IMERG and two TMPA products in the Chindwin River basin, Myanmar, and they found that IMERG_F performed best with a CC of 0.33 and RB of −6.8%. Meanwhile, the RMSEs of the SPPs were similar, ranging from 2.9 to 3.1 mm/h.
(2) Statistical Performance Comparison among the SPPs Levene's test has confirmed that all four metrics meet the pre-condition of homogeneity of variance for conducting one-way ANOVA. The subsequent one-way ANOVA has concluded that the mean values of all four metrics are significantly different among the SPPs at the significance level (α) of 0.05. Further posterior comparison tests have shown that the CCs are significantly different between most of the pairs of SPPs except four pairs (IMERG_L/IMERG_F; IMERG_E/3B42; IMERG_E/3B42RT; and 3B42/3B42RT), while the RBs are significantly different except between two pairs (IMERG_E/IMERG_L and IMERG_F/3B42). Unlike CC and RB, the RMSEs of the SPPs are only significantly different between one pair (IMERG_L/3B42RT). Finally, the MADs are only significantly different between 3B42RT and all IMERG products, as well as between 3B42 and IMERG_L (Figure 8). It is worth noting that the posterior comparison tests have shown that IMERG_F is not significantly different from the TMPA products in terms of all four metrics except CC.
(3) Spatial Variation Figure 9 compares the spatial distribution of the four annual continuous evaluation metrics among the SPPs. At the hourly scale, topography also does not seem to be a significant influencing factor of the CCs, with lower CC values observed at stations of both low and high altitude. However, the spatial distribution of the other three metrics does indicate a significant impact of topography on the performance of the SPPs in estimating hourly rainfall. Both RMSEs and MADs exhibit similar spatial patterns across the five SPPs, with the values at the three stations of high altitude (stations 10, 12, and 13) consistently being the highest. As discussed above, the IMERG products tend to underestimate hourly rainfall. As shown in Figure 9, underestimation by the IMERG products is especially severe at higher altitude. Meanwhile, the 3B42 product also tends to underestimate hourly rainfall more at high altitude. Different from the other SPPs, the 3B42RT product tends to overestimate hourly rainfall more seriously at lower altitude.

Categorical Evaluation Metrics
(1) Temporal Variation Table 7 compares the mean values of the three categorical metrics (POD, FAR, and CSI) over the seven months from April to October and seasonally (spring, summer, fall) among the SPPs at the hourly scale. Similar to the daily scale, three categorical metrics are used to assess the hourly precipitation detection capabilities of the SPPs. An hourly rainfall threshold of 0.1 mm/h is used in calculating the metrics. As seen from the table, with lower PODs and CSIs, as well as higher FARs, all five SPPs are poorer at detecting hourly rainfall than daily rainfall. Seasonally, the five SPPs exhibit somewhat similar patterns of change in hourly rainfall detection performance. In summer, they all have the highest correct rainfall detection rates (PODs), but also the highest false alarm rates (FARs). Between the other two seasons, all five SPPs have higher PODs as well as lower FARs, and therefore better rainfall detection performance, in spring. In fact, the seasonal CSIs indicate that the overall rainfall detection performance of all SPPs is best in spring, followed by summer, and then fall.
Figure 10 further examines the changes in mean categorical metrics over a diurnal cycle in three seasons. Among the three categorical metrics, PODs, especially those of IMERG_L and IMERG_F, exhibit relatively little hourly variation through the diurnal cycle. The only discernible pattern in the metric is that the IMERG products tend to have the highest correct rainfall detection rates (> 0.7) in early evening, while the TMPA products have the lowest at midnight in spring. Unlike POD, FAR exhibits more diurnal variation. In spring, the FARs of all five SPPs tend to peak around noon. In summer and fall, by contrast, they all tend to bottom out in the morning and climb to the peak at around midnight. Similar to FAR, CSI exhibits distinct diurnal variations. In spring, all SPPs have the lowest CSI, i.e., the poorest hourly rainfall detection performance, at noon and midnight. In summer and fall, the performance of the SPPs tends to peak in the morning and bottom out around midnight.
(2) Spatial Variation Figure A7 compares the spatial distribution of the three annual categorical evaluation metrics among the SPPs. At the hourly scale, the spatial distribution of the categorical metrics exhibits a similar pattern across the five SPPs. All five SPPs are poorer at detecting the actual precipitation occurrences at high altitude. However, they also tend to yield more false alarms at low altitude. The conflicting impacts of elevation on the PODs and FARs of the SPPs have led to lower CSI values, i.e., worse overall hourly precipitation detection performance, at lower altitude, especially at stations 1-3.
(3) Variation with Rainfall Thresholds Figure 11 compares the performance of hourly precipitation detection among the five SPPs by different rainfall magnitude. Each of the three categorical metrics has been sequentially calculated when hourly rainfall exceeds 0.1, 1, 5, 10, and 15 mm. As seen from Figure 11a, the PODs of the five SPPs have all decreased steadily with increasing hourly rainfall thresholds before plunging to nearly 0 at the threshold of 15 mm/h. In general, the five SPPs do not differ much in their PODs across the entire range of hourly rainfall thresholds. The PODs of the two near-real-time IMERG products are consistently lower than those of the TMPA products. The PODs of 3B42RT have actually remained at or near the top across the rainfall thresholds.
As seen from Figure 11b, the FARs of all five SPPs tend to rise with increasing rainfall thresholds. The TMPA products have consistently yielded higher FARs than the IMERG products over the entire range of rainfall thresholds. The more comprehensive CSI values indicate very similar hourly rainfall detection performance among the SPPs across the rainfall thresholds, with IMERG_F staying at or near the top most of the time (Figure 11c).

Conclusions
SPPs have increasingly become an important data source for precipitation inputs in hydrological modeling and related studies worldwide. For regions with scarce precipitation observations or limited access to precipitation data, the latest GPM and TRMM products provide a valuable alternative source of the rainfall inputs needed for regional hydrological applications. However, the accuracy of their rainfall estimates should be systematically assessed before they are used in real-world applications. In this study, we have assessed and compared the accuracy of the five latest GPM IMERG V6 and TRMM 3B42 V7 precipitation products at the monthly, daily, and hourly scales in a medium-sized hilly river basin in East-Central China. For evaluation, four continuous and three categorical metrics were calculated from SPP estimates and historical rainfall records at 13 stations over the 9 years from 2009 to 2017. The evaluation results lead to the following main conclusions:
(1) Rainfall estimates by all five SPPs match ground observations best at the monthly scale, followed by the daily and then the hourly scale. The annual CCs of the SPPs, for example, fall from 0.86 or above at the monthly scale to mostly around 0.75 at the daily scale, and sharply to less than 0.6 (April to October) at the hourly scale. Topography tends to have a similar impact on the performance of the SPPs across time scales, with larger estimation deviations at high altitude.
(2) For estimating monthly rainfall, IMERG_F performs the best, closely followed by 3B42. These two post-time SPPs produce considerably better monthly rainfall estimates than the remaining real-time or near-real-time SPPs. All three IMERG products tend to underestimate monthly rainfall, except for a slight overestimation by the two near-real-time products in winter. Meanwhile, 3B42RT exhibits a strong tendency to overestimate in summer and winter.
(3) For estimating daily rainfall, the IMERG products generally perform better than the TMPA products, with IMERG_F performing the best. As at the monthly scale, the IMERG products tend to underestimate daily rainfall in all four seasons, except for the two near-real-time products in winter. In contrast, 3B42RT exhibits a strong tendency to overestimate in summer and winter. In terms of rainfall detection, the TMPA products are more capable of correctly detecting daily rainfall occurrences, while the IMERG products produce fewer false detections.
(4) For estimating hourly rainfall, the performance of the SPPs is much more homogeneous. Two IMERG products (IMERG_F and IMERG_L) slightly outperform the TMPA products most of the time. All IMERG products tend to underestimate hourly rainfall throughout the three seasons between April and October. In contrast, 3B42RT shows a strong tendency to overestimate in summer. In addition, hourly rainfall detection performance is quite similar among the five SPPs.
In general, our nine-year systematic evaluation of the latest GPM IMERG V6 and TRMM 3B42 V7 precipitation products has shown that the SPPs, especially the post-time IMERG_F product, can be considered a reliable source of monthly or daily rainfall data for regional hydrological applications. However, great caution is needed when using the hourly SPP rainfall estimates, considering their overall weak correlations with ground rainfall observations and the consistent tendency of the IMERG products to underestimate.
Hourly rainfall datasets have increasingly been found to be valuable inputs to a variety of hydrological applications. However, limited access to hourly rainfall datasets has restricted such applications in many regions. Owing to their wide spatial coverage and open access, SPPs have great potential to serve as a useful alternative source of hourly rainfall data. Effective bias-correction algorithms incorporating ground rainfall observations are therefore needed to improve the quality of hourly SPP rainfall estimates and to safeguard the validity of their use as surrogates for ground measurements.
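The scale dependence summarized in conclusion (1) can be explored with a short sketch that aggregates paired hourly series to coarser totals and recomputes two continuous metrics. This is an illustration under stated assumptions, not the authors' procedure: it assumes the Pearson correlation coefficient for CC and a percentage relative bias defined as 100 × (Σ estimate − Σ observation) / Σ observation. The helper names and the sample series are hypothetical.

```python
import math

def pearson_cc(obs, est):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in est))
    return cov / (sd_o * sd_e)

def relative_bias(obs, est):
    """Relative bias in percent (assumed definition, see lead-in)."""
    return 100.0 * (sum(est) - sum(obs)) / sum(obs)

def aggregate(series, block):
    """Sum consecutive blocks of `block` values (24 gives daily totals from hourly data)."""
    return [sum(series[i:i + block]) for i in range(0, len(series), block)]

# Illustrative hourly values in mm (made up, not taken from the study).
obs_h = [0.0, 0.4, 2.1, 0.0, 0.0, 0.0, 1.2, 3.5, 0.6, 0.0, 0.0, 0.2]
est_h = [0.2, 0.0, 1.0, 0.9, 0.0, 0.3, 0.0, 2.8, 1.1, 0.0, 0.1, 0.0]

# A block of 4 is used here only to keep the toy series short.
obs_agg = aggregate(obs_h, 4)
est_agg = aggregate(est_h, 4)

print("CC hourly:", round(pearson_cc(obs_h, est_h), 3))
print("CC aggregated:", round(pearson_cc(obs_agg, est_agg), 3))
print("Relative bias (%):", round(relative_bias(obs_h, est_h), 1))
```

Because aggregation preserves totals, the relative bias defined this way is identical at every time scale, whereas CC generally changes as timing errors within each block cancel out.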