A Comprehensive Evaluation of Latest GPM IMERG V06 Early, Late and Final Precipitation Products across China

Abstract: This study evaluated the performance of the early, late and final runs of IMERG version 06 precipitation products at various spatial and temporal scales in China from 2008 to 2017, against observations from 696 rain gauges. The results suggest that the three IMERG products reproduce the spatial patterns of precipitation well, but exhibit a gradual decrease in accuracy from the southeast to the northwest of China. Overall, the three runs perform better in the eastern humid basins than in the western arid basins. Compared to the early and late runs, the final run shows an improvement in precipitation estimation in terms of correlation coefficient, Kling–Gupta Efficiency and root mean square error at both daily and monthly scales. The three runs show similar daily precipitation detection capability over China. The biases of the three runs show a significantly positive (p < 0.01) correlation with elevation, with higher accuracy observed with an increase in elevation. However, the categorical metrics exhibit low levels of dependency on elevation, except for the probability of detection. Over China and major river basins, the three products underestimate the frequency of no/tiny rain events (P < 0.1 mm/day) but overestimate the frequency of light rain events (0.1 ≤ P < 10 mm/day). The three products converge with ground-based observations with regard to the frequency of rainstorms (P ≥ 50 mm/day) in the southern part of China. The revealed uncertainties associated with the IMERG products suggest that sustained efforts are needed to improve their retrieval algorithms in the future.


Introduction
Precipitation plays a critical role in the water cycle and energy balance [1][2][3][4]. Understanding the spatial and temporal variability of precipitation is essential for many applications, including hydrological modeling, climatic prediction and water resource management as well as environmental and ecological risk analysis [2][5][6][7]. In general, precipitation estimates can be obtained from three sources: ground-based observations, model simulations and remote sensing observations [8,9]. Ground-based observation is the most accurate method of retrieving precipitation records. However, it is largely limited by the sparse ground networks of rain gauges and the discontinuity of the recording sequences [10,11]. Ground-based radar is an alternative approach for measuring precipitation, but it is affected by surface backscattering, signal attenuation and uncertainty in the reflectivity-rain-rate (Z-R) relationship [12]. Process models, for instance, those of the European Centre for Medium-Range Weather Forecasts (ECMWF) and the Modern-Era Retrospective Analysis for Research and Applications (MERRA) [13,14], can simulate the spatial patterns of precipitation well but often show substantial uncertainties [15]. Satellite-based observations provide a unique opportunity to estimate (near) real-time precipitation globally with promising accuracy, especially for remote regions such as mountains, deserts and oceans, where ground-based observations are too sparse [16][17][18]. Consequently, estimating precipitation from remote sensing observations has become a major approach to measuring precipitation around the world [19].
In recent decades, satellite information technologies have developed greatly, especially precipitation retrieval algorithms [20]. The main precipitation retrieval approaches are based on Visible (VIS), Infrared (IR), Passive Microwave (PMW), Active Microwave (AMW) and Multi-Sensor (MS) data [21]. Generally, VIS and IR retrievals have high spatiotemporal resolution, but they lack a physical basis and have low accuracy [22,23]. PMW retrievals are more accurate than VIS and IR at the global scale, but their drawback is low spatiotemporal resolution [24]. Therefore, Multi-sensor Precipitation Estimation (MPE), which combines the complementary strengths of these sensors, has become the main way to obtain precipitation products with high accuracy and resolution [24,25]. For example, the Climate Prediction Center Morphing Method (CMORPH) uses geostationary IR data to obtain cloud motion and interpolates the precipitation rate derived from PMW data [25]. Similarly, the TMPA (Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis) algorithm generates rainfall estimates by combining PMW data and IR brightness temperature [26].
Worldwide, the many available satellite-based precipitation products (SPPs) differ in terms of their development purposes, input data sources, retrieval algorithms, spatiotemporal resolutions, coverages and temporal spans. Among them, the Tropical Rainfall Measuring Mission (TRMM) has provided a valuable precipitation dataset over the tropics and subtropics since 27 November 1997 [26,27]. Subsequently, the Global Precipitation Measurement (GPM) mission, the successor to TRMM, was launched on 28 February 2014 and aims to produce accurate and reliable global precipitation estimates with all available sensors in the TRMM and GPM eras [28,29]. Compared to TRMM products, the GPM precipitation products have full global coverage with a half-hourly temporal resolution and 0.1° × 0.1° spatial resolution, whereas the TRMM products only cover the latitude range of 50° N-50° S at much coarser spatial (0.25° × 0.25°) and temporal (three-hourly) resolutions [26]. In terms of precipitation retrieval algorithms, previous SPPs still have some limitations. For example, the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) estimates precipitation values based on infrared brightness temperature images (as input) and an artificial neural network (as a model) [30], whereas CMORPH only uses infrared data for transporting the microwave-based rain characteristics during the periods when microwave data are not available at a location [25,31]. In contrast, the Integrated Multi-satellite Retrievals for GPM (IMERG) combine the advantages of the CMORPH, PERSIANN and TMPA products [32,33]. Given these improvements, the IMERG products tend to perform better than other SPPs in many regions across the world, including China [34], Malaysia [35] and East Asia [36].
Although previous studies have demonstrated that the IMERG products exhibit better accuracy and precipitation detection performance than other satellite datasets such as the TRMM 3B42, 3B42RT and PERSIANN-CDR products in many regions [19][37][38][39], the temporal coverage of these studies is limited. Recently, the latest version of the IMERG products (IMERG V06) has been released, covering the period beginning from June 2000. The products include significant improvements in the algorithm used to estimate precipitation and provide estimates of precipitation phase [29] using the look-up table method developed by Sim and Liu [40]. Recent studies have highlighted the high performance of the IMERG V06 products in various regions, including Iran, Austria and Germany [41][42][43]. To date, however, a comprehensive accuracy evaluation of the long-term retrospective IMERG precipitation estimates for China is lacking, which has largely limited their application in various fields.
In this study, we aimed to evaluate the long-term (10 years) retrospective IMERG precipitation data across mainland China, including the near-real-time (NRT) "Early" (IMERG_E) and "Late" (IMERG_L) products and the post-real-time (PRT) "Final" (IMERG_F) product. To this end, observed precipitation records from 696 rain gauges and the three runs of IMERG V06 products were obtained for the period between 1 January 2008 and 31 December 2017. The objectives of this study were three-fold: (a) investigate the variations in IMERG V06 performance at multiple spatial and temporal scales; (b) evaluate how the performance of IMERG V06 depends on topographic variations; and (c) assess the accuracy and detection performance of IMERG V06 in capturing different precipitation types. The long-term evaluation results could further provide references for the improvement of the IMERG product algorithms. More importantly, the outcome of this work could validate the utility of the latest IMERG V06 as a source of precipitation data for forecasting and early warning of potential natural hazards such as extreme precipitation and drought in less prepared regions.

Study Area
Our study area is China, which is located within 73-135° E and 18-53° N [16]. Overall, elevation in China decreases from west to east (Figure 1a) [44]. Precipitation in China tends to decrease from the southeast coast to the northwest inland, with higher levels of precipitation usually occurring in summer [34]. Mainland China can be divided into nine major river basins (Figure 1b): Continental Basin (CB), Southwest Basin (SWB), Songliao River Basin (SRB), Southeast Basin (SEB), Pearl River Basin (PRB), Yellow River Basin (YERB), Yangtze River Basin (YARB), Haihe River Basin (HARB) and Huaihe River Basin (HURB) [44]. SRB and HARB have a colder climate, while HURB, PRB, SEB and the lower altitude areas of YARB have a temperate climate. YERB and the low altitude areas of CB have an arid climate. SWB and the high-altitude regions of YARB and CB have a polar climate. SEB, PRB and the downstream part of YARB have annual mean rainfall over 1000 mm (Figure 1b). SWB exhibits complex terrain with annual mean rainfall from 150 to 1000 mm, with the amount of precipitation decreasing from east to west (Figure 1b). The annual mean rainfall of HURB is about 600-1300 mm (Figure 1b). For the northern basins, namely SRB, HARB, YERB and CB, annual mean rainfall is below 800 mm (Figure 1b).

Rain Gauge Data
Daily precipitation data were obtained for the period of 2008-2017 from meteorological gauge stations maintained by the Chinese Meteorological Administration (http://data.cma.cn/, accessed on 5 August 2020). All data records have undergone a series of quality control procedures, developed by Shen and Xiong [45], to ensure the high quality of the ground rain gauge data. Rain gauges with missing values were discarded, resulting in a final selection of 696 rain gauges over China (Figure 1b). This ground observation dataset was used as a benchmark for evaluating the three runs of IMERG V06 products (Early, Late and Final).

Satellite-Based Precipitation Dataset
GPM is a collaborative mission between the National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA). It was launched in 2014 and aims to provide a global precipitation dataset at high spatiotemporal resolution. GPM was designed to extend the TRMM mission and produce the next generation of Earth's precipitation estimates; it consists of approximately 10 constellation satellites and a core observatory [27,28]. The two main sensors of the GPM core observatory are the Dual Frequency Precipitation Radar (DPR) and the GPM Microwave Imager (GMI). GMI is used to estimate precipitation type and intensity, while DPR is utilized to explore the internal structure of storms under or within clouds [27,28].
The GPM Level 3 precipitation products are generated by NASA using the IMERG algorithm. Three types of IMERG products are available in each version, namely the early (E), late (L) and final (F) runs, with latencies of 4 h, 12-24 h and 3.5 months, respectively [28,29]. In general, the early and late runs of IMERG are utilized for real-time applications such as flood monitoring and irrigation regulation, whereas the final run product is mainly for scientific research. Currently, all runs of IMERG products are available with half-hourly temporal resolution and global coverage at 0.1° spatial resolution. NASA also provides daily and monthly data products at 0.1° spatial resolution. Three major changes made in the latest version (V06) of the IMERG products are as follows: (1) data from the Goddard Earth Observing System model (GEOS) Forward Processing (FP) and the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) are used for time interpolation instead of the infrared data used in IMERG V05; (2) the Sounder for Atmospheric Profiling of Humidity (SAPHIR) estimates and TMI estimates are used for V06; and (3) passive microwave estimates are morphed at high latitudes to reduce spatial gaps [29]. In this study, we used the early (IMERG_E), late (IMERG_L) and final (IMERG_F) runs of the IMERG V06 products for the period of 1 January 2008-31 December 2017. The IMERG products are available at https://disc.gsfc.nasa.gov/datasets?keywords=IMERG&page=1 (accessed on 3 August 2020).

Data Processing
To assess the performance of the latest IMERG V06 products, the gridded IMERG data were compared with the rain gauge data using a point-to-point analysis, which avoids the uncertainty associated with interpolating gauge data [2][46][47][48][49]. In this framework, the SPP value of the grid cell at each gauge station location is extracted and paired with the gauge record.
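As a minimal sketch of the point-to-point matching described above, the snippet below finds the grid cell that contains each gauge and reads the satellite value there. It assumes a regular global grid whose cell edges start at 90° S and 180° W and which is stored as `grid[row][col]`; that layout and the names `cell_index` and `extract_at_gauges` are illustrative assumptions, not details taken from the study.

```python
import math

def cell_index(lat, lon, res=0.1, lat_min=-90.0, lon_min=-180.0):
    """Return (row, col) of the grid cell that contains the point (lat, lon).

    Assumes a regular global grid whose first cell edge sits at
    (lat_min, lon_min); this layout is an assumption of the sketch.
    """
    n_rows = int(round(180.0 / res))
    n_cols = int(round(360.0 / res))
    row = int(math.floor((lat - lat_min) / res))
    col = int(math.floor((lon - lon_min) / res))
    # Clamp so that points on the upper edges (lat = 90, lon = 180) stay in range.
    row = min(max(row, 0), n_rows - 1)
    col = min(max(col, 0), n_cols - 1)
    return row, col

def extract_at_gauges(grid, gauges, res=0.1):
    """Read the grid value at each gauge location.

    grid   : 2-D sequence indexed as grid[row][col]
    gauges : iterable of (station_id, lat, lon) tuples
    """
    values = {}
    for station_id, lat, lon in gauges:
        row, col = cell_index(lat, lon, res=res)
        values[station_id] = grid[row][col]
    return values
```

Because each gauge is compared only with the cell that contains it, no gauge interpolation is involved; the trade-off is that a point measurement is compared with an areal average.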

Evaluation Metrics
Continuous metrics were used to evaluate the accuracy of the IMERG products. The Pearson correlation coefficient (CC) was used to measure the correlation between the rain gauge and satellite datasets, while the root mean square error (RMSE) was computed to evaluate the error characteristics of the satellite datasets. We also used the Kling-Gupta efficiency (KGE) statistic [50,51] to comprehensively explore the accuracy of the IMERG products, considering the distance between the mean and variance of the gauge-based and satellite-based time series of precipitation as well as their correlation. KGE balances the contributions of the correlation, bias and variability terms. The corresponding equations are as follows:

$$\mathrm{CC} = \frac{\sum_{i=1}^{n}\left(S_i-\bar{S}\right)\left(G_i-\bar{G}\right)}{\sqrt{\sum_{i=1}^{n}\left(S_i-\bar{S}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(G_i-\bar{G}\right)^{2}}} \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i-G_i\right)^{2}} \tag{2}$$

$$\mathrm{KGE} = 1-\sqrt{\left(\mathrm{CC}-1\right)^{2}+\left(\beta-1\right)^{2}+\left(\gamma-1\right)^{2}} \tag{3}$$

where $n$ represents the number of samples, and $S_i$ and $G_i$ represent the satellite precipitation estimate and the gauge-observed precipitation for sample $i$, respectively. $\bar{S}$ and $\bar{G}$ are the means of the satellite- and gauge-based precipitation. The variation of precipitation is given by $\sigma_G$ and $\sigma_S$, which represent the standard deviations of gauge precipitation and satellite precipitation, respectively. CC (Equation (1)) ranges between −1 and 1. The RMSE is the square root of the mean squared difference between $S_i$ and $G_i$ (Equation (2)); it is non-negative by construction, with smaller values indicating better performance. KGE (Equation (3)) includes three terms: CC, the bias ratio and the variability ratio. The bias ratio ($\beta = \mu_S/\mu_G$) is the ratio between the mean satellite precipitation ($\mu_S$) and the mean gauge precipitation ($\mu_G$). The variability ratio ($\gamma = CV_S/CV_G$) is the ratio between the coefficients of variation of the satellite precipitation ($CV_S$) and the gauge data ($CV_G$), with $CV_S = \sigma_S/\bar{S}$ and, similarly, $CV_G = \sigma_G/\bar{G}$. KGE values range in the interval (−∞, 1], and larger values indicate better performance. Categorical metrics were utilized to assess the precipitation detection capability of the IMERG products.
The critical success index (CSI) describes the ability of the IMERG products to detect precipitation events, with values between 0 and 1 (the perfect value). It can be expressed as a function of the probability of detection (POD), ranging from 0 to 1 (the perfect score), and the false alarm ratio (FAR), ranging from 0 (the perfect value) to 1. The three metrics are calculated as:

$$\mathrm{POD} = \frac{H}{H+M}$$

$$\mathrm{FAR} = \frac{F}{H+F}$$

$$\mathrm{CSI} = \frac{H}{H+M+F}$$

where H is the number of precipitation events detected by both the gauge and the satellite, M is the number of events detected by the gauge but not by the satellite and F is the number of events detected by the satellite but not by the gauge. The calculation of these metrics requires a threshold to separate rain from no-rain events. The rain gauge and SPP data have a daily time resolution in this study. We selected 0.1 mm/day as the threshold for defining precipitation occurrence, according to the definition adopted by the Chinese Meteorological Administration.
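The metrics defined in this section can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions rather than the code used in the study: the function names are hypothetical, a day counts as rainy when precipitation is at least the 0.1 mm/day threshold, and the continuous metrics assume non-constant series with non-zero means.

```python
import math

def continuous_metrics(sat, gauge):
    """Return CC, RMSE, bias ratio (beta), variability ratio (gamma) and KGE."""
    n = len(gauge)
    mean_s = sum(sat) / n
    mean_g = sum(gauge) / n
    dev_s = [s - mean_s for s in sat]
    dev_g = [g - mean_g for g in gauge]
    # Roots of the summed squared deviations. Dividing each by sqrt(n) would
    # give the standard deviations; that factor cancels in CC and in gamma.
    root_s = math.sqrt(sum(d * d for d in dev_s))
    root_g = math.sqrt(sum(d * d for d in dev_g))
    cc = sum(ds * dg for ds, dg in zip(dev_s, dev_g)) / (root_s * root_g)
    rmse = math.sqrt(sum((s - g) ** 2 for s, g in zip(sat, gauge)) / n)
    beta = mean_s / mean_g
    gamma = (root_s / mean_s) / (root_g / mean_g)
    kge = 1.0 - math.sqrt((cc - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return {"CC": cc, "RMSE": rmse, "beta": beta, "gamma": gamma, "KGE": kge}

def categorical_metrics(sat, gauge, threshold=0.1):
    """Return POD, FAR and CSI for a rain/no-rain threshold in mm/day."""
    hits = misses = false_alarms = 0
    for s, g in zip(sat, gauge):
        sat_rain = s >= threshold
        gauge_rain = g >= threshold
        if sat_rain and gauge_rain:
            hits += 1
        elif gauge_rain:
            misses += 1
        elif sat_rain:
            false_alarms += 1
    nan = float("nan")
    # Each ratio is undefined (NaN) when its denominator is zero.
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return {"POD": pod, "FAR": far, "CSI": csi}
```

A satellite series that is exactly twice the gauge series illustrates how KGE separates its three terms: CC and γ both equal 1, while β = 2 lowers KGE to 0. Whenever POD and FAR are both defined and POD is non-zero, the returned CSI also satisfies CSI = 1/[1/(1 − FAR) + 1/POD − 1].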

Spatial Patterns of the Continuous and Categorical Evaluation
The spatial distributions of the continuous and categorical evaluation metrics for the three runs of IMERG products over China are shown in Figures 2 and 3, respectively. For the three runs, the CC and RMSE metrics show distinct spatial patterns across China, which can be seen in Figure 2a-c,m-o, respectively. The CC between the observed gauge data and satellite data ranges from 0.4 to 0.7 in southeast coastal areas, with RMSE values over 13 mm/day. According to the RMSE, IMERG_F performs better than IMERG_E and IMERG_L in southern China (PRB). Moreover, CC and RMSE decrease gradually towards the northwest inland of China. The southeast regions of China belong to the sub-tropical monsoon climate zone, with a large amount of precipitation over the year, and thus show both a high correlation and a high error. The spatial pattern of β, which ranges from 0.6 to 0.8, suggests that both IMERG_E and IMERG_L underestimate gauge rainfall in the southwest of China (SWB) (Figure 2d,e). In contrast, IMERG_F appears to perform better in this region. The differences in the variability ratio (γ) and KGE between the IMERG products across China are not significant. The three runs of IMERG products perform better in the southern and southeastern parts of China (PRB and SEB), with KGE values over 0.5. In addition, IMERG_F shows the highest performance in the HARB region (Figure 2l).
Figure 3 shows the spatial distribution of the categorical evaluation metrics (POD, FAR and CSI) across China. For the three SPPs, precipitation detection is better in PRB, SEB and SWB, with POD values over 0.7 (Figure 3a-c). In CB, however, all runs have POD values below 0.5, indicating that less than half of the rain events are detected by the satellite-based observations. Regarding FAR, the late and final runs perform better than the early run, especially in HARB and SRB (Figure 3d-f). The CSI indicates better performance in the southeast than in the northwest (Figure 3g-i).
The evaluation results of the continuous and categorical metrics for each basin are listed in Table 1. The CC and KGE values are higher in YARB, PRB, SEB and SWB for all runs (Table 1). Apart from CB, the RMSE values of the three SPPs are lowest in YERB. The highest values of POD (0.72-0.75) and the lowest values of FAR (0.35) are found in SWB and PRB, respectively. The three runs show better precipitation detection performance in PRB, SEB and SWB, with CSI values ranging from 0.48 to 0.51 (Table 1).

Daily Scale
The performances of IMERG_E, IMERG_L and IMERG_F were evaluated on a daily basis, and the evaluation metrics are summarized in Table 2. Overall, IMERG_F performs better than IMERG_E and IMERG_L, with higher mean CC (0.47) and KGE (0.45) values and a lower mean RMSE (9.26 mm/day) (Table 2). Regarding precipitation detection, the three runs of IMERG products perform similarly across China, with POD values ranging from 0.67 to 0.68. The FAR values of the three SPPs are, however, as high as 0.5 across China, indicating poor detection performance of the IMERG products. All SPPs overestimate the mean precipitation across China, with β values ranging from 1.07 to 1.08. As shown in Figure 4, the median KGE values of IMERG_E, IMERG_L and IMERG_F are 0.31, 0.30 and 0.37, respectively. The improvement is minor for CSI in comparison to KGE, indicating that gauge adjustment is more effective for improving intensity estimation than occurrence detection.

Monthly Scale
At the monthly level, IMERG_F exhibits a better correlation (0.94) with ground observations than IMERG_E and IMERG_L (Table 3). IMERG_E and IMERG_L show similar accuracy at the monthly level. Compared to the other two runs, IMERG_F has the lowest RMSE value (34.31 mm/month). Overall, the final run shows higher accuracy in China, with a KGE value of up to 0.87 (Table 3).
Figure 5 shows the metric characteristics of the IMERG products in different months, sorted from "good performance" (green) to "bad performance" (red). As indicated by Figure 5, CC is high from October to December but low from July to September for all runs. This is consistent with the timing of high and low frequency of precipitation in the study area. RMSE is high from May to September, which is related to the intense precipitation that affects various regions in China during these months. IMERG_F consistently shows better performance than the other products in all months, with the highest KGE in autumn. For the categorical metrics, POD and CSI perform best in July. The beginning of the year shows the lowest POD, while FAR reaches its best performance level at the end of the year. As for CSI, the IMERG products show the best performance between June and August. The results suggest that the overall performances of IMERG_F, IMERG_E and IMERG_L are not statistically significantly different at the monthly level.

Seasonal Scale
As a complementary analysis, we assessed the seasonal performances of the three IMERG products at the daily level in each basin and across Mainland China (Figure 6). There are large discrepancies in the annual rainfall regime among the basins of China (Figure S1). Based on the rain gauge observations during 2008-2017, the SEB has the greatest annual mean rainfall of 1742.  Figure S2). With respect to accuracy, the precipitation from IMERG_F is moderately correlated with gauge observations in all seasons and basins, with CC values ranging from 0.14 to 0.58 (Figure 6a-d and Figure S2). The three runs show poor correlation (CC values of 0.08-0.33) with rain gauge observations for all seasons in arid regions (CB). In general, IMERG_F has lower RMSE values than IMERG_E and IMERG_L in each basin and season (Figure 6e-h). For KGE, IMERG_F performs better than IMERG_E and IMERG_L in all seasons and over each basin (Figure 6i-l). Across the nine basins, the three runs perform better overall in humid regions (including SEB, PRB and YARB), with higher KGE values (Figure 6a-l). With regard to detection capability, IMERG_F does not improve significantly on IMERG_E and IMERG_L in any season (Figure 6m-p). In particular, the higher values of CSI occur in summer for all runs of IMERG, ranging from 0.32 to 0.64 (Figure S3).

Evaluations of the Performance Dependency on Elevation
To explore the influence of elevation variations on the performance of the IMERG products, we divided all rain gauges into different categories based on the elevation of each gauge. As shown in Table 4, the amount of precipitation is higher in low altitude regions (<500 m), with averages over 943.5 mm/year. In addition, the three SPPs exhibit the highest consistency in regions below 200 m, with CC values ranging from 0.43 to 0.47. The three runs of IMERG do not show much discrepancy in β values among these elevation categories, except for regions below 200 m and those over 2000 m. All SPPs have a positive bias in regions below 1500 m, with β values from 1.01 to 1.14 (Table 4). The γ values of the three SPPs generally increase with elevation, indicating a substantial underestimation of precipitation variability in regions of low altitude. In all elevation categories, IMERG_F shows better performance (lower RMSE and higher KGE values ranging from 0.34 to 0.45) than IMERG_E and IMERG_L (Table 4). The highest POD values are observed in regions with elevations below 200 m, while the lowest FAR and highest CSI values are estimated in regions above 2000 m.
The scatterplots of the CC, KGE, RMSE and CSI metrics against elevation are presented in Figure 7. Based on linear regression, the three SPPs show a significant (p < 0.01) decrease in CC with elevation (Figure 7a-c). Similarly, the KGE and RMSE values of the three runs tend to decrease significantly (p < 0.01) with elevation (Figures 7d-f and 7g-i, respectively). There is no significant (p > 0.05) dependency of CSI on elevation (Figure 7j-l), although POD values decrease significantly (p < 0.05) with altitude (Figure S3g-i). Notably, the slopes of the trend lines for the detection metrics (POD, FAR and CSI) are larger for IMERG_F, indicating an improvement in the detection capability of the IMERG_F product over high-altitude regions (Figure 7j-l and Figure S3g-l).
Figure 8 shows the probability distribution function (PDF) of daily precipitation for five precipitation intensity bins across China (Figure 8j) and in the nine basins (Figure 8a-i). It is evident that the frequency of days with precipitation intensity below 0.1 mm/day is lower for the three SPPs than for the observations in China and in each basin, whereas the opposite is found for light rain (0.1 ≤ P < 10 mm/day). The IMERG estimates are closer to ground observations for the frequency of moderate rain (10 ≤ P < 25 mm/day) in all regions (China and each basin), especially for PRB. IMERG_F tends to overestimate the frequency of heavy rain (25 ≤ P < 50 mm/day) in the southern regions of China (PRB, SEB, YARB and SWB). The IMERG products have the highest levels of consistency with ground observations for rainstorms (P ≥ 50 mm/day) in the PRB, YARB and SEB regions.
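The frequency comparison across intensity bins can be sketched as follows. The class names and bounds follow the five bins discussed above, with heavy rain taken as 25 ≤ P < 50 mm/day; the function and constant names are hypothetical, and the input is assumed to be a list of non-negative daily totals in mm/day.

```python
import math

# (class name, inclusive lower bound, exclusive upper bound) in mm/day
INTENSITY_BINS = [
    ("no/tiny rain", 0.0, 0.1),
    ("light rain", 0.1, 10.0),
    ("moderate rain", 10.0, 25.0),
    ("heavy rain", 25.0, 50.0),
    ("rainstorm", 50.0, math.inf),
]

def intensity_frequencies(daily_precip):
    """Return the fraction of days that falls into each intensity bin."""
    counts = {name: 0 for name, _, _ in INTENSITY_BINS}
    for p in daily_precip:
        for name, lower, upper in INTENSITY_BINS:
            if lower <= p < upper:
                counts[name] += 1
                break
    total = len(daily_precip)
    return {name: count / total for name, count in counts.items()}
```

Applying the function separately to the gauge series and to each IMERG run, and then differencing the results bin by bin, gives the sign of the frequency bias per class, for example a negative difference for no/tiny rain and a positive one for light rain.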

Evaluation for Precipitation Intensity Bins
The evaluation metrics for different precipitation intensity classes are presented in Table 5 (China) and Figure 9 (nine basins). Table 5 does not provide FAR and CSI because, if all days in the observed dataset are rainy (P ≥ 0.1 mm/day), the value of FAR is equal to zero and CSI is equal to POD. In addition, POD is calculated based on different precipitation intensity thresholds instead of a fixed threshold of 0.1 mm/day. Generally, the performance in terms of CC, RMSE and KGE tends to decrease when the data are split into classes. This suggests that overestimation and underestimation may partially cancel out when the data are analyzed as a whole. Thus, larger uncertainties are expected when the assessment of SPP datasets is conducted at the event level. The values of CC and KGE tend to increase for precipitation amounts larger than 50 mm/day.
Figure 9 shows the accuracy of the three SPPs for different precipitation intensity ranges over the nine basins of China. Generally, the three SPPs exhibit a poor correlation with rain gauge data for heavy rain (25 ≤ P < 50 mm/day) in all basins except for CB and YERB (Figure 9b,e,i,m,q,u,y,ac). Additionally, the highest CC values between IMERG data and rain gauge data are found for rainstorms (P ≥ 50 mm/day) over HARB, HURB, PRB, SWB and YARB (Figure 9e,i,m,y,ac, respectively). As indicated by KGE, the three runs consistently perform better for light rain (0.1 ≤ P < 10 mm/day) and rainstorms across China, except for IMERG_E over SRB (Column 2 of Figure 9). Meanwhile, IMERG_F performs better than IMERG_E and IMERG_L in all basins except for SRB (Column 2 of Figure 9). The RMSE values of the three SPPs are positively associated with precipitation intensity (Column 3 of Figure 9), and the final run shows little improvement over the early and late runs for these precipitation intensity bins in each basin. In particular, IMERG_F performs better than IMERG_E and IMERG_L in SWB and YERB. For precipitation detection, the POD values of the three runs are negatively correlated with precipitation intensity, indicating that IMERG V06 is less skilful in detecting high-intensity precipitation (Column 4 of Figure 9).
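A short sketch can make the per-class evaluation concrete, including why FAR and CSI carry no extra information once days are selected by gauge intensity. It rests on two assumptions that the text does not spell out: days are assigned to a class by the gauge value, and a hit means that the satellite estimate reaches the lower bound of that class. The function name `class_metrics` is hypothetical.

```python
import math

def class_metrics(sat, gauge, lower, upper=math.inf):
    """RMSE, POD, FAR and CSI for days whose gauge value lies in [lower, upper)."""
    pairs = [(s, g) for s, g in zip(sat, gauge) if lower <= g < upper]
    if not pairs:
        return None  # no observed day falls in this class
    n = len(pairs)
    rmse = math.sqrt(sum((s - g) ** 2 for s, g in pairs) / n)
    # Assumption: the class lower bound is the detection threshold.
    hits = sum(1 for s, _ in pairs if s >= lower)
    misses = n - hits
    # Every selected day is an observed event, so no false alarm can occur.
    false_alarms = 0
    pod = hits / (hits + misses)
    far = 0.0 if hits else float("nan")
    csi = hits / (hits + misses + false_alarms)
    return {"n": n, "RMSE": rmse, "POD": pod, "FAR": far, "CSI": csi}
```

With zero false alarms by construction, the CSI denominator reduces to hits plus misses, which is why only POD is reported for the intensity classes.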

Reasons for the Difference in Performance of Three Runs
This work demonstrates a non-significant improvement in the late run compared with the early run (Table 1). The same results were found in the northeast and southeast of Austria [42,52]. In addition, compared to the early and late runs, the final run of the IMERG products shows a moderate improvement in the overall estimation of precipitation across China (Tables 2 and 3), probably due to the adjustment against the Global Precipitation Climatology Centre (GPCC) records [43,53]. Figure S4 shows the spatial distribution of the rain gauges used for GPCC at its 2.5° spatial resolution across China. It is evident that a limited number of gauges over China are used in the development of the GPCC products. The spatial heterogeneity of GPCC data quality may lead to the diverse performances of the final run of IMERG. For example, Tan and Santo [35] reported a non-significant improvement in the IMERG final run compared to the NRT products over Malaysia, which may be because only 24 rain gauges are utilized in the development of the GPCC dataset there, so it cannot characterize the spatial patterns of precipitation across Malaysia well.
In this study, we found an overestimation of precipitation in the IMERG products over China (see Tables 2 and 3) at both the daily and monthly scales. Xu et al. [46], Anjum et al. [32], Sunilkumar et al. [54] and Islam [55] reported similar results for different versions of the IMERG products over the southern Tibetan Plateau, northern Pakistan, Japan and Nepal, and Bangladesh, respectively. According to the results, overestimation is larger for the early and late runs than for the final run. There are several possible reasons for these discrepancies. First, the GPCC datasets that are utilized to adjust the final run of the IMERG products have systematic biases in China [56], which may affect the accuracy of the IMERG products. Second, monthly GPCC datasets are used to adjust the IMERG data, and thus the improvement in the daily precipitation data is smaller than that in the monthly data [42]. Third, additional uncertainty can be attributed to the adjustment of spatial resolutions. The IMERG products and GPCC Full Data Reanalysis offer data at 0.1° spatial resolution, while the GPCC Monitoring Product offers data at 0.5° spatial resolution [29].

Reasons for Various Performance of Three Runs in Different Elevation Regions and Precipitation Intensity Bins
Our evaluations reveal a strong dependence of the performance of IMERG products on elevation variations, particularly accentuated with the categorical metrics (POD, FAR and CSI). According to Figure 7, there is a significant relationship between the values of evaluation metrics and elevation (p < 0.01) for CC, KGE, γ, POD and RMSE. The number of rain gauges varies with elevation, which might bias the results. Topographic variations could exert complex controls on satellite-based estimation of precipitation, from IR, AMW and PMW sensors [18,48,49,57]. For example, Xu et al. [46] assessed the effects of elevation on accumulative rainfall over southern Tibetan Plateau and identified a significant relationship between elevation and the performance of GPM IMERG. Zambrano-Bigiarini et al. [18] evaluated seven SPPs in a case study in Chile and found that all, except for PGFv3, performed poorly in areas of high elevation. Beria et al. [58] and Fang et al. [59] indicated a negative relation between IMERG performances and the topographic variation over India and China, respectively. Here, some factors may influence the performance of SPPs in different elevation regions. First, precipitation generally increases with elevation, and therefore the performance of IMERG products may improve with higher precipitation intensity [41]. Second, the number of rain gauges is very limited in high mountains, and the performance of IMERG products may not increase significantly even after adjustment against gauge observations [34,60]. Third, the processes and mechanisms of precipitation formation are complex in high-altitude regions, which makes the estimation of precipitation from satellite sensors difficult [59].
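The elevation dependence discussed here rests on linear regression of each station-level metric against gauge elevation. A minimal sketch of such a test is given below; it computes the least-squares slope, the Pearson correlation and the t statistic of the slope. The function name is hypothetical, and converting t to an exact p-value would need a Student t distribution with n − 2 degrees of freedom, which the standard library does not provide.

```python
import math

def linear_trend(x, y):
    """Least-squares fit of y on x.

    Returns (slope, intercept, r, t), where t is the t statistic of the
    slope with len(x) - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        # A perfect fit has zero residual variance, so t is unbounded.
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return slope, intercept, r, t
```

For samples as large as several hundred gauges, the t distribution is close to normal, so |t| above about 2.58 corresponds to a two-sided p < 0.01. A negative slope for CC or KGE against elevation then indicates decreasing skill with height.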
As indicated by the PDF analysis, all runs tend to underestimate the frequency of no/tiny rain events (P < 0.1 mm/day) but overestimate the frequency of light rain events (0.1 ≤ P < 10 mm/day), which is consistent with research carried out in Malaysia [35], the Tibetan Plateau [46], northern Pakistan [30] and Bangladesh [55]. Moreover, our evaluation results show that the RMSE values tend to increase with precipitation intensity (Table 5 and Figure 9), which is in line with the findings of Habib et al. [61] and Yu et al. [49]. Theoretically, satellite infrared and microwave sensors are designed to retrieve areal information based on the brightness temperature of cloud tops and on precipitation particles, respectively [62]. In this study, precipitation intensity is classified based on rain gauge data (point data), which may not represent precipitation intensity in the areas surrounding the stations (areal precipitation information) well. This mismatch affects the reliability of the evaluation of satellite products for different precipitation intensity classes.

Reasons for the Changing Performance of IMERG in a Long-Time Span
For a data product with a long time span, it is important to examine whether its performance is stable over time. Figure S5 shows all evaluation metrics of the IMERG products from 2008 to 2017. None of the eight metrics shows a statistically significant change in the performance of the IMERG V06 products during 2008-2017 (F-test, p > 0.05). However, all runs of the IMERG products exhibit poor accuracy from 2009 to 2012 (Figure S5d,h), and their performance has gradually improved since 2013. The GPM era started in 2014, and the absence of a statistically significant change in performance between the two eras (TRMM era and GPM era) indicates that IMERG is relatively robust across the transition. The increasing number of passive microwave samples has likely contributed to the increasing accuracy of IMERG. In addition, improved microwave sensors with higher resolutions and more frequency channels are also likely to have contributed to the improvement. Furthermore, according to Huffman et al. [29], the IMERG team used two GPCC products, the V8 Full Data Reanalysis and the V6 Monitoring Product, to correct the systematic bias of the IMERG products; the former is only available for the period from 1998 to 2016, and the latter is used to adjust data after 2016. It should be noted that the GPCC Monitoring Product is based on about 7000-8000 stations, whereas the GPCC Full Data Reanalysis includes 67,200 stations across the world. As a result, the performance of the IMERG products is expected to decrease substantially after 2016 (Figure S5). However, the accuracy of IMERG after 2016 still needs to be explored further.

Study Limitations and Future Works
In this study, only daily precipitation data were available for rain gauges across China, which did not allow for a more extensive and detailed assessment of the satellite products at finer time scales. Therefore, a sub-daily evaluation can be carried out in future work. In addition, while this study evaluated the performance of IMERG V06 in nine basins, it did not consider the impact of differences in rain gauge density among the basins. Finally, IMERG performed better in the humid regions of China (SEB, PRB and the downstream YARB), which supports the utility of the IMERG V06 product as a source of precipitation data over humid regions; it can be used for near-real-time applications such as flood simulation and monitoring.

Conclusions
This study provides a comprehensive evaluation of daily precipitation from the different runs (early, late and final) of the latest version (V06) of IMERG against 696 key synoptic stations across China from 1 January 2008 to 31 December 2017. We analyzed the accuracy of the IMERG products at various spatial and temporal scales using a range of performance metrics. Furthermore, we identified the effects of elevation on the accuracy of precipitation estimates from the IMERG products. The main conclusions of this study are summarized as follows:
(1) All runs of the IMERG products can accurately capture the spatial patterns of daily precipitation from 2008 to 2017. However, the performance of the products varies among the river basins and gradually decreases from the southeast to the northwest of China. Performance is better in the eastern humid basins than in the western arid basins.
(2) Our analysis does not show significant differences between the early and late runs of the IMERG products in China. However, a moderate improvement is observed in the final run, as indicated by higher CC and KGE and lower RMSE at both daily and monthly scales. The three runs of IMERG show similar capability in detecting daily precipitation in China, with CSI values ranging from 0.4 to 0.41.
(3) Our evaluation reveals a significant (p < 0.01) association between the performance of the IMERG products and elevation, mainly highlighted by the analysis based on continuous performance metrics. For all runs, the accuracy gradually increases with an increase in elevation. However, the categorical metrics exhibit lower levels of dependence on elevation, except for POD.
(4) In China and in each basin, all SPPs underestimate the frequency of no/tiny rain events (P < 0.1 mm/day) but overestimate the frequency of light rain events (0.1 ≤ P < 10 mm/day). The IMERG products better match the ground observations in areas with frequent moderate rain events (10 ≤ P < 25 mm/day).
IMERG_F tends to overestimate the frequency of heavy precipitation (25 ≤ P < 50 mm/day) in southern China. All products agree with ground-based observations regarding the frequency of rainstorms (P ≥ 50 mm/day) in PRB, YARB and SEB.