An Improved Assessment Method and Its Application to the Latest IMERG Rainfall Product in Mainland China

Abstract: Quantification of the uncertainties associated with satellite precipitation products is a prerequisite for their application in earth science studies. An improved scheme is developed in this study to decompose the mean bias error (MBE) and mean square error (MSE) into three components, i.e., the MBE and MSE associated with hits, missed precipitation, and false alarms, each weighted by its relative frequency of occurrence (RFO). The trend of total MBE or MSE is then naturally decomposed into six components according to the chain rule for derivatives, which yields a quantitative estimate of the individual contributions to total MBE and MSE. The method is applied to the validation of the Integrated MultisatellitE Retrievals for GPM (IMERG) in Mainland China. The MBE associated with false alarms is an important driver of total MBE, while the MSE associated with hits accounts for more than 85% of total MSE, except in the inland semi-arid area. The RFO of false alarms increases, whereas the RFO of missed precipitation decreases. Both factors contribute to a growing trend in total MBE. Detection of precipitation should be improved in the IMERG algorithm. More specifically, the priority should be to reduce false alarms.


Introduction
Satellite observation is the most important means of retrieving precipitation at the global scale, and it provides a growing legacy of high-quality precipitation data for earth system sciences [1-3]. In particular, the launches of the Tropical Rainfall Measuring Mission (TRMM) satellite in 1997 and the Global Precipitation Measurement (GPM) core satellite in 2014 are two milestones in this respect. The Precipitation Radar (PR) onboard TRMM was the first spaceborne instrument to actively measure precipitation profiles. The dual-frequency precipitation radar (DPR) on the GPM core satellite improves the detection of light rainfall (0.2 mm h⁻¹) and snowfall. It also expands coverage to the latitude band 65°N-65°S [4]. These active instruments provide not only new insights into precipitation but also a vital source for calibrating retrievals from other precipitation sensors, e.g., passive microwave (PMW) sensors. PMW is sensitive to precipitation particles throughout the atmospheric column, whereas infrared (IR) is sensitive only to the cloud-top layer [5]. More reliable rainfall estimates are therefore expected from PMW than from IR. The TRMM microwave imager (TMI) and GPM microwave imager (GMI) provide a good reference for the cross-calibration of PMW imagers and sounders onboard other platforms, which extends the temporal and spatial coverage of PMW precipitation retrievals [5,6]. PMW observations have wider spatial coverage than PR and DPR. Specifically, a synergy of multiple PMW sensors onboard different polar-orbiting satellites can effectively improve the detection efficiency, although it remains lower than that of IR radiometers onboard geostationary satellites. Most global precipitation products are produced mainly from PMW measurements but utilize IR data for spatiotemporal completeness, such as TMPA (TRMM Multisatellite Precipitation Analysis) [7] and IMERG (Integrated MultisatellitE Retrievals for GPM) [5].
There are dozens of high-resolution satellite precipitation products, among which IMERG is the latest. The highest temporal and spatial resolutions of IMERG are half an hour and 10 km, respectively [6]. The IMERG algorithm incorporates multiyear experience from international partners in its PMW calibration technique [8], IR [9] and PMW [10] precipitation retrievals, PMW and IR morphing, and Kalman filter algorithm [11,12]. It has been widely used to improve the accuracy of climate modeling of extreme rain and snowfall and to strengthen applications related to current and future disasters, disease, resource management, energy production, and food security [6].
Precipitation is a complex phenomenon showing dramatic temporal and spatial variations, which makes it challenging to estimate accurately. Mathematically, the estimation is an ill-posed inversion problem. Satellite precipitation products are unavoidably associated with substantial systematic and random errors, which vary with region, climate, and precipitation type [10]. Therefore, it is necessary to evaluate the retrievals at multiple temporal and spatial scales. Such evaluation not only helps data users select appropriate products, but also shows how to account for the impact of data quality in actual applications. More importantly, it can indicate the direction in which the retrievals should be improved.
Many validation studies have been reported, which concern verification methods and demonstrate the performance of multiple satellite precipitation products at different temporal and spatial scales [13-18]. It is widely suggested that IMERG V06 outperforms other satellite rainfall products as a result of several major improvements [17]. The performance of the IMERG rainfall product depends on climate and topography. Underestimation of rainfall by IMERG was found in subtropical regions, for example, in the Indian subcontinent and the Ganges-Brahmaputra-Meghna basin [19,20]. Although IMERG outperforms the reanalysis rainfall products in many aspects, triple collocation analysis implies that IMERG snowfall needs further development [17], which may account for the poor performance of IMERG over north China [21]. In contrast, good performance is generally found in the warm season, although systematic and random biases should still be adjusted before driving hydrological models [22-24]. Satellite-based precipitation retrieval techniques perform poorly in complex terrain, because low-level, topographically forced updrafts may trigger heavy rainfall with great spatiotemporal variability [23], which is also not captured by the sparse coverage of gauge measurements.
Uncertainties mainly arise in three distinct situations: hit (H), where both gauged and satellite rainfall exceed a threshold; missed precipitation (M), where satellite rainfall is below the threshold while gauged rainfall exceeds it; and false alarm (F), where satellite rainfall exceeds the threshold but gauged rainfall does not. The uncertainty associated with correct negatives (C), where both satellite and gauged rainfall are below the threshold, is negligible. Knowledge of the dominant contribution to total uncertainty is essential for algorithm developers. Tian et al. (2009) proposed a novel decomposition scheme for the mean bias error (MBE), pointing out that evaluating total MBE alone may be misleading, because missed precipitation is, to a certain extent, offset by false alarms, which can result in a negligible overall error. Assessment of the long-term performance of satellite precipitation products is vital for climate studies, but it has been addressed by very few previous studies, for example, Tang et al. (2020) [17].
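The four situations above can be expressed as a small contingency classifier. The following Python sketch is illustrative only: the function name `classify_events`, the use of `>=` for "exceeds the threshold", and the default of 0.5 mm d⁻¹ (the threshold quoted later in the Decomposition Scheme section) are assumptions, not the authors' code.

```python
import numpy as np


def classify_events(sat, gauge, threshold=0.5):
    """Label each collocated day as H (hit), M (miss), F (false alarm) or C."""
    sat = np.asarray(sat, dtype=float)
    gauge = np.asarray(gauge, dtype=float)
    sat_wet = sat >= threshold      # assumption: ">=" marks a rainy day
    gauge_wet = gauge >= threshold

    labels = np.full(sat.shape, "C", dtype="<U1")  # default: correct negative
    labels[sat_wet & gauge_wet] = "H"
    labels[~sat_wet & gauge_wet] = "M"
    labels[sat_wet & ~gauge_wet] = "F"
    return labels


if __name__ == "__main__":
    # Hypothetical daily totals in mm/d, one per category.
    print(classify_events([1.0, 0.0, 2.0, 0.1], [2.0, 3.0, 0.0, 0.2]))
```

Counting the labels per year gives the event numbers from which the relative frequencies of occurrence are formed.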
The purpose of this study is to extend the decomposition scheme by decomposing MBE and mean square error (MSE) simultaneously. The outstanding feature of this improved method is that it produces quantitative relationships between total MBE/MSE and their individual contributions. More importantly, long-term changes in the MBE (MSE) associated with H, M, and F, as well as in their relative frequencies of occurrence (RFO), are investigated. The contributions of these conditional errors to the unconditional errors are quantified and compared. The method is applied to the assessment of IMERG in Mainland China. The spatial and temporal distributions of the MBE and MSE decomposition, together with the spatial and temporal variability of IMERG performance in Mainland China, are presented. Potential improvements in IMERG during the past two decades are discussed.
Remote Sens. 2021, 13, 5107

Data
The latest IMERG precipitation product is derived by the V06 algorithm, which includes several major improvements. Previous studies showed that the IMERG product performed better than other satellite precipitation datasets, but these studies were mostly based on short-term observations (after 2014, when the GPM Core Observatory was launched) [17]. Here we evaluate daily IMERG V06 rainfall for the period beginning in June 2000, which is possible because the retrospective processing of PMW data predating the GPM Core Observatory has been finalized. The validated quantity is daily rainfall, i.e., the rainfall accumulated from the half-hourly retrievals between 00:00 and 24:00 UTC.
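As a minimal illustration of the daily accumulation, the sketch below sums 48 half-hourly values into a daily total. It assumes the half-hourly values are rain rates in mm h⁻¹, so each slot contributes rate × 0.5 h; both the unit and the function name `daily_total_mm` are assumptions made for this example and are not taken from the text.

```python
import numpy as np


def daily_total_mm(half_hourly_rate_mm_per_h):
    """Accumulate 48 half-hourly rain rates (mm/h) into a daily total (mm)."""
    rates = np.asarray(half_hourly_rate_mm_per_h, dtype=float)
    if rates.shape[-1] != 48:
        raise ValueError("expected 48 half-hourly slots per UTC day")
    # Each slot lasts 0.5 h, so rate (mm/h) * 0.5 h gives the depth in mm.
    return rates.sum(axis=-1) * 0.5


if __name__ == "__main__":
    # A hypothetical day with a constant rate of 2 mm/h.
    print(daily_total_mm([2.0] * 48))
```

A plain sum is used so that a missing (NaN) slot propagates to the daily total instead of being silently counted as zero.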
We use daily rain-gauge data from 2395 stations over Mainland China as the reference. Note that monthly rainfall data at some of these stations are used to calibrate the IMERG V06 product. Therefore, strictly speaking, this is not an independent validation. The gauged data are quality controlled by the China Meteorological Administration, including gross error checks, time stamp checks, etc. The data are widely used in validation studies [17,19-21,23]. Given that the ultimate goal of our research is to drive our hydrological model with the IMERG product after carefully evaluating its performance, the validation is made separately in nine watersheds. Figure 1 shows the gauge map, with different colors for these watersheds. The Songliao watershed (SL) is located in northeastern China, where the climate is cold-zone with dry winters and warm summers. The Haihe watershed (HH) covers the north China plain, which has a subtropical monsoon climate. The Yellow River watershed (YR) has a semi-arid climate. The climate in the inland watershed (IL) is arid-zone cold desert according to the Köppen-Geiger climate classification map. The climate in the Huaihe watershed (HU) is typically temperate, with warm summers and dry winters. The climate in the western part of the Yangtze River watershed (YZ) is temperate-zone dry in the warm season, whereas the eastern part is temperate-zone with a warm summer and no dry season. The climate in the southeast watershed (SE) and the Pearl delta watershed (PD) is typically temperate-zone with a warm summer and no dry season. A temperate-zone climate with a dry, warm summer is typical of the southwestern watershed (SW).

Statistical Metrics
We used a simple collocation method to match the gauges with the IMERG grid in the validation. Measurements at gauges within an IMERG grid cell are averaged and compared with the IMERG pixel rainfall. Given that daily IMERG is accumulated from half-hourly retrievals from 00:00 to 24:00 UTC, we used the daily gauged measurements from 08:00 to 08:00 of the next day, Beijing Time, which covers the same period as IMERG, since UTC is 8 h behind Beijing Time. To evaluate the IMERG performance, the statistical metrics were calculated from twenty years (2001-2020) of collocated gauged and IMERG rainfall data. The metrics include the Pearson correlation coefficient (CC), which describes the covariation between IMERG and gauged data. The mean bias error (MBE) and root mean square error (RMSE) are used to describe the systematic and random error characteristics of IMERG. We also used the Kling-Gupta efficiency (KGE) [24,25], which combines the contributions of the correlation, bias, and variability terms:

$$\mathrm{KGE} = 1 - \sqrt{(\mathrm{CC} - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2} \qquad (1)$$

where β is the bias ratio, i.e., the ratio of the mean of IMERG to that of the gauged data, and γ is the ratio of the coefficient of variation of IMERG to that of the gauged data.
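The continuous metrics named in this section can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes the modified KGE form 1 − sqrt((CC − 1)² + (β − 1)² + (γ − 1)²), with β and γ as defined above, and the function name `continuous_metrics` is hypothetical.

```python
import numpy as np


def continuous_metrics(sat, gauge):
    """Return CC, MBE, RMSE and KGE for collocated satellite/gauge rainfall."""
    sat = np.asarray(sat, dtype=float)
    gauge = np.asarray(gauge, dtype=float)
    diff = sat - gauge

    cc = np.corrcoef(sat, gauge)[0, 1]          # Pearson correlation
    mbe = diff.mean()                           # systematic error
    rmse = np.sqrt(np.mean(diff ** 2))          # overall error magnitude
    beta = sat.mean() / gauge.mean()            # bias ratio
    gamma = (sat.std() / sat.mean()) / (gauge.std() / gauge.mean())  # CV ratio
    kge = 1.0 - np.sqrt((cc - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return {"CC": cc, "MBE": mbe, "RMSE": rmse, "KGE": kge}


if __name__ == "__main__":
    gauge = np.array([0.0, 1.0, 2.0, 5.0])      # hypothetical values, mm/d
    print(continuous_metrics(2.0 * gauge, gauge))
```

In the hypothetical example the satellite series is exactly twice the gauge series, so CC and γ are both 1 and the departure of KGE from 1 comes only from the bias ratio β = 2.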
Since missed precipitation and false alarms are two critical indicators of IMERG performance, the probability of detection (POD) and false alarm ratio (FAR) are also calculated from the number of hit events (H), the number of missed events (M), and the number of false alarms (F). Finally, the critical success index (CSI = H/(H + M + F)) is calculated to demonstrate the capability of IMERG to detect precipitation occurrence.
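A minimal sketch of the three detection scores follows. CSI uses the formula given above; POD = H/(H + M) and FAR = F/(H + F) are the standard contingency-table definitions and are assumed here, since the text does not spell them out. The function name `detection_scores` is illustrative.

```python
def detection_scores(n_hit, n_miss, n_false):
    """Return (POD, FAR, CSI) from hit, miss and false-alarm counts."""

    def ratio(numerator, denominator):
        # Undefined scores (no events at all) are reported as NaN.
        return numerator / denominator if denominator > 0 else float("nan")

    pod = ratio(n_hit, n_hit + n_miss)             # assumed: H / (H + M)
    far = ratio(n_false, n_hit + n_false)          # assumed: F / (H + F)
    csi = ratio(n_hit, n_hit + n_miss + n_false)   # H / (H + M + F)
    return pod, far, csi


if __name__ == "__main__":
    # Hypothetical counts for one station-year.
    print(detection_scores(60, 20, 20))
```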

Decomposition Scheme
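The unconditional (total) MBE_T is calculated from the equation below.

$$\mathrm{MBE}_T = \frac{1}{N_T}\sum_{i=1}^{N_T}\left(Y_i - X_i\right) \qquad (2)$$

where Y_i and X_i represent the daily satellite and gauged rainfall rates (mm d⁻¹), respectively, and N_T represents the annual total number of collocated samples of satellite and gauged rainfall rate. It is natural to decompose MBE_T into three terms as follows.

$$\mathrm{MBE}_T = \frac{1}{N_T}\left[\sum_{i=1}^{N_H}\left(Y_i - X_i\right) + \sum_{i=1}^{N_M}\left(Y_i - X_i\right) + \sum_{i=1}^{N_F}\left(Y_i - X_i\right)\right] \qquad (3)$$

where N_H, N_M, and N_F represent the annual numbers of events associated with H, M, and F, respectively. The three terms in the bracket on the right-hand side of Equation (3) are the summed errors of hits, missed precipitation, and false alarms; the contribution of correct negatives is negligible. Writing each sum as the number of events multiplied by the corresponding conditional mean error gives

$$\mathrm{MBE}_T = \mathrm{RFO}_H\,\mathrm{MBE}_H + \mathrm{RFO}_M\,\mathrm{MBE}_M + \mathrm{RFO}_F\,\mathrm{MBE}_F \qquad (4)$$

where RFO_k = N_k/N_T is the relative frequency of occurrence and MBE_k is the mean bias error of the events of type k (k = H, M, F). Applying the chain rule for derivatives to Equation (4) gives the trend of MBE_T as the sum of six components:

$$\frac{\partial\,\mathrm{MBE}_T}{\partial t} = \sum_{k \in \{H, M, F\}}\left(\overline{\mathrm{RFO}_k}\,\frac{\partial\,\mathrm{MBE}_k}{\partial t} + \overline{\mathrm{MBE}_k}\,\frac{\partial\,\mathrm{RFO}_k}{\partial t}\right) \qquad (5)$$

where the overbar denotes the multiyear mean. Linear trends of all terms in Equation (5) are calculated by least-squares regression. This suggests that the MBE_T trend is attributable not only to changes in the annual MBEs associated with H, M, and F, but also to changes in their RFOs. It is straightforward to discriminate between the contributions of individual components to the MBE_T trend using Equation (5). The decomposition scheme is also applicable to the unconditional MSE (MSE_T). The error sources are then investigated by this decomposition, the results of which are presented in the following section. The threshold for the occurrence of rainfall is 0.5 mm d⁻¹, which accounts for the measurement uncertainties of satellite and gauged rainfall products. Note that the method and findings here are only weakly dependent on the threshold.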
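The weighting by relative frequency of occurrence can be checked numerically. The sketch below is a hedged illustration rather than the authors' code: it assumes that each conditional error is the mean over the events of that type, that each RFO is the event count divided by N_T, and that a day counts as rainy when rainfall is `>=` the threshold. The name `decompose_errors` is hypothetical.

```python
import numpy as np


def decompose_errors(sat, gauge, threshold=0.5):
    """Split total MBE and MSE into RFO-weighted hit/miss/false-alarm parts."""
    sat = np.asarray(sat, dtype=float)
    gauge = np.asarray(gauge, dtype=float)
    diff = sat - gauge
    n_total = diff.size

    sat_wet = sat >= threshold      # assumption: ">=" marks a rainy day
    gauge_wet = gauge >= threshold
    masks = {
        "H": sat_wet & gauge_wet,
        "M": ~sat_wet & gauge_wet,
        "F": sat_wet & ~gauge_wet,
    }

    parts = {}
    for name, mask in masks.items():
        n_events = int(mask.sum())
        rfo = n_events / n_total
        # Conditional errors are set to zero when a category has no events.
        mbe = float(diff[mask].mean()) if n_events else 0.0
        mse = float((diff[mask] ** 2).mean()) if n_events else 0.0
        parts[name] = {"RFO": rfo, "MBE": mbe, "MSE": mse,
                       "weighted_MBE": rfo * mbe, "weighted_MSE": rfo * mse}
    return parts


if __name__ == "__main__":
    # Hypothetical sample: one hit, one miss, one false alarm, one dry day.
    for name, values in decompose_errors([3, 0, 2, 0], [1, 4, 0, 0]).items():
        print(name, values)
```

The three weighted terms sum exactly to the total error only when correct negatives contribute nothing; with real data, small sub-threshold differences leave a residual, which the text treats as negligible.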

Results
Figure 2 presents the spatial distribution of four statistical metrics describing the precipitation detection capacity of the IMERG product. POD in the IL watershed is generally smaller than in other watersheds. Another interesting feature is that POD over the Yungui Plateau is smaller than in the surrounding regions, which likely implies that the retrieval of nighttime precipitation from PMW sensors should be improved in the algorithm. In addition, this is also likely to be associated with the large uncertainties of satellite remote sensing of precipitation over complex terrain [26-28]. Furthermore, FAR in the IL watershed is larger than in other watersheds, which suggests that much effort should be devoted to improving precipitation retrieval over semi-arid regions [23].

Table 1 presents the statistical metrics derived from seasonal pairs of IMERG and gauged data in the nine watersheds. Several features merit mention. First, the performance of IMERG shows remarkable spatial and temporal variability. Second, seasonal mean IMERG rainfall tends to be smaller than gauged measurements, especially in the summer rainy season, which likely indicates that IMERG underestimates convective rainfall. Third, precipitation occurrence detection, as described by POD, FAR, and CSI, is highly dependent on rainfall amount. Precipitation occurrence detection is better in rainy seasons or humid regions than in dry seasons or semi-arid regions. In contrast, the KGE, which combines the contributions of the correlation, bias, and variability terms, shows smaller variation between seasons and regions than POD, FAR, and CSI.

MBE_T and its weighted components are mapped in Figure 3, and Table 2 presents their spatial means and standard deviations in the nine watersheds. MBE_T (Figure 3a) is generally close to zero at most stations in northern China (SL, HH, IL, YR, and HU), while in southern China (YZ, SW, SE, and PD) a few stations show relatively large MBE_T exceeding 1.5 mm d⁻¹.
Regional mean MBE_T ranges from 0.10 to approximately 0.23 mm d⁻¹ in all watersheds except IL (the inland semi-arid area), where it is close to zero (Table 2). Positive MBE_T values imply that, statistically, IMERG tends to overestimate precipitation. Weighted MBE_H (weighted by its RFO) is generally smaller than MBE_T, although the two have a similar spatial distribution (the correlation coefficient between them is 0.88). Small weighted MBE_H values are found in northern China, varying from −0.05 to 0 mm d⁻¹, while in southern China weighted MBE_H is relatively large and occasionally comparable to MBE_T, indicating a considerable contribution to MBE_T. For example, weighted MBE_H in PD reaches 0.13 mm d⁻¹, which accounts for 81% of MBE_T. Weighted MBE_F is an important contributor to MBE_T in all watersheds (Table 2), while it is partly offset by weighted MBE_M, which is smaller in magnitude. The larger weighted MBE_F is mostly due to false alarms occurring more often than missed precipitation. This is most evident in the SL watershed, where the RFO of missed precipitation is below 5%, while that of false alarms exceeds 8%. Poor detection of precipitation occurrence there arises partly because the detection of snowfall by PMW remote sensing is still a big challenge. Therefore, it is essential to improve the detection accuracy of IMERG, especially to reduce the probability of false alarms.
The spatial distribution of MSE_T (Figure 4a) generally follows that of the rainfall rate, which decreases from south to north and from east to west. As shown in Table 3, the minimum regional mean MSE_T is observed in IL (4.4 mm² d⁻²), which is smaller by a factor of about 24 than the maximum in PD (104.9 mm² d⁻²). The reason is that the precipitation intensity and RFO in PD, a humid region, are far larger than those in IL, which is located in the semi-arid region. The decomposition of MSE_T suggests that it is overwhelmingly contributed (>85%) by weighted MSE_H in all watersheds except IL, where the smallest weighted MSE_H still accounts for 68% of MSE_T. Weighted MSE_M is also smaller than weighted MSE_F, mostly because of the larger RFO of false alarms relative to missed precipitation. Furthermore, occasional false alarms with relatively large IMERG rainfall rates also contribute to a larger MSE_F. This feature is most pronounced in HU and HH, where MSE_F is more than twice MSE_M. Weighted MSE_M and MSE_F in IL contribute 14% and 18% of MSE_T, respectively, indicating that there is much room to improve IMERG rainfall detection in this inland semi-arid region. Specifically, the detection of light rainfall events should be improved, and false alarms of relatively large rainfall events should be avoided as far as possible.
Figure 5 presents the long-term trend of MBE_T in the nine watersheds, which is composed of six components according to Equation (4). All nine watersheds saw an increase in MBE_T during the past twenty years, which suggests a slight deterioration of the IMERG product from this point of view. The growing trend of MBE_T is generally supported by the trends of MBE_H, MBE_M, and MBE_F, except in SW and PD. MBE_M has been relatively stable over the past 20 years. MBE_F decreased from 4.2 to 3.4 mm d⁻¹ in PD and from 3.8 to 3.4 mm d⁻¹ in SE, which would support a decreased MBE_T to some extent. With the help of Equation (5), we can see that the MBE_T trend is determined not only by the trends in MBE_H, MBE_M, and MBE_F weighted by their mean RFOs, but also by changes in the RFOs associated with H, M, and F weighted by their average MBE values. The major feature of Figure 6, the time series of the RFOs associated with H, M, F, and C, is that IMERG tends to miss fewer precipitation events but to produce more false alarms. This feature is found in all nine watersheds but is most prominent in PD, where RFO_F increases from about 7% to 13%, whereas RFO_M decreases from about 10% to 8%. Both would be expected to produce a growing MBE_T, as suggested by Equation (4). Regarding MSE_T, the trends in the nine watersheds are marginal (Figure 7).
Although MSE_H, MSE_M, and MSE_F have improved to some degree in SE, SW, and PD (especially in PD, where MSE_H, MSE_M, and MSE_F have decreased by a quarter in the past 20 years), the MSE_T trend is still negligible. This is because these improvements are partially or fully offset by the negative effect of the strongly increasing frequency of false alarms on the MSE_T trend.
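The six-component attribution of the trend can be sketched as follows. This is an illustrative Python sketch under stated assumptions: annual series of the RFOs and conditional errors are available for H, M, and F; each component is the multiyear mean of one factor multiplied by the least-squares slope of the other; and the names `linear_trend` and `trend_components` are hypothetical.

```python
import numpy as np


def linear_trend(years, values):
    """Least-squares slope of an annual series (units per year)."""
    return float(np.polyfit(years, values, 1)[0])


def trend_components(years, rfo, err):
    """Six contributions to the trend of the total error.

    rfo and err map "H", "M", "F" to annual series of the relative
    frequency of occurrence and of the conditional error (MBE or MSE).
    """
    years = np.asarray(years, dtype=float)
    parts = {}
    for key in ("H", "M", "F"):
        r = np.asarray(rfo[key], dtype=float)
        e = np.asarray(err[key], dtype=float)
        parts[key + "_error"] = r.mean() * linear_trend(years, e)
        parts[key + "_frequency"] = e.mean() * linear_trend(years, r)
    return parts


if __name__ == "__main__":
    t = np.arange(5.0)  # hypothetical five-year record
    rfo = {"H": 0.2 + 0 * t, "M": 0.1 - 0.01 * t, "F": 0.1 + 0.01 * t}
    err = {"H": 1.0 + 0.1 * t, "M": -2.0 + 0 * t, "F": 3.0 + 0 * t}
    print(trend_components(t, rfo, err))
```

In this synthetic example, fewer misses (a negative error occurring less often) and more false alarms (a positive error occurring more often) both push the total bias upward, which mirrors the behavior described for PD. Because the attribution is a first-order approximation, the six terms need not sum exactly to the fitted trend of the total error when both factors of a product change at once.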

Discussion
Satellite remote sensing of precipitation is still a challenging task, even though much progress has been made during recent decades. As new satellite precipitation products are released, evaluation of their quality is still urgently required. It is noteworthy that most satellite precipitation products now provide more than two decades of data. Detailed assessment of the long-term performance of satellite precipitation products is critical for their use in science and applications, but it has been overlooked in most previous validation studies. Validation must go beyond reporting overall error metrics. In this study, we use rainfall measurements at 2395 gauges in mainland China to evaluate the latest IMERG rainfall product during the last two decades. To that end, time-series features are used here to study the correspondence between the satellite-derived and measured precipitation transients. We developed an improved validation method that is capable of decomposing MBE and MSE into independent components. A simple but effective method to decompose the trends of MBE and MSE into six individual components is presented, which is expected to reveal detection capability and retrieval uncertainties, as well as their quantitative contributions to total MBE and MSE. It was suggested, on the basis of metrics such as CC, CSI, and KGE, that the accuracy of IMERG precipitation retrievals increased over the period from 2001 to 2018, which was attributed to more PMW sensors, higher resolutions, and more frequency channels [17]. However, it is shown here that the relative frequency of false alarms increases, whereas that of missed precipitation decreases, which leads to an increasing MBE of IMERG precipitation. Dozens of statistical metrics are used in validation, but no consensus has been reached in the community on how many of them should be included.
Generally, these metrics may suffer from three weaknesses, i.e., interdependence, underdetermination, and incompleteness [15]. It is suggested that the error characteristics can be fully described by the joint probability distribution, namely, p(Y,X), which may provide more insight into satellite precipitation retrieval uncertainties. Gauged measurements are always taken as the reference; however, it should be kept in mind that they also suffer from many measurement uncertainties. Furthermore, the spatial scale mismatch between satellite gridded precipitation products and in situ gauged measurements is perhaps the least-understood topic and needs further discussion. Dense gauge networks are required to investigate the spatial representativeness of gauge stations, which has been examined by very few previous studies [18]. Given that satellite and gauge-based rainfall products are complementary, blending them to produce a more accurate product is of great value, a prerequisite for which is to fully understand their uncertainties via comprehensive validation [29,30].
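As a rough illustration of the joint distribution p(Y,X) mentioned above, the sketch below estimates it empirically with a two-dimensional histogram. The bin edges, the function name `joint_distribution`, and the choice of rows for gauge (X) and columns for satellite (Y) are assumptions made for this example only.

```python
import numpy as np


def joint_distribution(sat, gauge, bin_edges):
    """Empirical p(Y, X): rows index gauge (X) bins, columns satellite (Y) bins."""
    counts, _, _ = np.histogram2d(gauge, sat, bins=[bin_edges, bin_edges])
    total = counts.sum()
    return counts / total if total > 0 else counts


if __name__ == "__main__":
    edges = [0.0, 0.5, 10.0, 100.0]     # hypothetical bins in mm/d
    gauge = [0.0, 1.0, 20.0, 0.1]
    sat = [0.0, 2.0, 5.0, 3.0]
    print(joint_distribution(sat, gauge, edges))
```

When the first bin ends at the rain/no-rain threshold, the first row and first column of the table separate false alarms and missed precipitation from hits, so the contingency counts can be read directly from the joint distribution.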

Conclusions
Based on the collocated daily IMERG and gauged data during 2000-2020, a comprehensive evaluation of IMERG rainfall products has been performed. A decomposition scheme was developed to separate total MBE and MSE into their components, i.e., the corresponding values associated with hits, missed precipitation, and false alarms. The trends of total MBE and MSE are partly accounted for by the trends in the errors associated with hits, missed precipitation, and false alarms, weighted by their relative frequencies of occurrence; furthermore, trends in the relative frequencies of occurrence of hits, missed precipitation, and false alarms, weighted by their average MBE or MSE values, are also important drivers of changes in total MBE and MSE. Major conclusions are as follows.
IMERG tends to underestimate multiannual mean precipitation in mainland China, which is more evident in rainy seasons. The precipitation-occurrence-detection capability of IMERG in rainy seasons and humid regions is better than that in dry seasons and semi-arid regions.
Weighted MBE_F is larger in magnitude than weighted MBE_M, partly because false alarms occur more often than missed precipitation. Total MSE is dominated by the weighted MSE of hits.
A downward tendency in the frequency of missed precipitation was observed in the nine watersheds, while the frequency of false alarms increased, thereby leading to a growing trend in total MBE. IMERG rainfall detection still needs improvement, especially in reducing the impact of false alarms.