Validation of Satellite-Based Precipitation Products from TRMM to GPM

Abstract: The global precipitation measurement mission (GPM) has been in operation for seven years and continues to provide a vast quantity of global precipitation data at finer temporospatial resolutions with improved accuracy and coverage. GPM's signature algorithm, the integrated multisatellite retrievals for GPM (IMERG), is a next-generation precipitation product expected to serve a wide variety of research and operational applications. This study evaluates the latest version (V06B) of IMERG and its predecessor, the tropical rainfall measuring mission (TRMM) multisatellite precipitation analysis (TMPA) 3B42 (V7), using ground-based and gauge-corrected multiradar multisensor system (MRMS) precipitation products over the conterminous United States (CONUS). The spatial distributions of all products are analyzed. The error characteristics of 3B42 and IMERG are further examined in winter and summer with an error decomposition approach, which partitions the total bias into hit bias, bias due to missed precipitation, and bias due to false precipitation. Volumetric and categorical statistical metrics are used to quantitatively evaluate the performance of the two satellite-based products. All products show a similar precipitation climatology with some regional differences. The two satellite-based products perform better in the eastern CONUS than in the mountainous western CONUS. The evaluation demonstrates a clear improvement in the IMERG precipitation product over its predecessor 3B42, especially in reducing missed precipitation in winter and summer and hit bias in winter, resulting in better performance in capturing lighter and heavier precipitation.


Introduction
Reliable precipitation data are critical for a wide variety of applications such as water budget studies and the prevention or mitigation of natural hazards caused by extreme precipitation events. Precise precipitation measurement is always a challenge because of the large spatiotemporal variability of precipitation and the inherent errors of the various measuring instruments. Traditional rain gauges provide direct rainwater measurements and often serve as the reference for validation of radar- and satellite-based precipitation products [1][2][3][4][5]. However, gauges can only make what are essentially point measurements at a specific site. The areal distribution of rain gauge networks is usually sparse, irregular, and incomplete, and is therefore insufficient for accurately describing the spatial variability of precipitation [1,3]. Ground-based weather radars estimate precipitation from reflectivity measurements over relatively large areas. The implementation of dual polarization on many new weather radars, and its retrofitting to existing ones, leads to more accurate precipitation estimation [6,7]. However, radar networks are mostly deployed over continents, are not dense enough over most land areas of the world, and provide little coverage over oceans. In addition, radars suffer from various problems, such as beam blockage in mountainous regions [1,5]. In contrast, remote sensors from Earth observation satellites have been utilized to address the ...

Data

Study Domain
Figure 1a shows the study domain with the land and ocean mask. We limited the study domain to land only because of the lack of rain gauge and radar coverage over the ocean, the decline in radar data utility with range over the coastal oceans (Figure 2), and the significant errors in satellite retrievals due to beam-filling effects over footprints that are part land and part water.
A land/ocean mask (Figure 1a) at 0.25° resolution is constructed from the GPM Microwave Imager footprint surface type data over the latitude/longitude box region enclosed by (125° W-65° W, 25° N-50° N). Here we define "land" as lands, lakes, rivers and other inland waters. All others are defined as "ocean" and excluded from the analysis.
Climatologically, precipitation features in CONUS vary with latitude and topography. Generally, it is wetter in the southeast and along the West Coast, but drier in the north and west. Based on the Köppen-Geiger climate classification, CONUS is typically classified into five main climate regions: A (tropical climates), B (dry climates), C (moist subtropical mid-latitude climates), D (moist continental mid-latitude climates), and H (highlands) (Figure 1b).

TRMM 3B42
TMPA is a 3-hourly precipitation product with a spatial resolution of 0.25° in latitude and longitude, covering from 50° S to 50° N [9]. The product record started on 1 January 1998 when TRMM data became available, continued into the GPM era for continuity even after TRMM terminated, and ended on 31 December 2019 when the IMERG product was regarded as its substitute [27]. TRMM 3B42 combines measurements from different platforms onboard multiple satellites. The passive microwave precipitation estimates are intercalibrated to the TRMM combined instrument (TCI) product, and finally adjusted with rain gauge data on a monthly basis. The intercalibration was changed from TCI to a climatological calibration starting with the data from October 2014 because of the end of the TRMM satellite mission. This created at least a slight inhomogeneity, primarily over the oceans [27]. The inhomogeneity over land should be minimized by the gauge adjustment [28].

GPM IMERG
As a global successor to TMPA, the newly released version 06B IMERG is a multisatellite precipitation product at fine spatial (0.1°) and temporal (0.5-h) resolution with global coverage. The National Aeronautics and Space Administration (NASA) precipitation processing system (PPS) generates IMERG by combining passive microwave (PMW) and infrared (IR) observations from the GPM constellation satellites and calibrating them with the gauge analysis of the Global Precipitation Climatology Centre (GPCC) [29]. IMERG is a unified U.S. national algorithm [10], which combines the strengths of several products including: (1) NASA TMPA for intersatellite calibration and gauge adjustment [9], (2) the National Oceanic and Atmospheric Administration (NOAA) Climate Prediction Center (CPC) morphing technique with Kalman filter (CMORPH-KF) for time interpolation of PMW-based estimates [30], and (3) the University of California (Irvine) precipitation estimation from remotely sensed information using artificial neural networks-cloud classification system (PERSIANN-CCS) for retrieval of microwave-calibrated IR estimates [31]. One key difference between 3B42 and IMERG is that CMORPH-KF is applied in IMERG to minimize the use of low-quality IR estimates.
Three types of IMERG products are provided for different user requirements and applications: the near-real-time "Early" run (IMERG-E), the "Late" run (IMERG-L), and the post-real-time "Final" run (IMERG-F), with approximate latencies of 4 h, 12 h, and 4 months, respectively. The monthly gauge adjustment is used in IMERG-F to reduce bias. IMERG-F is a research-quality product and is generally considered to be more accurate than IMERG-E and IMERG-L. Here the latest version (V06B) of IMERG-F is evaluated. The details of the algorithm are described by Huffman et al. [10], and the latest changes to the morphing algorithm in V06B are further described by Tan et al. [32]. IMERG data files can be accessed from https://gpm.nasa.gov/data/directory (accessed on 29 April 2021).

Ground MRMS
To evaluate the quality of the IMERG and 3B42 products, the "Level 3" MRMS is selected as a ground-based reference. Level 3 denotes a gridded product in the GPM project. NOAA's National Severe Storms Laboratory (NSSL) integrates information from about 180 ground-based operational radars and generates a 3D radar mosaic across the CONUS and southern Canada (130° W-60° W, 20° N-55° N) [33]. The Level 3 MRMS products used in this study were at half-hourly and 0.01° resolutions, generated through significant MRMS data postprocessing, which involves gauge-based bias adjustments, resampling, and quality controls, specifically in support of GPM ground validation (GV) [34]. A radar quality index (RQI) is also produced along with precipitation estimates in the MRMS suite. RQI, ranging from 0 (worst) to 100 (best), indicates sampling and estimation uncertainty, depending on the height of the lowest radar elevation angle relative to the bright band and the percentage of beam blockage by topography [35]. Figure 2 shows mean RQI maps averaged from half-hourly data for three winters and five summers, respectively, during the 55-month period (June 2014 to December 2015, June 2016 to July 2017, and March 2018 to December 2019). Data from January to May 2016 and August 2017 to February 2018 were excluded, which will be discussed in Section 4.1. Figure 2 clearly indicates that the MRMS data quality in summer (JJA) was better than in winter (DJF). In addition, gaps in radar coverage existed in western CONUS even in summer. Further, compared to eastern CONUS, the MRMS quality was evidently lower in the Intermountain West because of severe blockage caused by the complex terrain. RQIs in coastal oceanic areas were generally very low due to large distances between neighboring radars and sharp topographic transitions. The coastal oceanic areas were masked out from the study (Figure 1).

Monthly GPCC
The Global Precipitation Climatology Centre (GPCC) provides gridded precipitation analyses derived from quality-controlled gauges worldwide via the Global Telecommunication System of the World Meteorological Organization [29]. One of the GPCC products is the full data monthly analysis version 2020, which is developed from up to 54,000 gauges using the new GPCC precipitation climatology version 2020 as the analysis background. The GPCC monthly gauge analysis is one of the contributing data sources for TMPA [9] and IMERG [10]. The full data monthly analysis version 2020 at 0.25° spatial resolution is used in this study to confirm the consistency of the satellite-based precipitation products and to check the MRMS reliability at the monthly scale.

Methodology and Evaluation Metrics
Due to the difference in spatial and temporal resolutions among the above-described products, the validation is conducted using an approach that matches and temporally resamples MRMS, 3B42, and IMERG data to a common spatiotemporal resolution of 0.25° and 3 h. Note that, for each day, the 3-h cutoff times for 3B42 files are 0130, 0430, ..., 2230 UTC. The first 3 h of a day have a starting time of 2230 UTC on the day before, and the last 3 h have an ending time of 2230 UTC on the data day. Accordingly, we accumulated 30-min data for MRMS and IMERG to the same 3-h intervals as 3B42.
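The temporal matching described above can be illustrated with a minimal sketch. It assumes the half-hourly data are mean rates in mm h−1 stacked along the first axis, that there are no missing steps, and that the first step starts at a known UTC minute; the function name `accumulate_to_3h` and its arguments are hypothetical and are not taken from the authors' processing code.

```python
import numpy as np

STEPS_PER_WINDOW = 6  # six 30-min steps make one 3-h window


def accumulate_to_3h(rate_30min, start_minute_utc):
    """Average half-hourly rates (mm/h) into 3-h windows aligned with 3B42.

    Window edges are assumed at 0130, 0430, ..., 2230 UTC, i.e. at
    minute-of-day values congruent to 90 modulo 180.
    rate_30min: array of shape (T, ...) with time on the first axis.
    start_minute_utc: minutes after 0000 UTC at which the first step
    begins (a multiple of 30).
    Incomplete windows at either end of the series are dropped.
    """
    rate = np.asarray(rate_30min, dtype=float)
    # Number of leading steps to skip before the next window edge.
    skip = ((90 - start_minute_utc) % 180) // 30
    usable = rate[skip:]
    n_windows = usable.shape[0] // STEPS_PER_WINDOW
    usable = usable[: n_windows * STEPS_PER_WINDOW]
    blocks = usable.reshape(n_windows, STEPS_PER_WINDOW, *usable.shape[1:])
    # The mean of six half-hourly rates is the 3-h mean rate in mm/h.
    return blocks.mean(axis=1)
```

A series starting at 2230 UTC needs no trimming, whereas a series starting at 0000 UTC loses its first three steps so that the first complete window begins at 0130 UTC. Missing half-hourly steps would need separate handling (for example with `np.nanmean` and a minimum-count rule), which this sketch does not attempt.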
Probability density functions (PDFs) for precipitation occurrence and volume, scatter density plots, time series of daily precipitation, and instantaneous 3-hourly precipitation maps are routinely produced (https://wallops-prf.gsfc.nasa.gov/NMQ/IMERG_3B42_MRMS_30 m/index.php (accessed on 29 April 2021)). Statistical metrics such as relative bias (bias), Pearson correlation coefficient (Corr), normalized mean absolute error (NMAE), and normalized root-mean-square error (NRMSE) are utilized to quantify the performance of the satellite products.
Relative bias shows an overestimation (positive) or underestimation (negative) of total precipitation. Corr, ranging from −1 to 1, indicates the strength of the linear relationship assumed a priori between two products. NRMSE is the normalized root-mean-square error (RMSE), denoting the relative mean error magnitude. By using NRMSE, the issue that RMSE values increase with precipitation intensity can be mitigated. The equations of these metrics are not listed in this manuscript. Readers are referred to statistics books, e.g., [36], and peer-reviewed papers, e.g., [15].
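Because the equations are not listed here, the following sketch shows one commonly used set of definitions. These are assumptions rather than the manuscript's exact formulas: relative bias is the summed difference divided by the summed reference, and NMAE and NRMSE are normalized by the mean of the reference. The function name `volumetric_metrics` is hypothetical.

```python
import numpy as np


def volumetric_metrics(sat, ref):
    """Relative bias, Pearson correlation, NMAE and NRMSE of `sat` against `ref`.

    Both inputs are paired precipitation rates (any shape, same size).
    Pairs in which either value is non-finite are ignored.
    """
    s = np.asarray(sat, dtype=float).ravel()
    r = np.asarray(ref, dtype=float).ravel()
    ok = np.isfinite(s) & np.isfinite(r)
    s, r = s[ok], r[ok]
    diff = s - r
    mean_ref = r.mean()
    return {
        "bias": diff.sum() / r.sum(),                 # > 0 means overestimation
        "corr": np.corrcoef(s, r)[0, 1],              # Pearson correlation
        "nmae": np.abs(diff).mean() / mean_ref,       # normalized mean absolute error
        "nrmse": np.sqrt((diff ** 2).mean()) / mean_ref,  # normalized RMSE
    }
```

With these definitions a product that doubles every reference value has a relative bias of 1 (that is, 100%) and a correlation of 1, which illustrates why bias and correlation are reported together.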
To further examine the precipitation detection capability of the satellite-based products (IMERG and 3B42) against the ground-based reference product (MRMS), a contingency table is constructed for each product. The contingency table, also called a two-way frequency table, consists of the occurrences of hit (h), miss (m), false alarm (f), and correct negative (c). For the satellite product to be evaluated, "hit" means that precipitation is correctly detected when it occurs, and "miss" means that precipitation is not detected when it occurs. "False alarm" means that precipitation is mistakenly detected, and "correct negative" means that zero precipitation is correctly flagged as such.
From the hits, misses, false alarms, and correct negatives, three widely applied categorical skill metrics can be calculated: probability of detection (POD), false alarm ratio (FAR), and Heidke skill score (HSS). POD is the fraction of actual precipitation occurrences that are correctly detected; a perfect score is 1 and the worst is 0. FAR measures the fraction of detected precipitation occurrences that are false, ranging from 0 (perfect) to 1 (all detected precipitation is false). HSS, a generalized skill score with a perfect value of 1, quantifies whether the estimate is equal to (0), worse than (negative value), or better than (positive value) random chance relative to the reference; for a two-by-two contingency table its lower bound is −1.
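As an illustration, the sketch below builds the contingency counts and computes the three scores using their standard two-by-two forms: POD = h/(h + m), FAR = f/(h + f), and HSS = 2(hc − fm)/[(h + m)(m + c) + (h + f)(f + c)]. The function names are hypothetical, and treating a rate at or above the threshold as precipitating is an assumption consistent with the threshold convention used in this study.

```python
import numpy as np


def contingency_counts(sat, ref, threshold=0.1):
    """Return (h, m, f, c) counts for paired rates in mm/h.

    A value at or above `threshold` counts as precipitating.
    """
    s = np.asarray(sat, dtype=float).ravel() >= threshold
    r = np.asarray(ref, dtype=float).ravel() >= threshold
    h = int(np.sum(s & r))     # hit: both detect precipitation
    m = int(np.sum(~s & r))    # miss: reference only
    f = int(np.sum(s & ~r))    # false alarm: satellite only
    c = int(np.sum(~s & ~r))   # correct negative: neither
    return h, m, f, c


def categorical_scores(h, m, f, c):
    """POD, FAR and HSS from contingency counts (NaN when undefined)."""
    pod = h / (h + m) if (h + m) else float("nan")
    far = f / (h + f) if (h + f) else float("nan")
    denom = (h + m) * (m + c) + (h + f) * (f + c)
    hss = 2.0 * (h * c - f * m) / denom if denom else float("nan")
    return {"pod": pod, "far": far, "hss": hss}
```

For example, `categorical_scores(*contingency_counts(sat, ref))` gives the scores at the default 0.1 mm h−1 threshold; passing a different `threshold` allows the sensitivity to that choice to be checked.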
The categorical skill metrics provide measures of performance based on precipitation occurrences, but do not provide any information as to what fraction of the precipitation volume is detected. Therefore, a volumetric error decomposition approach developed by Tian et al. [37] and Habib et al. [38] is adapted to completely decompose the total error (E) of precipitation volume into three independent components: (1) hit bias (H): precipitation is detected, but the precipitation volume is biased either positively or negatively; (2) bias due to missed precipitation (−M): precipitation is not detected, which leads to a negative bias; and (3) bias due to false precipitation detection (F): precipitation is mistakenly detected while there is no precipitation, which causes a positive bias. These three components are related by Equation (4):
E = H − M + F (4)
It is possible that H, −M, or F is larger in magnitude than E simply because the components can cancel one another, which results in a smaller E. Therefore, this approach can reveal much more information on error characteristics by looking at the individual components, instead of merely the total error.
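A minimal sketch of this decomposition is given below. It assumes that rates below the precipitation threshold are set to zero before the components are summed, so that E = H − M + F holds exactly, and it expresses each term as a percentage of the total reference precipitation. Both choices, and the function name `decompose_error`, are assumptions made for illustration rather than details confirmed by the manuscript.

```python
import numpy as np


def decompose_error(sat, ref, threshold=0.1):
    """Split the total volume error into hit, missed and false components.

    Returns percentages of the total reference precipitation:
    H (hit bias), M (missed precipitation, reported as a positive
    number), F (false precipitation) and E (total), with E = H - M + F.
    """
    s = np.asarray(sat, dtype=float).ravel()
    r = np.asarray(ref, dtype=float).ravel()
    # Assumption: sub-threshold rates are treated as no precipitation.
    s = np.where(s >= threshold, s, 0.0)
    r = np.where(r >= threshold, r, 0.0)
    hit = (s > 0) & (r > 0)
    miss = (s == 0) & (r > 0)
    false = (s > 0) & (r == 0)
    scale = 100.0 / r.sum()
    return {
        "H": (s[hit] - r[hit]).sum() * scale,
        "M": r[miss].sum() * scale,
        "F": s[false].sum() * scale,
        "E": (s.sum() - r.sum()) * scale,
    }
```

Applying the function per grid cell and season, rather than to the whole domain at once, would give maps of the individual components.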
In the calculation of the above categorical metrics, and in the error decomposition approach, the threshold separating precipitation from no precipitation is assumed to be 0.1 mm h−1. This threshold has been applied in many previous studies (e.g., [13]). If the precipitation rate is below 0.1 mm h−1, it is considered not precipitating. The selection of this threshold is rather subjective, but the results were relatively robust, as demonstrated when other thresholds (0.0, 0.2, 0.5, and 1.0 mm h−1) were also tested. The trace precipitation below the threshold of 0.1 mm h−1 is not very important in meteorological and hydrological applications. It only accounted for 1.5%, 0.3%, and 3.1% of total precipitation for MRMS, 3B42, and IMERG, respectively, which will be discussed in Section 4.4. The threshold of 0.1 mm h−1 is below the minimum detectable precipitation rates of remote sensors on ground radars and satellites. It is common to apply a threshold to eliminate spurious light precipitation caused by different instruments and retrieval algorithms [12,15].

Time Series of Spatially Averaged Monthly Precipitation
... summers (June to August in 2014, 2015, 2016, 2018, and 2019), which will be studied in a winter/summer contrast. Figure 3 illustrates that the satellite-based monthly precipitation from TRMM 3B42 and GPM IMERG had a pattern similar to the observations from the ground-based MRMS and GPCC products. All products were able to reasonably reproduce the seasonal variation of precipitation. IMERG and 3B42 appeared to overestimate the winter precipitation. The overestimation is likely because IR precipitation estimation in satellite products was indirectly inferred from cloud top temperature, which limited the accuracy of cold precipitation detection and quantification [17]. Product characteristics will be further analyzed in the next section.
Figure 4 is the time series of ratios and correlations for IMERG (3B42) relative to MRMS, calculated from 3-hourly precipitation at 0.25° resolution for each month. GPCC is not available for the comparison because it is a monthly product. Only pixels associated with the perfect MRMS RQI value (100) and precipitation of at least 0.1 mm h−1 for both IMERG (3B42) and MRMS were used in the calculation. This quality filtering procedure can effectively remove noise from MRMS data [39].
Figure 4 clearly demonstrates that IMERG consistently outperformed 3B42 in terms of ratio and correlation. Relative to MRMS, both products overestimated precipitation in winter but underestimated precipitation in summer. The correlation between IMERG (3B42) and MRMS was higher in winter than in summer. Figure 4 is just one of the plots of the time series of monthly ratios and correlations from all combinations of RQIs (≥0, 25, 50, 75, and 100) and precipitation thresholds (≥0.0, 0.1, 0.2, and 0.5 mm h−1) for 3B42 and IMERG. The complete distribution of the ratios and correlations can be displayed on boxplots with emphasis on the direction of the bias (i.e., positive or negative) and outliers. In Figures 5 and 6, each boxplot corresponds to the 55 months from one of the 80 time series plots. For example, the data in the boxplot (last column in Figure 6f) for the correlation between IMERG and MRMS with an RQI of 100 and a precipitation threshold of 0.1 mm h−1 were from the solid line in Figure 4b.
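The combinations of RQI and precipitation thresholds described above can be sketched as a simple sweep over one month of paired 3-hourly data. The sketch assumes that the "ratio" is the summed satellite precipitation divided by the summed MRMS precipitation over the retained pixels, which is not stated explicitly in the text; the function names are hypothetical.

```python
import numpy as np

RQI_MINIMA = (0, 25, 50, 75, 100)
THRESHOLDS = (0.0, 0.1, 0.2, 0.5)  # mm/h


def ratio_and_corr(sat, ref, rqi, rqi_min, threshold):
    """Ratio and correlation over pixels passing both filters.

    A pixel is kept when its RQI is at least `rqi_min` and both
    products report a rate of at least `threshold`.
    """
    s = np.asarray(sat, dtype=float).ravel()
    r = np.asarray(ref, dtype=float).ravel()
    q = np.asarray(rqi, dtype=float).ravel()
    keep = np.isfinite(s) & np.isfinite(r)
    keep &= (q >= rqi_min) & (s >= threshold) & (r >= threshold)
    if keep.sum() < 2 or r[keep].sum() == 0:
        return float("nan"), float("nan")
    ratio = s[keep].sum() / r[keep].sum()
    corr = np.corrcoef(s[keep], r[keep])[0, 1]
    return ratio, corr


def sweep(sat, ref, rqi):
    """Evaluate every (RQI minimum, threshold) combination."""
    return {
        (q, t): ratio_and_corr(sat, ref, rqi, q, t)
        for q in RQI_MINIMA
        for t in THRESHOLDS
    }
```

Running `sweep` once per month and per product yields 20 ratio series and 20 correlation series for each product, which matches the 80 time series summarized by the boxplots.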
From Figure 5b-d, the large positive bias in 3B42 was evident, with the interquartile range and mean all well above 1. Large outliers contribute to a considerable overestimation of precipitation rates.
All boxplots, except for Figures 5a and 6a, share a common feature: the ratio decreased towards 1 with increasing RQI. It is worth noting that the ratio for 3B42 relative to MRMS with a threshold of 0.0 mm h−1 (Figure 5a) was lower than all other ratios with larger precipitation thresholds (Figures 5b-d and 6b-d). This implies that 3B42 missed a considerable amount of total rain volume due to light precipitation. This will be shown in the probability distribution of 3-hourly precipitation rates (Figure 11). On the other end, 3B42 overestimated precipitation with intensities greater than 0.1 mm h−1 (Figure 5b-d). This issue was largely mitigated in Figure 6b-d for IMERG.
The correlation coefficient was relatively stable with respect to the change of RQI. This is expected because the correlation quantifies the covariability between two products. The best correlation was for the cases with a threshold of 0.0 mm h−1 (Figures 5e and 6e) because of the large number of concurrent zero-precipitation rates included.
Figure 5 (caption, partially recovered): ... (Figure 5c,g), and 0.5 mm h−1 (Figure 5d,h). The black box is the interquartile range extending from the lower quartile (25th) to the upper quartile (75th). The white bar inside the black box is the median, and the blue asterisk is the mean. The vertical line represents whiskers extending from the minimum to the maximum. The upper whisker is truncated and its value is listed at the upper border of each boxplot if the maximum is outside the plot area.

Spatial Distribution of Mean Monthly Precipitation and Error Decomposition Analysis
The spatial distributions of mean monthly precipitation averaged from 55 months, three winters, and five summers for the ground (GPCC and MRMS) and satellite (3B42 and IMERG) derived products are shown in Figure 7. For the 55-month average, all products display a high degree of consistency in spatial distribution patterns, with copious precipitation in the southeast and Pacific Northwest and relatively less precipitation over the large area of western CONUS. However, differences were still visually discernible among the four products. There were missing data over the Great Lakes in GPCC because of the lack of floating rain gauges. The gradual decrease in precipitation from the south and southeast to the north and northwest was spatially smoother in IMERG, 3B42, and GPCC than in MRMS. This reflects the smoothing effect of the GPCC gauge interpolation [29], which is passed on to 3B42 and IMERG through the gauge adjustment. IMERG best reproduces the spatial variation of GPCC, which appears to benefit from IMERG's monthly calibration against GPCC gauges. IMERG has the best pattern correlation (0.963) with GPCC, followed by 3B42 (0.947) and MRMS (0.81).

In winter, precipitation along the northwest coastal area is considerably underestimated by 3B42, mainly because of the difficulty in detecting warm, coastal rainfall. This is also reported by Tang et al. [40]. The underestimation is well corrected in IMERG.
Both IMERG and 3B42 show different levels of overestimation near the Great Lakes area, especially in summer, due to the lack of gauge calibration. The spatial characteristics of the error components for 3B42 and IMERG in winter and summer seasons can be further analyzed using the error decomposition approach [37,38].
As described in Section 3, we decomposed the total bias into hit, missed-precipitation, and false-precipitation biases for each product during each month at each 0.25° grid cell in the study domain (Figure 1), and then accumulated the monthly total biases and bias components, as percentages of total MRMS precipitation, to seasonal scales, particularly for the winter and summer seasons. In this analysis, the 3-winter and 5-summer averages are presented (Figure 8). The spatial characteristics of the error components for the 55-month average are not shown here as they basically represent the summation of winter and summer. Figure 8a-d shows the spatial distributions of the total bias and the three error components without (Figure 8a,c) and with (Figure 8b,d) RQI filtering. Both 3B42 and IMERG shared considerable similarity in their spatial bias patterns, but with different levels of bias. One obvious feature in the total bias is the superior performance of the satellite products in the central and southern Plains. This may result from the gauge calibration in the 3B42 and IMERG algorithms. Both IMERG and 3B42 appear to underestimate the orographically forced precipitation over large areas of the mountainous western CONUS in summer and winter, whereas they overestimate the precipitation along the Rockies, Sierra Nevada, Mexican border, and Great Lakes areas, where MRMS quality is questionable (Figure 2). The overestimation is mostly dominated by the enhanced false precipitation detection in these areas. The positive hit bias also contributed to the overestimation over some of these areas. However, after the RQI filter (RQI = 100) was applied, these biases were largely removed (Figure 8b,d).
For the winter season, the missed-precipitation bias had the largest amplitude (Figure 9a) among all three components, especially for 3B42 at higher latitudes and in mountainous areas (Figure 8a,b). Missed precipitation might be mostly due to the inability of multisatellite PMW sensors to measure snowfall or to observe rainfall over icy land surfaces [40,41]. Substantial improvements can be seen in the IMERG product, where the magnitude of the missed-precipitation error was dramatically decreased. This may be the result of the use of morphing and PERSIANN-CCS in IMERG. Compared to 3B42, the hit and missed-precipitation biases in IMERG were greatly reduced, whereas the false-precipitation bias was slightly reduced, as summarized more clearly in Figure 9. The three components partly cancelled one another, which resulted in a smaller total bias. IMERG displays an obvious improvement over 3B42 in missed-precipitation and hit bias reduction in winter. However, the hit bias in IMERG was over-reduced in summer, resulting in a large area of negative hit bias (Figure 8c,d).
For the summer season (Figure 8c,d), the two products were noticeably similar, with a slight underestimation in most of CONUS except along the Mexican border and in the Great Lakes region. The missed-precipitation bias is largely reduced compared to its winter counterpart. The total bias pattern appears similar to that of the hit bias (Figure 8c,d). The overall false-precipitation and missed-precipitation biases were not as pronounced as in winter (Figures 8 and 9). This can likely be traced to the more effective gauge correction in the 3B42 and IMERG algorithms, which benefits from the fact that gauge measurements are more reliable for summer rainfall than for winter solid precipitation.
Note that the selection of RQI = 100 is a very conservative approach, which only retains the MRMS data with the best quality; as a result, it significantly limits the data availability especially in the Intermountain West where the quality of radar measurements is lower than other CONUS areas because of radar beam blockage by terrain. Hence, MRMS is not a good GV reference over mountainous regions as it might not correctly represent the weather systems in the regions [42].
We also selected other RQI thresholds for comparisons ( Figure 9). Figure 9 shows the averaged error components, and the total errors, as percentages of the total MRMS precipitation for all 3-hourly precipitation over 0.25 • grids with different RQI filtering for winter and summer, respectively.   Figure 10 shows statistical metrics for IMERG and 3B42 performances relative to MRMS with RQI filtering (zero or 100%) for winter and summer, respectively. IMERG clearly outperformed 3B42 in both winter and summer in terms of all metrics except for a slightly worse false rate, FAR, and relative bias in summer. The morphing algorithm improved the precipitation detection with better hit and miss rates, and increased POD and HSS scores in Figure 10a-d, but at the price of slightly increased false rate and FAR. The increased FAR might be an issue associated with PMW retrieval, which sometimes may erroneously treat the dynamic surface characteristics over land as precipitation signals [43].

For winter, missed precipitation was the major source of error, accounting for about 50% of the total precipitation for 3B42 and 20-30% for IMERG, much larger than the total bias. The large missed-precipitation bias was offset by the sum of the positive hit bias and the positive false bias, resulting in a very small total bias.
For summer, both missed- and false-precipitation biases were smaller than those in winter. The positive hit bias was greatly reduced for 3B42 relative to winter, whereas it became negative for IMERG.
One noticeable feature in Figure 9a was that the missed- and false-precipitation biases for both IMERG and 3B42 clearly decreased with increasing RQI in the winter season. In summer, this feature was less pronounced. RQI filtering removed less area in summer than in winter because RQI was better in the warm season than in the cold season (Figure 2), owing to the presence of stronger low-level temperature gradients during winter periods.

Statistical Metrics and Categorical Skill Score
Figure 10 shows statistical metrics for IMERG and 3B42 performances relative to MRMS with RQI filtering (zero or 100%) for winter and summer, respectively. IMERG clearly outperformed 3B42 in both winter and summer in terms of all metrics except for a slightly worse false rate, FAR, and relative bias in summer. The morphing algorithm improved the precipitation detection, with better hit and miss rates and increased POD and HSS scores in Figure 10a-d, but at the price of a slightly increased false rate and FAR. The increased FAR might be an issue associated with PMW retrieval, which may sometimes erroneously treat the dynamic surface characteristics over land as precipitation signals [43].
Looking at Figure 10a, a striking feature is that the relative bias in the winter season was drastically reduced from 74.7% for 3B42 to 19.1% for IMERG without RQI filtering, and from 54.1% to 12.8% when RQI = 100 was applied. Another salient distinction between TRMM 3B42 and GPM IMERG was a large increase in POD and a marginal increase in FAR, resulting in a perceptible increase in HSS. As discussed in Su et al. [14], the improved POD was mainly attributed to the enhanced sensitivity of sensors used in producing the precipitation products and the increased sampling frequency of the GPM mission.
Considered over a continental scale, Figure 10c,d indicate that IMERG precipitation estimates were lower relative to MRMS during summer. A similar result was found when comparing IMERG with the radar-based Stage IV product [44]. This suggests that IMERG may underestimate precipitation from convective systems. In summer, strong convective precipitation events occur more frequently. Because of their short duration, localization, and spatial complexity, these events are generally more difficult for satellite remote sensors to capture.
The hit bias in Figure 9 and the relative bias in Figure 10 had the same sign (positive or negative) but different amplitudes. This occurs because the two are normalized differently: the relative bias is the difference between satellite and MRMS precipitation as a percentage of MRMS precipitation, conditioned on both the satellite and MRMS precipitation rates being at least 0.1 mm h⁻¹, whereas the hit bias expresses the difference as a percentage of all MRMS precipitation greater than 0.1 mm h⁻¹.
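The relationship between the error components and the relative bias can be illustrated with a minimal sketch. It assumes two co-located arrays of precipitation rates without missing values, a wet/dry threshold of 0.1 mm h⁻¹ applied with "greater than or equal", and normalization of the components by the total reference precipitation at or above the threshold. The function name `decompose_bias` and the dictionary keys are hypothetical, and the exact conventions used in this study may differ.

```python
import numpy as np

def decompose_bias(sat, ref, threshold=0.1):
    """Split total bias into hit, missed, and false components (in % of reference).

    Assumption: a sample is 'wet' when its rate is >= threshold, and all
    components are normalized by the reference total over wet reference samples.
    """
    sat = np.asarray(sat, dtype=float)
    ref = np.asarray(ref, dtype=float)

    sat_wet = sat >= threshold
    ref_wet = ref >= threshold
    hit = sat_wet & ref_wet        # both products report precipitation
    miss = ~sat_wet & ref_wet      # reference only
    false = sat_wet & ~ref_wet     # satellite only

    ref_total = ref[ref_wet].sum()
    hit_diff = (sat[hit] - ref[hit]).sum()
    missed = -ref[miss].sum()      # missed precipitation is a negative bias
    false_precip = sat[false].sum()

    return {
        "hit_bias": 100.0 * hit_diff / ref_total,
        "missed_bias": 100.0 * missed / ref_total,
        "false_bias": 100.0 * false_precip / ref_total,
        "total_bias": 100.0 * (hit_diff + missed + false_precip) / ref_total,
        # Relative bias: same difference, but normalized by hits only.
        "relative_bias": 100.0 * hit_diff / ref[hit].sum(),
    }

# Toy data (not from the study): two hits, one miss, one false alarm.
result = decompose_bias([0.0, 2.0, 1.0, 0.5, 0.0], [0.0, 1.0, 0.0, 1.0, 2.0])
```

In this toy case the hit bias and the relative bias share a sign but differ in magnitude, because the same hit difference is divided by a larger reference total in the hit bias.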
To further measure the agreement between satellite-based and ground-based products, the Kling-Gupta efficiency (KGE) [45,46], an increasingly used criterion for assessing the predictive skill of hydrological models, was also introduced as another statistic in this study. By taking into account correlation, bias, and variability in a more balanced way, KGE provides additional insight into the performance of satellite-based products. KGE ranges from −∞ (worst) to 1 (best). Table 1 lists the KGE values calculated from 3-hourly precipitation rates at 0.25° resolution for 3B42 and IMERG using MRMS as a reference during the periods of the three winters, five summers, and 55 months, respectively. Table 1 further demonstrates the clear performance improvement from 3B42 to IMERG in winter. A threshold of 0.1 mm h⁻¹ was used in the calculation of these statistical metrics. If other thresholds were used, the numerical values of these statistics would be different. However, as indicated in Figures 5 and 6, the general conclusion should be robust.
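As an illustration of how KGE combines correlation, bias, and variability, the following sketch implements the original 2009 formulation, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), where r is the linear correlation, α the ratio of standard deviations, and β the ratio of means. The text does not state which variant was used here; a modified form replaces α with the ratio of coefficients of variation. The function name is hypothetical.

```python
import numpy as np

def kling_gupta_efficiency(sim, obs):
    """KGE in its original 2009 form (assumed variant; see lead-in)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)

    r = np.corrcoef(sim, obs)[0, 1]    # linear correlation
    alpha = sim.std() / obs.std()      # variability ratio
    beta = sim.mean() / obs.mean()     # bias ratio

    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)

# Toy reference series (not from the study).
obs = np.array([0.2, 1.0, 3.5, 0.4, 2.1])
perfect = kling_gupta_efficiency(obs, obs)        # identical series give 1
doubled = kling_gupta_efficiency(2.0 * obs, obs)  # r = 1, alpha = beta = 2
```

A satellite estimate that is perfectly correlated with the reference but twice as large is penalized through both α and β, giving 1 − √2, which shows why a high correlation alone does not guarantee a high KGE.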

Probability Distribution Analysis
To further investigate the distribution of precipitation intensities, probability distribution functions are constructed as shown in Figure 11 for 3-hourly precipitation occurrences and volumes at 0.25° resolution without RQI filtering over the study domain during the 55-month period. The insets were for the data in the range between 0 and 0.1 mm h⁻¹, representing either no precipitation or drizzle. In this range, 3B42 accounted for 96.1% of total data points but only 0.3% of precipitation volume, whereas the numbers were 93.0% and 3.1% for IMERG, and 93.5% and 1.5% for MRMS. This also indicates the improved capability of IMERG in detecting no precipitation or drizzle occurrences compared to 3B42, which could be attributed to the GPM DPR with its higher sensitivity relative to TRMM PR [47,48]. For precipitation under 0.1 mm h⁻¹, IMERG reproduced the precipitation occurrences very well but overestimated the volumes.
The distribution for the precipitation above 0.1 mm h⁻¹ was separately constructed for each product. The 3B42 product shows an obvious overestimation in occurrences for precipitation with intensity between 1 and 30 mm h⁻¹, and underestimation for precipitation less than 1 mm h⁻¹. IMERG shows an improved agreement with MRMS in both occurrence and volume distributions.
Similar features can also be seen in the joint distributions of 3-hourly precipitation rates at 0.25° resolution for 3B42 vs. MRMS, and IMERG vs. MRMS, respectively, conditional on both precipitation rates being at least 0.1 mm h⁻¹ (Figure 12). More importantly, the random error for IMERG was much lower than that for 3B42, as seen by the tight scatter in the density plot. The distribution was more concentrated along the 1:1 line for IMERG, consistent with the NMSE and NMAE results in Figure 10. Figure 12 demonstrates that both higher and lower precipitation rates were better represented in IMERG than in 3B42, though IMERG slightly overestimated the higher precipitation rates and underestimated the lower precipitation rates. The improvement can be mainly attributed to the improved morphing scheme and enhanced sensitivity of sensors used in IMERG.
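The occurrence and volume distributions discussed in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a flat array of non-negative precipitation rates, the intensity bin edges are chosen by the user, and each bin is expressed as a fraction of all samples (occurrence) or of the total accumulated rate (volume). The function name and the bin edges in the example are hypothetical, not those used for Figure 11.

```python
import numpy as np

def occurrence_and_volume_fractions(rates, bin_edges):
    """Fraction of samples and fraction of total volume in each intensity bin."""
    rates = np.asarray(rates, dtype=float)

    # Number of samples per bin.
    counts, _ = np.histogram(rates, bins=bin_edges)
    # Sum of rates per bin: weighting each sample by its own rate.
    volumes, _ = np.histogram(rates, bins=bin_edges, weights=rates)

    occurrence = counts / rates.size
    volume = volumes / rates.sum()
    return occurrence, volume

# Toy rates in mm/h (not from the study); the first bin mimics the
# 0 to 0.1 mm/h 'no precipitation or drizzle' range.
rates = [0.0, 0.0, 0.05, 0.5, 2.0, 10.0]
occ, vol = occurrence_and_volume_fractions(rates, bin_edges=[0.0, 0.1, 1.0, 30.0])
```

Even in this small example, the lightest bin holds half of the samples but a negligible share of the volume, the same contrast that the insets of Figure 11 highlight.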

Discussion
From the launch of the TRMM satellite in November 1997 to the currently active GPM mission, satellite-based remotely sensed precipitation measurement has advanced steadily. Built upon the success of TRMM, GPM was developed to provide not only measurement continuity, but also an improvement on the TRMM instruments, algorithms, and precipitation products. GPM IMERG, as the successor of TRMM 3B42, is a unified satellite algorithm developed to provide multisatellite precipitation products over the globe. All IMERG products were retrospectively processed back to the start of the TRMM era (January 1998) and will continue for the entire life of the GPM mission, currently expected to last to the mid-2030s or beyond. The evaluation of each new version of IMERG is particularly important for algorithm development and application.
It should be noted that the evaluation of GPM IMERG and TRMM 3B42 precipitation products in this study was carried out over the land area only (Figure 1). The results drawn from this land area would likely differ for oceanic sites, because PMW retrievals over complex land surfaces remain a challenging problem. A validation study over the ocean (e.g., Kwajalein Atoll) is underway.
In addition, ground-based products such as MRMS are themselves not perfect, especially in winter and over mountainous regions, although they have often been used as references in satellite-based product validation. The validation results are likely unreliable if questionable products are used as references [49]. Satellite-based products in the western CONUS appear to exhibit reduced performance in comparison to the eastern CONUS, where RQI is high. This does not necessarily mean that the satellite products are the only problem, as there are also problems in the MRMS product, as indicated in Figure 2. It is our conclusion that the MRMS product herein might not be suitable as a GV reference over the mountainous western CONUS regions. On the contrary, one may reasonably argue that the satellite products can be utilized to fill the gaps or even substitute for the ground product where the ground observations are sparse, rare, poor, or completely missing, considering that the satellite products perform well over many areas with adequate radar or gauge coverage.

Conclusions
This study evaluated two satellite-based precipitation products, TRMM 3B42 V7 and GPM IMERG V06B, using MRMS as the reference over CONUS for the 55-month period. GPCC was also used in the comparison with MRMS and the satellite products. The spatial distributions of precipitation climatology from the four products were analyzed. The error characteristics were further examined for 3B42 and IMERG in winter and summer by an error decomposition approach, which partitioned total bias into hit bias, biases due to missed precipitation, and those due to false precipitation. The continuous and categorical statistical metrics were used to quantitatively evaluate the performance of the two satellite-based products. The main findings are summarized as follows:
(1) All products display a high degree of consistency in spatial distribution patterns, though some differences are visually discernible.
(2) IMERG shows substantial improvements in terms of nearly all statistical metrics compared to its predecessor 3B42.
(3) For winter, the improvement in IMERG came primarily from a significantly reduced missed-precipitation bias and a largely reduced positive hit bias. For summer, the improvement came mainly from a notably reduced missed-precipitation bias and a marginally reduced false-precipitation bias, but at the expense of a worse hit bias.
(4) The precipitation intensity distribution shows a significant improvement of the IMERG algorithm in comparison with 3B42, which obviously overestimated heavy precipitation but underestimated light precipitation.
(5) Missed-precipitation bias over mountainous regions, especially over frozen surfaces in winter, is still a challenging problem in satellite-based precipitation retrieval algorithms. Bias correction is of particular importance in mountainous regions such as the Sierra Nevada Mountains in California and the Rocky Mountains in Colorado.
(6) The statistical metrics and the error decomposition approach together were effective in evaluating the performance of the satellite-based precipitation products.