Comparison of the GPM IMERG Final Precipitation Product to RADOLAN Weather Radar Data over the Topographically and Climatically Diverse Germany

Precipitation measurements provide crucial information for hydrometeorological applications. In regions where rain gauges are sparse, gridded products aim to provide alternative data sources. This study examines the performance of NASA’s Integrated Multi-satellitE Retrievals for the Global Precipitation Measurement Mission (IMERG, GPM) satellite precipitation dataset in capturing the spatio-temporal variability of weather events, compared to the German weather radar dataset RADOLAN RW. Besides quantity, the timing of rainfall is highly important when modeling or monitoring the hydrologic cycle. Therefore, detection metrics are evaluated along with standard statistical measures to test both datasets. Indices such as the “probability of detection” allow a binary evaluation of the basic categorical agreement between the radar and satellite data. Furthermore, a pixel-by-pixel comparison is performed to assess the ability to represent the spatial variability of rainfall and the precipitation quantity. All calculations are additionally carried out for seasonal subsets of the data to assess potentially different behavior due to differences in precipitation regimes. The results indicate significant differences between the datasets. Overall, GPM IMERG overestimates the quantity of precipitation compared to RADOLAN, especially in the winter season. Moreover, shortcomings in detection performance arise in this season, with a significant number of both erroneously detected and missed precipitation events compared to the weather radar data. Additionally, along secondary mountain ranges and the Alps, topographically induced precipitation is not represented in the GPM data, which generally show a lack of spatial variability in rainfall and snowfall estimates due to their lower resolution.


Introduction
Precipitation is of paramount importance as a driver of the global water and energy cycles and of the interactions between the bio-, hydro-, and atmosphere, and has thus been declared an Essential Climate Variable (ECV) [1]. Information on the spatial and temporal distribution of this crucial variable helps in understanding its vast impact on numerous environmental aspects of life on Earth. Water resource management, predicting and monitoring agricultural yields, and disaster prevention and management are examples of fields that strongly depend on accurate precipitation measurements. Traditional measurement gauges are sparse in many parts of the world [2], which prevented meaningful precipitation estimates for these regions until a few decades ago, when gridded (satellite) products began to close these gaps. Currently, a physically measured precipitation distribution can be acquired via interpolation of gauge measurements, weather radar estimates, or satellite observations. At the global scale, the spatial variability of rain and snowfall is best represented by remote sensing imagery, as radar and gauge stations are not available worldwide with sufficient density and coverage. Moreover, time series of satellite data make global precipitation patterns and distributions apparent. Still, region-specific differences in climate and topography are determining factors for the uncertainties in the performance of satellite precipitation products. Current developments to improve gridded precipitation data include approaches that create or correct satellite-based precipitation products (SPP) using satellite soil moisture retrievals [3][4][5][6][7][8] or that combine datasets from various sources such as gauge measurements, atmospheric models, and satellite observations [9].
NASA's Global Precipitation Measurement (GPM) mission launched the GPM Core Observatory (CO) in 2014 as the successor of the renowned Tropical Rainfall Measuring Mission (TRMM) spacecraft [10]. Additional channels on both the Dual-frequency Precipitation Radar (DPR) and the GPM Microwave Imager (GMI) make it an advanced replacement for the older satellite. The Integrated Multi-Satellite Retrievals for GPM (IMERG) gridded dataset used in this study is a Level 3 NASA product, which unifies and inter-calibrates data from about 10 constellation satellites of several space agencies, with the GPM CO as the calibration reference [11][12][13].
Numerous comparison studies involving GPM data have been carried out over different spatial domains, e.g., globally [14], Canada [15], Singapore [16], Malaysia [17], China [18][19][20], India [21], Iran [22], and Saudi Arabia [23]; yet, investigations covering European countries are sparse, and no detailed comparison over Germany exists to date. However, the consistent availability of GPM at high temporal and spatial resolution, and hence the lower uncertainty propagating into the results of hydrological modeling, make it a viable data source for applications across European catchments of different scales [24]. Nevertheless, satellite precipitation estimates usually contain systematic bias and random errors [25,26]. Mei et al. [27] showed that SPPs are furthermore prone to underestimating extreme events and are hence the main contributor to the total error in their hydrological modeling setup. Although GPM data are currently rarely used in hydrology-related modeling in Europe, numerous future applications have been proposed. The topics cover, e.g., threshold precipitation for landslides in the Italian Umbria region [28], debris flow-triggering rainfall [29], and the modeling of flood events in alpine terrain [30]. Moreover, GPM data are now incorporated in the Global Flood Detection System (GFDS [31]) [32]. The insufficient performance of this dataset over Germany, which was demonstrated in a validation study in the TRMM era [33], creates uncertainty for future usage. Hence, a performance test of GPM over Germany is necessary so that the results of such applications over this or similar geographic regions can be assessed critically.
Furthermore, the existing comparison setups involve different datasets. Speirs et al. [34], for example, compared GPM DPR to the MeteoSwiss radar network with a focus on mountainous regions. The radar data are adjusted, yet only to a very limited number of gauges (6-10, 33) and not on an operational basis, but to long-term mean precipitation values. Other studies also evaluated GPM (mostly the DPR product) against weather radar datasets [35][36][37], many of them focusing on snow detection performance [38][39][40][41][42][43]. Their findings indicate substantial improvements compared to the TRMM era. Yet, the need for further algorithm improvements to enhance the IMERG capabilities in freezing conditions still persists [22,34,40,44,45].
Studies on the performance of SPPs are strongly location dependent, with highly diverse correlation values relative to gauge measurements, especially in challenging topography [17,46]. Therefore, the evaluation of quantitative precipitation estimates (QPE) is vital before applying them operationally at a specific study site. In addition to its diverse topography, Germany lies in the transition zone from oceanic to continental climate with different precipitation regimes, making it a very interesting and challenging case study.
The novelty of the presented case is the comparison of the final GPM IMERG data to a precipitation product of high temporal and spatial resolution: the state-of-the-art weather radar-derived and operationally gauge-adjusted RADOLAN RW product of the German Weather Service (DWD, Deutscher Wetterdienst). Due to its high sampling frequency, short-scale precipitation events can be captured. Furthermore, the hourly online adjustment routine makes it a balanced dataset, adhering closely to the gauge measurements without cutting out extreme events [47,48].
To assess the performance of GPM over complex terrain, throughout the seasons, and consequently for different precipitation regimes, the study compares final GPM IMERG data against RADOLAN RW data from the DWD. For this purpose, different standard statistical measures as well as a range of categorical indices are applied and evaluated on a pixel-by-pixel basis. This form of spatial comparison accounts for the drastic topographic differences throughout the study area, with landscapes ranging from lowlands and secondary mountain ranges to alpine peaks of almost 3000 m.a.s.l., as well as for the different seasons and precipitation regimes. Thus, two hypotheses are addressed in this study: (1) GPM shows similar detection performance over different topographic and climate zones compared to RADOLAN data; (2) GPM and RADOLAN show the same spatial and seasonal trends in precipitation.

Study Area
The spatial bounds of the dataset comparison comprise the state territory of Germany, which extends from 47° to 55°N and from 5° to 16°E and covers an area of 357,021 km².
The topography is diverse, with lowlands in the north, uplands and secondary mountain ranges in the central region, and the foothills of the Alps and adjacent summits in the southern part of Germany, with Zugspitze (2962 m.a.s.l.) as the highest peak. An overview of the study area is given in Figure 1. Accordingly, the relief variability increases towards the south, where steep slopes in the mountainous region cause strong gradients in temperature and precipitation over very short horizontal distances. For example, Garmisch-Partenkirchen at the foot of Zugspitze is characterized by a mean temperature of 7.2 °C and an annual precipitation of 1231 mm, whereas the summit weather station records −3.7 °C and 1978 mm. Overall, a temperate seasonal climate prevails, with mean temperatures ranging from −3.7 °C to 11.0 °C and mean annual precipitation ranging from 483 mm to 2340 mm.
The distribution of precipitation in Germany is shaped by its position between oceanic Western Europe and continental Eastern Europe. Precipitation amounts, mostly brought by humid westerly winds, decrease towards the eastern parts of the study area, yet regions in the extreme south and parts of the uplands in central Germany show higher precipitation amounts due to their mountainous climate. In winter, solid precipitation in the form of snow is more common in areas with continental influence.
The time period from 1 December 2014 to 30 November 2017 is analyzed in this study.

Weather Radar Data
The gauge-adjusted, quality-controlled RADOLAN RW (Radar Online Adjustment) dataset from the German Weather Service (DWD, Deutscher Wetterdienst) is considered the ground truth for the following analyses. It is already widely used, e.g., for training and validation in the machine learning domain [49,50], for analyzing extreme flash floods [51] and enhancing the respective forecasts [52], and for estimating the spatio-temporal variability of soil erosion [53].
The radar dataset is currently derived from 18 C-band weather radars operating at scanning intervals of 5 minutes. All stations except "Hohenpeißenberg", which is used for quality control, contribute to the quantitative precipitation analysis. The spatial distribution of the observational network is shown in Figure 1, along with the coverage of each device within a radius of 150 km. Significant overlap within the dense radar network ensures accurate retrievals, since problems caused by signal attenuation with increasing distance from the sensor, and hence missed or misinterpreted precipitation events, are minimized [54]. In the last few years, the weather radars have been gradually upgraded to dual-polarization scanning devices that allow discrimination of hydrometeor types [54]. Within the specific calibration procedure, rain intensity-adapted Z-R relationships (empirical formulas that estimate rainfall rates from the reflectivity signal strength) and statistical clutter filtering are applied, and orographic shadowing effects are considered [48,55,56]. Assumptions on the drop size distribution and droplet count are necessary for deriving precipitation [54]. For RADOLAN, an extended Z-R relationship is utilized instead of solely using standardized values from the literature. The relationship takes the absolute reflectivity as well as horizontal gradients into account to distinguish typical convective and stratiform drop size distributions [48]. Furthermore, potential overshooting effects in winter due to lower cloud heights are considered with a seasonally dependent correction derived via regression analysis. However, a general linear correction scheme does not fulfill the requirements of the DWD because it erroneously adapts to single extreme events, e.g., intense convective cells that occur regularly throughout Germany in summer. Therefore, a multiple polynomial regression is calculated to generate the correction factors for every pixel. This accounts for the respective scanning height class, day of year, and reflectivity [54]. The enhancements concerning dual-polarization Z-R relationships had not been integrated into the online adjustment routine at the time of data acquisition.
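To make the Z-R concept concrete, the following sketch converts reflectivity in dBZ to a rain rate by inverting a power law Z = aR^b. It is not RADOLAN's extended, intensity-adapted relationship: the coefficients a = 200 and b = 1.6 are the classic Marshall–Palmer values, chosen here purely as an illustrative assumption, and the function name is hypothetical.

```python
import math

# Classic Marshall-Palmer coefficients (illustrative assumption only;
# RADOLAN uses its own extended, intensity-adapted Z-R relationship).
A_MP = 200.0
B_MP = 1.6


def dbz_to_rain_rate(dbz: float, a: float = A_MP, b: float = B_MP) -> float:
    """Convert reflectivity (dBZ) to rain rate R (mm/h) by inverting Z = a * R**b."""
    z_linear = 10.0 ** (dbz / 10.0)  # dBZ -> linear Z in mm^6 m^-3
    return (z_linear / a) ** (1.0 / b)


if __name__ == "__main__":
    for dbz in (20.0, 30.0, 40.0, 50.0):
        print(f"{dbz:4.0f} dBZ -> {dbz_to_rain_rate(dbz):6.2f} mm/h")
```

Because R depends exponentially on the dBZ value, a reflectivity error of only 1 dB changes the estimated rain rate by a factor of 10^(0.1/1.6), i.e., about 15% with these coefficients, which illustrates why the choice of a and b, and their adaptation to convective versus stratiform conditions, matters.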
Nevertheless, for a realistic estimation of the precipitation quantity, measurements from approximately 1300 conventional stations are used in the operational hourly gauge adjustment routine [55]. These sensors (Ott PLUVIO) are based on the "Hellmann" rain gauge principle [57] and comply with the standards of the World Meteorological Organization [58]. The application of a weighing principle and ambient-temperature-dependent heating sets the PLUVIO apart from conventional measurement systems and allows solid and liquid precipitation to be captured alike [48]. A subset of the gauge stations is used in the generation of the monthly Global Precipitation Climatology Centre (GPCC) product.
The precipitation product is available at a temporal, spatial, and intensity resolution of 1 h, 1 km, and 0.1 mm, respectively. A dimension of 900 × 900 pixels allows the polar stereographic composite grid, centered at 9.0°E 51.0°N, to cover the whole state territory of Germany [47,48]. Throughout this study, the dataset is referred to as "RADOLAN".

Satellite Data
The GPM IMERG Version 5 final half-hourly precipitation dataset with 0.1° spatial resolution is compared to the aforementioned radar precipitation dataset. The GPM Core Observatory is equipped with a multi-channel, dual-polarization passive microwave (PMW) sensor and an active scanning radar. Improvements over its predecessor TRMM include an increased orbital inclination from 35° to 65° for improved coverage, an upgraded radar with two frequencies, and additional "high-frequency" channels on the PMW sensor, which facilitate the detection of light and solid precipitation [12,13]. In Version 5, the research-level "final" dataset is adjusted monthly to the extensive gauge-based GPCC dataset, which is available at 1.0° × 1.0° spatial resolution [59]. In this study, the dataset is referred to as "GPM".

Preprocessing of Datasets
In order to make the datasets spatially and temporally comparable, the RADOLAN dataset was reprojected from the DWD-specific polar stereographic projection to WGS84, remapped, and aggregated to the GPM grid. Remapping routines using bilinear interpolation or high-order finite-differencing techniques may lead to unexpected behavior, e.g., higher local maxima, and are non-conservative; hence, they behave inconsistently with regard to precipitation sums in the original and the regridded dataset [60,61]. Furthermore, bilinear remapping schemes significantly change categorical skill scores in particular [62]. Therefore, the ideal regridding scheme for precipitation data, which are discontinuous in space and time, is area-conservative regridding, which calculates the fractional contributions of the original grid cells and hence maintains the same area-averaged rainfall before and after remapping [63]. Accordingly, the spatial averaging procedure applied to remap the finer RADOLAN grid to the coarser GPM grid uses the first-order conservative remapping scheme of Jones [64], as implemented in the Climate Data Operators software (CDO), which applies the SCRIP algorithm (Spherical Coordinate Remapping and Interpolation Package) [65,66]. This technique is widely applied in other studies dealing with precipitation data [67][68][69]. The area-averaged precipitation quantity $F_k$ at the destination grid cell $k$ is calculated as follows:

$$F_k = \frac{1}{A_k} \iint_{A_k} f \, dA$$

where $A_k$ denotes the area of the destination grid cell $k$ and $f$ is the precipitation quantity of the original grid cells that overlap the destination grid cell [64]. Furthermore, the GPM data were aggregated temporally to match RADOLAN's hourly resolution. Both datasets were clipped to the extent of the state territory of Germany.
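For a first-order scheme, $f$ is constant within each source cell $n$, so the integral reduces to $F_k = \frac{1}{A_k}\sum_n f_n A_{nk}$, where $A_{nk}$ is the overlap area of source cell $n$ and destination cell $k$ (notation introduced here for illustration). The sketch below implements this reduction and the half-hourly to hourly aggregation under simplifying assumptions: both grids are regular lon/lat grids given by their cell edges in degrees (RADOLAN's native grid is polar stereographic, so it would first have to be reprojected), the source fully covers the destination, there are no missing values, and the half-hourly GPM values are rates in mm/h starting on a full hour. All function names are hypothetical; this is not the CDO/SCRIP implementation.

```python
import numpy as np


def overlap_weights(src_edges, dst_edges, measure):
    """Overlap of 1D source and destination cells, shape (n_dst, n_src).

    `measure` maps edge coordinates to the quantity whose differences give the
    cell size along this axis (radians for longitude, sin(lat) for latitude).
    Edges must be monotonically increasing.
    """
    s = np.asarray(src_edges, dtype=float)
    d = np.asarray(dst_edges, dtype=float)
    lo = np.maximum(s[None, :-1], d[:-1, None])  # lower bound of each intersection
    hi = np.minimum(s[None, 1:], d[1:, None])    # upper bound of each intersection
    return np.where(hi > lo, measure(hi) - measure(lo), 0.0)  # 0 if disjoint


def sin_lat(deg):
    return np.sin(np.deg2rad(deg))


def conservative_remap(field, src_lat, src_lon, dst_lat, dst_lon):
    """First-order conservative remap of a (n_lat, n_lon) field.

    On the unit sphere a lon/lat cell has area dlon * d(sin lat), and the
    overlap of two such cells is again a lon/lat cell, so A_nk factorizes
    into a latitude part and a longitude part.
    """
    wy = overlap_weights(src_lat, dst_lat, sin_lat)     # (n_lat_dst, n_lat_src)
    wx = overlap_weights(src_lon, dst_lon, np.deg2rad)  # (n_lon_dst, n_lon_src)
    weighted_sum = wy @ np.asarray(field, dtype=float) @ wx.T  # sum_n f_n * A_nk
    dst_area = np.outer(np.diff(sin_lat(np.asarray(dst_lat, dtype=float))),
                        np.diff(np.deg2rad(np.asarray(dst_lon, dtype=float))))
    return weighted_sum / dst_area  # F_k = (1 / A_k) * sum_n f_n * A_nk


def halfhourly_to_hourly(rates_mm_per_h):
    """Combine consecutive half-hourly rates (mm/h) into hourly totals (mm).

    Assumes the series starts on a full hour and has an even length along axis 0.
    """
    r = np.asarray(rates_mm_per_h, dtype=float)
    return 0.5 * r[0::2] + 0.5 * r[1::2]  # each half hour contributes rate * 0.5 h
```

A quick sanity check for such a routine is that a constant field stays constant and that the area-weighted total over the common domain is unchanged after remapping, which is exactly the property that bilinear interpolation does not guarantee.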

Methodology
The GPM satellite precipitation dataset was statistically compared to RADOLAN weather radar data. In investigations like this, quality checks of the involved data are critical for producing meaningful results. In this study, 55 hourly weather radar grids are missing, representing only 0.17% of the considered time steps. The GPM time series is complete. Furthermore, visual inspection of the radar images for the period under review reveals no typical radar-related errors such as beam blockage and artifacts, which occurred in the first versions of the distributed RADOLAN data at the beginning of the recording period.
To determine whether the datasets behave differently depending on the season, owing to different precipitation regimes and the higher prevalence of snowfall in winter, the statistical analysis was split into the four meteorological seasons: winter (DJF), spring (MAM), summer (JJA), and fall (SON). Overall, statistical comparisons of precipitation sums and means were carried out. Pixel-by-pixel difference and correlation analyses were additionally conducted to provide a spatial representation of the level of agreement between the RADOLAN and GPM datasets. Pearson's r was used as the correlation measure:

$$r = \frac{\operatorname{cov}(P_{GPM}, P_{RADOLAN})}{\sigma_{P_{GPM}} \, \sigma_{P_{RADOLAN}}}$$

where $\sigma$ denotes the standard deviation. Furthermore, the overall unconditional bias B was calculated with the following formula:

$$B = \frac{\sum_{i=1}^{N} P_{GPM,i}}{\sum_{i=1}^{N} P_{RADOLAN,i}}$$

Perfect agreement of the precipitation amounts in GPM and RADOLAN results in a value of 1.
To represent the average magnitude of the error, the Mean Absolute Error (MAE) is used:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| P_{GPM,i} - P_{RADOLAN,i} \right|$$

The Root Mean Squared Error (RMSE), which gives greater weight to larger errors than the MAE, is also part of the statistical evaluation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( P_{GPM,i} - P_{RADOLAN,i} \right)^2}$$

where $P_{GPM}$ and $P_{RADOLAN}$ are the satellite and weather radar precipitation estimates, respectively, and i denotes the ith hourly value in the pixel-by-pixel calculation and the ith element (all pixels over all time steps) in the overall calculation. Likewise, N stands for the number of hourly values per pixel or for the product of the number of pixels and the number of hourly values, respectively. Furthermore, the ability to identify wet days with precipitation amounts greater than 1 mm was examined to allow inferences about the detection rates of the two precipitation datasets. For this purpose, the number of these days and the respective mean precipitation sum were evaluated for both datasets on a seasonal basis.
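The continuous scores and the wet-day statistic can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: it assumes arrays shaped (time, lat, lon) with missing time steps already removed, uses `axis=None` for the overall calculation (all pixels over all time steps) and `axis=0` for the pixel-by-pixel calculation, and assumes that the hourly series consists of complete days starting at 00 UTC. Function names are hypothetical; pixels with zero variance yield NaN for r.

```python
import numpy as np


def continuous_scores(p_gpm, p_radolan, axis=None):
    """Pearson r, unconditional bias B, MAE, and RMSE.

    axis=None pools all pixels and time steps (overall scores);
    axis=0 on arrays shaped (time, lat, lon) gives one score per pixel.
    """
    g = np.asarray(p_gpm, dtype=float)
    r = np.asarray(p_radolan, dtype=float)
    diff = g - r
    g_anom = g - g.mean(axis=axis, keepdims=True)
    r_anom = r - r.mean(axis=axis, keepdims=True)
    pearson = np.sum(g_anom * r_anom, axis=axis) / np.sqrt(
        np.sum(g_anom**2, axis=axis) * np.sum(r_anom**2, axis=axis))
    return {
        "r": pearson,
        "B": np.sum(g, axis=axis) / np.sum(r, axis=axis),  # 1 = equal totals
        "MAE": np.mean(np.abs(diff), axis=axis),
        "RMSE": np.sqrt(np.mean(diff**2, axis=axis)),
    }


def wet_day_stats(hourly_mm, threshold_mm=1.0):
    """Count wet days (daily sum > threshold) and their mean daily sum.

    Assumes axis 0 holds complete days of 24 hourly values starting at 00 UTC.
    """
    h = np.asarray(hourly_mm, dtype=float)
    daily = h.reshape(-1, 24, *h.shape[1:]).sum(axis=1)  # (days, ...)
    wet = daily > threshold_mm
    n_wet = int(wet.sum())
    mean_sum = float(daily[wet].mean()) if n_wet else float("nan")
    return n_wet, mean_sum
```

Computing r from anomalies in this way gives the same value as `np.corrcoef` for the pooled case, while also vectorizing over pixels when `axis=0`.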
Additionally, categorical indices are calculated to further the knowledge about detection performance. They allow the evaluation of the binary agreement of the precipitation datasets, i.e., whether events are captured consistently in both datasets. This was done for the spatially aggregated datasets as well as on a pixel-by-pixel basis. For these calculations, the contingency table shown in Table 1 is used, where a, b, c, and d represent the total count of data pairs matching the respective criteria (a: event in both datasets; b: event in GPM only; c: event in RADOLAN only; d: no event in either dataset). RADOLAN is chosen as the reference due to its originally higher spatial resolution and the higher temporal frequency of its adjustment to gauge measurements. For further information on the metrics used, please refer to, e.g., Woodcock [70], Doswell et al. [71], and Schaefer [72]. The Probability Of Detection (POD) for GPM measurements over Germany in the reported time period can be written as:

$$\mathrm{POD} = \frac{a}{a + c}$$

It measures how effectively the satellite observations detect a rain event compared to RADOLAN, with a perfect score of 1.
The opposite case, where precipitation is erroneously indicated by GPM, is assessed with the False Alarm Ratio (FAR), for which the perfect score is 0:

$$\mathrm{FAR} = \frac{b}{a + b}$$

The Frequency Bias Index (FBI) is the ratio of the total counts of precipitation events in the two datasets. Its values range from 0 to ∞, with a perfect score of 1:

$$\mathrm{FBI} = \frac{a + b}{a + c}$$

It complements the unconditional bias in that precipitation amounts are left out and only the temporal and spatial similarities in the occurrence of events are taken into account.
The Critical Success Index (CSI) combines the information of FAR and POD. It shows how well the correctly detected precipitation events from GPM correspond to all recorded precipitation events (hits, misses, and false alarms), making the CSI a very balanced measure, with a best score of 1:

$$\mathrm{CSI} = \frac{a}{a + b + c}$$

Finally, the Heidke Skill Score (HSS) was calculated for the datasets. This metric measures accuracy relative to random chance. For a perfect measurement, the value is 1, whereas performance equal to or worse than random chance results in −1 ≤ HSS ≤ 0:

$$\mathrm{HSS} = \frac{2(ad - bc)}{(a + c)(c + d) + (a + b)(b + d)}$$

A threshold of 0.1 mm/h is defined to delineate a precipitation event for the calculation of the above indices. This agrees with the intensity resolution of both datasets. Hourly pixel values below this threshold are treated as noise and therefore omitted.
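The contingency counts and the derived indices can be sketched as follows, assuming the standard convention that a counts hits, b false alarms (GPM only), c misses (RADOLAN only), and d correct negatives, and that a value at or above the 0.1 mm/h threshold counts as an event. The function names are hypothetical. Note that NaN values compare as "no event", so missing time steps (such as the 55 missing radar grids) should be masked out before calling `contingency`.

```python
import numpy as np


def contingency(p_gpm, p_radolan, threshold=0.1):
    """Count hits (a), false alarms (b), misses (c), correct negatives (d).

    RADOLAN is the reference; an event is a value >= threshold (mm/h).
    """
    g = np.asarray(p_gpm, dtype=float) >= threshold
    r = np.asarray(p_radolan, dtype=float) >= threshold
    a = int(np.sum(g & r))    # event in both datasets
    b = int(np.sum(g & ~r))   # event in GPM only (false alarm)
    c = int(np.sum(~g & r))   # event in RADOLAN only (miss)
    d = int(np.sum(~g & ~r))  # no event in either dataset
    return a, b, c, d


def categorical_scores(a, b, c, d):
    """POD, FAR, FBI, CSI, and HSS from a 2x2 contingency table."""
    return {
        "POD": a / (a + c),
        "FAR": b / (a + b),
        "FBI": (a + b) / (a + c),
        "CSI": a / (a + b + c),
        "HSS": 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d)),
    }
```

For the pixel-by-pixel maps, the same boolean logic can be summed along the time axis (e.g., `np.sum(g & r, axis=0)`) instead of over the whole array.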

Overall
Figure 2a,b shows the yearly mean precipitation of the two datasets. The plots clearly reflect the different recording techniques and the originally different spatial properties of the datasets. The topographic characteristics of Germany can be traced in the RADOLAN data, which, although spatially aggregated, retain their inherently higher spatial variability. In contrast, the yearly mean precipitation measured by the GPM constellation appears smoother. The overall pattern indicates a similar precipitation distribution across Germany, but with strongly diverging levels of detail. Both datasets agreed on the foothills of the Alps as the wettest region and eastern Germany as the driest sub-region of the state territory. The difference between GPM's and RADOLAN's precipitation amounts over the whole period under review again demonstrates the differences in spatial variability between the datasets. Furthermore, GPM overestimated the quantity of precipitation in many parts of Germany. Yet, over secondary mountain ranges and alpine regions, the satellite data indicated lower precipitation amounts than the gauge-adjusted weather radar (Figure 2c). The monthly precipitation sums averaged over all of Germany show a clear pattern (Figure 3). Across all winter months in the reporting period, GPM's QPE clearly exceeded that of RADOLAN, with a maximum monthly mean surplus per pixel of >20 mm. In summer months, the two datasets coincided. The unconditional bias supports these findings, also indicating a general overestimation of the precipitation amount by GPM compared to RADOLAN's QPE, with B = 1.31.

Seasonal Analysis
The analysis of seasonally aggregated data was used to further reveal differences in precipitation patterns and their detection by the GPM and RADOLAN datasets. The absolute differences per season over the whole reporting period are shown in Figure 4. Besides the again prominent differences due to spatial variability, the differences vary across seasons and are consistent with Figure 3. In fall and winter (SON, DJF; Figure 4), GPM data showed higher precipitation values than RADOLAN in most areas. In the other two seasons, the satellite QPE were generally more on par with the weather radar data. However, in the southern part of the study area, RADOLAN showed higher values in spring (MAM) and especially in summer (JJA). These findings are further supported by the mean seasonal precipitation sums across the territory of Germany, shown in Figure 5a. GPM overestimated the precipitation amount in all seasons; however, the winter surplus of 76% stands out.
Pearson's r was used to quantify the correlation between GPM and RADOLAN precipitation. Additionally, the measure was applied on a pixel-by-pixel basis to evaluate the spatial agreement of the GPM and RADOLAN data; that is, the correlation was calculated for every location in every seasonal subset. The overall correlation was 0.49, while the values for the individual seasons differed considerably: 0.38 for DJF, 0.55 for MAM, 0.54 for JJA, and 0.57 for SON. These results are backed by the spatial representations shown in Figure 6. All seasons except DJF showed moderate correlation throughout the state territory of Germany. In winter, however, large parts of southeastern Germany showed very low correlation values of around 0.1 to 0.2, with a minimum of 0.07.

Categorical Performance
The number of wet days with a daily precipitation sum greater than 1 mm varied between the datasets (see Figure 5b). In all seasons except winter, RADOLAN captured significantly more rain events than GPM. In spring, the difference amounted to up to 50,000 pixel hours within the reporting period. Yet, also in accordance with previous results, the mean precipitation amount per wet day measured by GPM was higher than the respective RADOLAN value in all seasons. Although GPM showed a lower detection rate for wet days, the surplus in precipitation amount compensated for this effect, so the aforementioned finding that the satellite measurements are positively biased relative to RADOLAN remains valid.
Various categorical indices were calculated to assess the dataset-specific detection capabilities for precipitation events (see Section 3.2). These were again calculated for the whole datasets as well as for seasonal subsets. Furthermore, spatial representations calculated on a pixel basis are shown in Figures 7 and 8.
The ability of the GPM dataset to capture precipitation events was moderate, with an overall POD of 0.53 (see Table 2). Regions with high relief energy showed the lowest POD values in all seasons. The highest numbers of erroneously detected precipitation events occurred in eastern Germany, demarcated most clearly in the SON and DJF seasons. This demarcation corresponds directly to the high FBI values in that region in the same seasons. Still, across Germany, RADOLAN detected more events per pixel in all seasons, resulting in FBI values ranging from 0.68 to 0.90, with an overall value of 0.78.
The error indices MAE and RMSE show a different temporal pattern. However, due to the aforementioned topography-related issue, GPM's spatial shortcomings in representing precipitation relative to RADOLAN persisted. High error values were present throughout most of Germany not only in winter but also in summer. Nevertheless, alpine regions have to be highlighted, as their error values clearly exceeded those of the rest of Germany.

Discussion
The single most marked observation to emerge from the data comparison is the strong discrepancy between the GPM and RADOLAN datasets in precipitation estimation for the winter season. Correlation between the satellite observations and the weather radar data is low for this period and appears to be inversely related to continentality. Combined with low POD values, this raises uncertainty about the applicability of the dataset in, e.g., hydrological modeling. GPM's problems with solid precipitation have to be considered as one reason for the low detection rate and, at the same time, the strongly overestimated precipitation amounts in winter compared to the weather radar. Positive and negative biases of GPM IMERG data in cold environments have been reported previously in [22] and [36], respectively. He et al. [18] even excluded winter months from their study, as both satellite and gauge measurements are error prone in the detection of solid precipitation. Kochendorfer et al. [73] also stated that weighing precipitation gauges are highly error prone, especially when wind speeds exceed 5 m s⁻¹. In this case, less than 50% of the actual amount of solid precipitation may be collected. For the type of measurement gauge mainly used in Germany, Boudala et al. [74] reported an undercatch with a catch ratio of 0.57 for solid precipitation. The DWD applies different filter algorithms to the gauge measurements; however, wind effects may still alter the measurements [48].
In the current study, GPM was positively biased compared with RADOLAN in all seasons. Biased precipitation estimates of the satellite dataset have been reported by several authors [17,18,22,44], though in both positive and negative directions. Furthermore, the aforementioned quantitative overestimation, particularly in winter and partly caused by false alarms, accounts for the shift in precipitation amounts. The very high FAR and FBI values in eastern Germany in winter (see Figures 7 and 8), where lakes and large rivers (Elbe, Havel, Mulde) are abundant, suggest that these landscapes and their inherent water cycle influence the retrieval. Although there is a large discrepancy in the number of events, there is no sign of excessive overestimation of the precipitation quantity compared to the surrounding regions. Thus, ground fog, for example, possibly not detected by RADOLAN but overrepresented in GPM, could be considered as an explanation for the disagreement between the datasets. Solid precipitation in winter could also be the reason for the discrepancies, although other areas in Germany are clearly more prone to snowfall. Moreover, the region is located in the lee of a secondary mountain range, and erroneously detected precipitation in rain shadow areas has been reported for GPM estimates by Prakash et al. [21]. The performance of GPM regarding light rain and solid particle detection has improved compared to its predecessor TRMM [11], yet the present case, like other studies, demonstrates that the algorithm still needs further improvement.
Furthermore, the detection of orographic precipitation by GPM is erroneous, as already shown by several studies [18,21,22]. The inability to capture topography-induced convective precipitation becomes clearly evident in this study in most categorical indices and in the overall difference map highlighting these areas (see Figures 2c, 7 and 8). Consequently, the high rainfall intensities along the Alps naturally lead to high MAE and RMSE values. The grainy pattern of the RMSE in summer (JJA; see Figure 8) and the high error values in southwestern Germany can certainly be attributed to the nature of the metric itself and hence to its sensitivity to high-intensity precipitation events, which commonly occur in these regions in summer. Because RADOLAN's shorter scanning interval gives it, even after aggregation to hourly data, a higher probability of detecting short-lived high-intensity rainfall, a single precipitation event missed by GPM can cause large discrepancies in the RMSE, particularly in summer.
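The sensitivity of the RMSE to single high-intensity misses can be illustrated with a toy example. The numbers below are hypothetical and not taken from the datasets: a pixel with 100 hourly values carries the same total absolute error either as one missed 20 mm/h convective burst or as a constant 0.2 mm/h error in every hour.

```python
import math


def mae(errors):
    """Mean absolute error of a sequence of hourly errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error of a sequence of hourly errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical pixel with 100 hourly GPM-minus-RADOLAN errors (mm/h):
single_miss = [20.0] + [0.0] * 99  # one missed 20 mm/h convective burst
spread_out = [0.2] * 100           # same total error, spread evenly

print("single miss:", mae(single_miss), rmse(single_miss))
print("spread out: ", mae(spread_out), rmse(spread_out))
```

Both cases have an MAE of 0.2 mm/h, but the RMSE is 2.0 mm/h for the single miss and only 0.2 mm/h for the evenly spread error, which is why isolated missed convective events can produce the grainy summer RMSE pattern while barely affecting the MAE.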
General caution has to be applied when comparing datasets with originally different spatial resolutions. Although the applied conservative remapping scheme, as described in Section 3.1.3, is widely regarded as very suitable for regridding precipitation data, other techniques (e.g., bilinear, bicubic, or iterative curvature-based interpolation) may slightly alter the findings of this study. However, the authors compared the results of the fundamentally different, non-conservative bilinear interpolation (data not shown) with those of the applied conservative scheme and found that the changes were very small and did not alter the conclusions. Nevertheless, we recommend that future studies analyze in depth the impact of different interpolation schemes on the comparison of precipitation datasets. Moreover, the results can only be transferred to regions that share similar boundary conditions. In this respect, the case study over Germany is well suited, as it represents various topographical conditions as well as several precipitation regimes to test the performance of GPM.
Further processing could include temporal aggregation to, and comparison of, daily values, as precipitation data are often used at this temporal scale as input for other applications, e.g., hydrologic modeling. It should be noted that both agencies, NASA and DWD, provide additional products within the respective family (GPM and RADOLAN) that are calculated with a modified algorithm or are based on a subset of sensors. However, the specific purpose of this study was to compare the respective final, community-ready precipitation datasets, GPM IMERG v05 Final and RADOLAN RW, which fully incorporate all data gathered for the respective mission.
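A daily aggregation of the kind suggested above could look like the following Python sketch. It assumes that the half-hourly satellite values are rain rates in mm/h (so each half-hour contributes half its rate), that the series start at 00:00 UTC without gaps, and that days are cut at 00:00 UTC. The function names are hypothetical.

```python
def halfhourly_to_hourly(rates_mm_per_h):
    """Hourly sums (mm) from consecutive half-hourly rates (mm/h)."""
    if len(rates_mm_per_h) % 2:
        raise ValueError("expected an even number of half-hourly steps")
    # Each half-hour contributes rate * 0.5 h of precipitation.
    return [0.5 * (rates_mm_per_h[i] + rates_mm_per_h[i + 1])
            for i in range(0, len(rates_mm_per_h), 2)]


def hourly_to_daily(hourly_mm, hours_per_day=24):
    """Daily sums (mm) from hourly sums, assuming the series starts at 00:00 UTC."""
    if len(hourly_mm) % hours_per_day:
        raise ValueError("series does not cover whole days")
    return [sum(hourly_mm[i:i + hours_per_day])
            for i in range(0, len(hourly_mm), hours_per_day)]


# Hypothetical two-day half-hourly series for one pixel:
# day 1 rains steadily at 2 mm/h, day 2 has a single half-hour at 4 mm/h.
rates = [2.0] * 48 + [0.0] * 20 + [4.0] + [0.0] * 27
hourly = halfhourly_to_hourly(rates)
daily = hourly_to_daily(hourly)
print(daily)
```

Hourly radar accumulations, assumed here to be in mm, would enter `hourly_to_daily` directly, so both datasets end up on the same daily time base.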
Lastly, it should be noted that the identified performance-related discrepancies are all the more notable, as the two datasets cannot be considered entirely independent: GPM IMERG uses data from the GPCC network for calibration on a monthly basis, and some of the involved gauges are also used in the hourly online adjustment routine of the RADOLAN dataset. The authors accepted this limitation because calibration for the two datasets takes place on entirely different temporal scales.

Conclusions
This study conducted a statistical comparison of two quantitative precipitation estimation (QPE) products, namely the GPM IMERG half-hourly Version 5 Final satellite dataset and the RADOLAN RW weather radar dataset. Standard metrics such as RMSE, MAE, and bias were applied, and categorical indices were used to identify strengths and shortcomings in the ability to detect individual precipitation events. Additionally, a pixel-by-pixel analysis of these measures allows conclusions to be drawn on the spatial distribution of the event identification capabilities of the GPM and RADOLAN datasets.
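The pixel-by-pixel evaluation can be sketched as computing each metric along the time axis separately for every grid cell. The Python example below assumes that both datasets have already been remapped to a common grid and are stored as nested lists indexed [time][row][column]. It defines bias as the mean difference (GPM − RADOLAN); the exact bias definition is not restated in this section, so treat that choice as an assumption.

```python
import math


def pixel_metrics(sim, ref):
    """Per-pixel bias, MAE and RMSE for stacks indexed [time][row][col].

    Bias is taken as the mean difference sim - ref (an assumption).
    """
    nt, ny, nx = len(ref), len(ref[0]), len(ref[0][0])
    bias = [[0.0] * nx for _ in range(ny)]
    mae = [[0.0] * nx for _ in range(ny)]
    rmse = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            # Time series of differences at this grid cell.
            diffs = [sim[t][y][x] - ref[t][y][x] for t in range(nt)]
            bias[y][x] = sum(diffs) / nt
            mae[y][x] = sum(abs(dv) for dv in diffs) / nt
            rmse[y][x] = math.sqrt(sum(dv * dv for dv in diffs) / nt)
    return bias, mae, rmse


# Hypothetical 3 time steps on a 1 x 2 grid (mm/h); sim = GPM, ref = RADOLAN.
ref = [[[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 3.0]]]
sim = [[[2.0, 0.0]], [[2.0, 0.0]], [[2.0, 0.0]]]
bias, mae, rmse = pixel_metrics(sim, ref)
print(bias, mae, rmse)
```

In this toy example both pixels share the same MAE, yet their biases have opposite signs and their RMSE values differ, which is why mapping several metrics side by side is more informative than any single one.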
The results provide considerable insight into the different properties of the two datasets and reveal extensive discrepancies in some respects. The analysis yields four key findings: (i) compared to the RADOLAN data, the GPM dataset shows low sensitivity to the topographically-induced spatial variability of precipitation over Germany (see Figure 2); (ii) averaged over the whole territory, the precipitation amounts estimated by the satellite product exceed the weather radar data in all seasons, especially in winter (see Figure 3), whereas over spatial subsets with high relief energy, RADOLAN is on par with GPM or yields higher precipitation amounts (see Figure 4 and MAE and RMSE in Figure 8); (iii) RADOLAN captures a larger number of low-intensity events (see the high FBI in Figure 8); and (iv) substantial differences have to be reported for the winter season, in terms of low correlation (see Figure 6) and high FAR values, yet low POD and CSI/HSS success statistics (see Figure 7). These outcomes lead to the conclusion that caution and awareness of the dataset's peculiarities are required when using GPM data over Germany and, by extension, over parts of Europe. This caution, however, applies equally to any dataset that is treated as a reference or used in a similar manner.

Figure 1. Digital elevation model of Germany based on SRTM 1 arc second data (a) and the observational network of weather radar stations contributing to the RADOLAN dataset (b).

Figure 2. Yearly mean precipitation sum over Germany from RADOLAN (a) and GPM (b) data and the overall difference (GPM − RADOLAN) calculated for the period under review (c).

Figure 3. Spatially-averaged monthly precipitation sums in the GPM and RADOLAN datasets.

Figure 4. Differences in precipitation sums (GPM − RADOLAN) for the seasons DJF, MAM, JJA, and SON over the reporting period.
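The seasonal subsets (DJF, MAM, JJA, SON) and the "wet day" statistics shown in Figure 5b can be sketched as follows. The wet-day threshold of 1.0 mm/day is a hypothetical choice for illustration, not necessarily the one used in the study, and December is pooled by calendar month rather than assigned to the following season-year.

```python
from datetime import date

# Meteorological seasons keyed by calendar month.
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def wet_day_stats(days, daily_mm, threshold_mm=1.0):
    """Return {season: (wet-day count, mean precipitation on wet days)}.

    threshold_mm is a hypothetical wet-day threshold in mm/day.
    """
    counts, totals = {}, {}
    for day, precip in zip(days, daily_mm):
        if precip >= threshold_mm:
            season = SEASON_OF_MONTH[day.month]
            counts[season] = counts.get(season, 0) + 1
            totals[season] = totals.get(season, 0.0) + precip
    return {s: (counts[s], totals[s] / counts[s]) for s in counts}


# Hypothetical daily sums (mm) for a single pixel.
days = [date(2016, 1, 1), date(2016, 1, 2), date(2016, 7, 1),
        date(2016, 7, 2), date(2016, 12, 31)]
daily = [3.0, 0.2, 10.0, 4.0, 5.0]
print(wet_day_stats(days, daily))
```

Running the same function on both datasets for each pixel gives directly comparable counts and wet-day means per season.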

Figure 5. Mean of seasonal precipitation sums (a) and seasonal count and mean precipitation of "wet days" (b) of the GPM and RADOLAN datasets.

Figure 6.

Figure 7. Categorical indices POD, FAR, CSI, and HSS for the total review period, DJF, MAM, JJA, and SON seasons.

Table 1. Contingency table for the calculation of categorical indices.
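The categorical indices are derived from the four cells of a contingency table such as Table 1. In the Python sketch below, a denotes hits, b false alarms, c misses, and d correct negatives, with GPM as the tested and RADOLAN as the reference dataset; these cell labels and the 0.1 mm/h rain/no-rain threshold are assumptions for illustration. The formulas are the standard definitions of POD, FAR, CSI, the frequency bias index (FBI), and the Heidke skill score (HSS).

```python
def contingency(sim, ref, threshold=0.1):
    """Count hits (a), false alarms (b), misses (c), correct negatives (d).

    sim is the tested series (e.g. GPM), ref the reference (e.g. RADOLAN);
    a value >= threshold (hypothetical, mm/h) counts as a precipitation event.
    """
    a = b = c = d = 0
    for s, r in zip(sim, ref):
        sim_wet, ref_wet = s >= threshold, r >= threshold
        if sim_wet and ref_wet:
            a += 1
        elif sim_wet:
            b += 1
        elif ref_wet:
            c += 1
        else:
            d += 1
    return a, b, c, d


def _ratio(num, den):
    """Division that returns NaN instead of raising on an empty denominator."""
    return num / den if den else float("nan")


def categorical_indices(a, b, c, d):
    """Standard categorical verification scores from a 2x2 contingency table."""
    return {
        "POD": _ratio(a, a + c),           # probability of detection
        "FAR": _ratio(b, a + b),           # false alarm ratio
        "CSI": _ratio(a, a + b + c),       # critical success index
        "FBI": _ratio(a + b, a + c),       # frequency bias index
        "HSS": _ratio(2 * (a * d - b * c),
                      (a + c) * (c + d) + (a + b) * (b + d)),  # Heidke skill score
    }


print(contingency([0.0, 0.5, 0.2, 0.0], [0.0, 0.3, 0.0, 0.4]))
print(categorical_indices(20, 10, 5, 65))
```

An FBI above 1 means the tested dataset reports more events than the reference, while HSS measures accuracy relative to random chance, so the two complement POD, FAR, and CSI.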