Extreme Events of Precipitation over Complex Terrain Derived from Satellite Data for Climate Applications: An Evaluation of the Southern Slopes of the Pyrenees

: Estimating extreme precipitation events over complex terrain is challenging but crucial for evaluating the performance of climate models for the present climate and expected changes of the climate in the future. New satellites operating in the microwave wavelengths have started to open new opportunities for performing such estimation at adequate temporal and spatial scales and within sensible error limits. This paper illustrates the feasibility and limits of estimating precipitation extremes from satellite data for climatological applications. Using a high-resolution gauge database as ground truth, it was found that global precipitation measurement (GPM) constellation data can provide valuable estimates of extreme precipitation over the southern slopes of the Pyrenees, a region comprising several climates and a very diverse terrain (a challenge for satellite precipitation algorithms). Validation using an object-based quality measure showed reasonable performance, suggesting that GPM estimates can be advantageous reference data for climate model evaluation. models, especially in areas poorly covered by rain gauge networks. The objective of the present study was to evaluate the performance of IMERG products in the representation of EPEs over an orographically complex region of the Iberian Peninsula. For this assessment, we used the R99p index, which identiﬁed 55 EPEs in the 2008–2018 period. Results were validated against a gridded precipitation dataset derived from a high-density rain gauge network.


Introduction
Extreme precipitation events (EPEs) have devastating effects on human society. They are responsible for a substantial number of natural disasters, including landslides and flash floods [1]. Consequently, accurate estimates of heavy rainfall are essential for the risk monitoring of these natural hazards. In this sense, in situ observation, satellite-based measurements, model outputs, and reanalysis datasets are very beneficial because they provide continuous spatial and temporal coverage of EPEs.
Gridded products derived from rain gauges are commonly used to evaluate EPEs because of the advantages of their direct measurement. However, products derived from point-scale gauge observations can greatly deviate from areal precipitation, especially when the network is too sparse [2]. This issue is crucial in complex areas where sharp gradients play a key role in the distribution of precipitation. Compared with rain gauge data, satellite-derived precipitation products (SPPs) provide global and consistent measurement of precipitation, filling the gaps of in situ observations [3,4]. One of the most recent SPPs is the integrated multi-satellite retrievals from global precipitation measurement (IMERG) [5]. The IMERG product results from the combination of retrievals from multiple satellites and has emerged as a powerful tool for natural hazard monitoring [6], including EPEs [7][8][9][10][11]. IMERG is not only used in hazard monitoring but also in climate research. For example, Zhou et al. [12] constructed a global database of EPEs that allows analyses of many event-based precipitation characteristics that conventional datasets are unable to support. Despite its benefits, SPPs are not immune to errors and suffer from several limitations, including sensor sensitivity, algorithm limitations, and merging errors, among others [13][14][15].
Precipitation estimates in the study area is a difficult task for both rain gauges and satellite products. On the one hand, the coverage of stations over mountains is usually low, with large areas with little or no information. Furthermore, the measurement of rainfall at high elevation is more susceptible to wind, evaporative loss, and siting [56,57]. On the other hand, satellite-based precipitation products, such as IMERG, are also prone to uncertainties. For example, rainfall intensity in warm orographic clouds cannot be accurately captured by infrared (IR) retrievals [58]. The Passive microwave (PMW) algorithms have also important limitations because of the emissivity of land surface. This bias is more pronounced during extreme events and over mountainous regions [59][60][61]. The study area embraces four climate zones (Table 1). Semiarid and Mediterranean regions cover almost 40% of the domain and include the dry lowlands of the Ebro basin (BSk) and territories near the Ebro river mouth (Csa and Csb). The temperate region (Cfa and Cfb) is an area of medium altitude that surrounds the mountain ranges of the domain. Cold climates (Dfb and Dfc) are restricted to areas of high altitude in the Pyrenees.  Figure 2 shows climate zones and boxplots of the monthly precipitation at eight selected locations for the 2008-2018 period. There is strong contrast observed between the dry southeast ( <50 mm/month), humid west (50-80 mm/month), and Pyrenees ( >80 mm/month). Outliers are mostly in spring and autumn, suggesting events of extreme precipitation. The study area embraces four climate zones (Table 1). Semiarid and Mediterranean regions cover almost 40% of the domain and include the dry lowlands of the Ebro basin (BSk) and territories near the Ebro river mouth (Csa and Csb). The temperate region (Cfa and Cfb) is an area of medium altitude that surrounds the mountain ranges of the domain. Cold climates (Dfb and Dfc) are restricted to areas of high altitude in the Pyrenees.  Figure 2 shows climate zones and boxplots of the monthly precipitation at eight selected locations for the 2008-2018 period. There is strong contrast observed between the dry southeast (<50 mm/month), humid west (50-80 mm/month), and Pyrenees (>80 mm/month). Outliers are mostly in spring and autumn, suggesting events of extreme precipitation.

Gauge Dataset and Quality Control
The Automatic Hydrological Information System of the Ebro Hydrographic Confederation (SAIH-Ebro) comprises 367 rain gauge stations covering the period 2008-2018 ( Figure 1; white dots). The use of this in situ observation network instead of the meteorological network of the Spanish Meteorological Agency (AEMET) is advantageous because the former is entirely independent of satellite and reanalysis products. Additionally, data from SAIH-Ebro have been used in previous research [62][63][64][65].
The original time series of observations were reconstructed using the R package reddPrec [66], following the methodology described in Serrano-Notivoli et al. [67]. Reconstruction was based on two steps: (1) Quality Control checks (QC) to detect and remove suspect data; and (2) estimation of daily precipitation for all observations with missing values on all days for the entire cleaned dataset, based on the nearest observations. The results of QC are available in the Supplementary Material (S2.1).

Grid Reconstruction
Products distributed on regular grids, whether satellite data, gauge-derived datasets, or reanalysis products, are currently used in most climatological studies. Some gridded datasets are derived from point-wise measurements, but others are directly computed at the cell level, such as model output. These products offer several advantages over point data: They are easy to compare with other products; they allow advanced spatial analysis; and they are useful for model validation, among other tasks.
Gridded products derived from rain gauges are powerful instruments to evaluate precipitation estimates but they are subjected to spatial interpolation to convert point-scale measurements into areal estimates [68]. Interpolation brings additional uncertainties and the quality of the interpolated precipitation at pixels without gauges is typically lower than that at pixels with gauges [69].

Gauge Dataset and Quality Control
The Automatic Hydrological Information System of the Ebro Hydrographic Confederation (SAIH-Ebro) comprises 367 rain gauge stations covering the period 2008-2018 ( Figure 1; white dots). The use of this in situ observation network instead of the meteorological network of the Spanish Meteorological Agency (AEMET) is advantageous because the former is entirely independent of satellite and reanalysis products. Additionally, data from SAIH-Ebro have been used in previous research [62][63][64][65].
The original time series of observations were reconstructed using the R package reddPrec [66], following the methodology described in Serrano-Notivoli et al. [67]. Reconstruction was based on two steps: (1) Quality Control checks (QC) to detect and remove suspect data; and (2) estimation of daily precipitation for all observations with missing values on all days for the entire cleaned dataset, based on the nearest observations. The results of QC are available in the Supplementary Material (S2.1).

Grid Reconstruction
Products distributed on regular grids, whether satellite data, gauge-derived datasets, or reanalysis products, are currently used in most climatological studies. Some gridded datasets are derived from point-wise measurements, but others are directly computed at the cell level, such as model output. These products offer several advantages over point data: They are easy to compare with other products; they allow advanced spatial analysis; and they are useful for model validation, among other tasks.
Gridded products derived from rain gauges are powerful instruments to evaluate precipitation estimates but they are subjected to spatial interpolation to convert point-scale measurements into areal estimates [68]. Interpolation brings additional uncertainties and the quality of the interpolated precipitation at pixels without gauges is typically lower than that at pixels with gauges [69]. Therefore, a dense network of rain gauge stations is critical to minimize uncertainties in a gridded product.
ReddPrec includes a function for grid reconstruction that creates generalized linear models to compute the reference values (RVs) using precipitation data (occurrence and magnitude) of the 10 nearest neighbors as the dependent variable and geographic information of each station (latitude, longitude, and altitude) as independent variables. ReddPrec uses a unique and independent model for each grid point and day [67]. The result was a high-resolution (0.1 • × 0.1 • ) gridded daily precipitation dataset from 2008 to 2018.
Two types of parameters were calculated to evaluate the performance of the interpolation method, i.e., the measurement of errors in the reconstructed data and a sensitivity test for intense precipitation. The interpolated data were validated against 34 rain gauge stations provided by AEMET (Figure 1; black dots). Therefore, a dense network of rain gauge stations is critical to minimize uncertainties in a gridded product.
ReddPrec includes a function for grid reconstruction that creates generalized linear models to compute the reference values (RVs) using precipitation data (occurrence and magnitude) of the 10 nearest neighbors as the dependent variable and geographic information of each station (latitude, longitude, and altitude) as independent variables. ReddPrec uses a unique and independent model for each grid point and day [67]. The result was a high-resolution (0.1° × 0.1°) gridded daily precipitation dataset from 2008 to 2018.
Two types of parameters were calculated to evaluate the performance of the interpolation method, i.e., the measurement of errors in the reconstructed data and a sensitivity test for intense precipitation. The interpolated data were validated against 34 rain gauge stations provided by AEMET (Figure 1; black dots). Figure 3 (top) shows the mean absolute error (MAE) at the precipitation gauges and their corresponding grid cells. The boxplots show good results at the monthly scale. The median (red line) in the MAE for all cases (top center) is < 0.3 mm/day and slightly increases on days with precipitation (top right). Another important test is the ability of the product to capture rainfall intensity. This was evaluated by the R10mm (R20mm) index, which represents the number of days in a month for which precipitation exceeded 10 mm (20 mm). Agreement between the interpolated field and AEMET is clear in both cases, although there is slight underestimation for gridded data (

Satellite Datasets
IMERG is a precipitation dataset with a 30-min temporal resolution and 0.1° × 0.1° spatial resolution [5]. IMERG combines retrievals from passive microwave observations with infrared precipitation estimates to produce a quasi-global homogeneous precipitation product based only on satellite data. IMERG also includes retrievals from the GPM Core Observatory, allowing the detection of light rain and falling snow [6].
The latest version (IMERG v06b) includes several important updates. The biggest change is the switch from geosynchronous IR to numerical model data from MERRA-2 and GEOS FP for the derivation of motion vectors used to propagate the PMW precipitation observations [70]. This change

Satellite Datasets
IMERG is a precipitation dataset with a 30-min temporal resolution and 0.1 • × 0.1 • spatial resolution [5]. IMERG combines retrievals from passive microwave observations with infrared precipitation estimates to produce a quasi-global homogeneous precipitation product based only on satellite data. IMERG also includes retrievals from the GPM Core Observatory, allowing the detection of light rain and falling snow [6].
The latest version (IMERG v06b) includes several important updates. The biggest change is the switch from geosynchronous IR to numerical model data from MERRA-2 and GEOS FP for the derivation of motion vectors used to propagate the PMW precipitation observations [70]. This change Remote Sens. 2020, 12, 2171 6 of 19 was motivated by two reasons: Discrepancies between cloud-top motion and precipitation motion and the extension of IMERG coverage to the poles. We used three IMERG v06b level-3 estimates of precipitation, the Early Run (IMERG-E), Late Run (IMERG-L), and Final Run (IMERG-F). Early and late products are conceptually the same but differ in their latency (4-h vs. 12-h lag) and how the IR data is folded into the microwave estimate. IMERG-E and IMERG-L are designed for near real-time applications, whereas IMERG-F is designed for research purposes. The final product (IMERG-F) incorporates monthly observed data from the Global Precipitation Climatology Centre (GPCC) dataset, which is derived from~6700 stations worldwide [71]. IMERG-F is available after a two-month delay.

Reanalysis Dataset
Reanalysis combines model data with ground and satellite observations to generate a globally complete and consistent dataset using the laws of physics. It produces data that go back several decades, providing an accurate description of the past climate. The European Centre for Medium-Range Weather Forecasts recently released a new generation of reanalysis products (ERA5 and ERA5-Land) that replaced previous products [72]. ERA5-Land (ERA5-L hereafter) [73] is a replicate of the land component of ERA5 reanalysis but with a series of improvements making it more accurate for land-surface applications, such as flood or drought analysis. In particular, ERA5-L runs at an enhanced spatial resolution (0.1 • vs. 0.3 • in ERA5) with a one-hour temporal resolution. Observations indirectly influence ERA5-L through the atmospheric forcing of ERA5. The precipitation output from ERA5-L is the sum of large-scale and convective precipitation.
Reanalysis datasets are not immune to errors and their performance depends on several factors: The quality and availability of observations, shortcomings in the assimilation methodology, and uncertainties derived from models [74,75]. These limitations became more relevant over mountains [76]. Table 2 summarizes the datasets used in our study.

Extreme Precipitation Events (EPEs)
An EPE was defined by the R99p index [77]. R99p is a widely used index created by the Expert Team on Climate Change and Detection Indices for identifying extremely wet days. An advantage of this metric is that the parameter depends on the distribution of precipitation at each location and thereby allows comparisons between different climate zones [78].
Cases were selected after reconstructing the 10-year time series of precipitation, which served as a base period for percentile calculation. The R99p is calculated based on daily precipitation on wet days (days with daily precipitation ≥ 1 mm). The R99p must be reached by at least 10 rain gauge stations in the domain. In total, 55 cases were selected. Table 3 shows the complete list of cases filtered by date, maximum rainfall, climate region, and coordinates. Most cases are in the northern portion of the domain, others along the Ebro valley, and a few in the south (see Supplementary Material S1 for maps).

Metrics
We assessed the performance of IMERG using both statistical and graphical methods (see Table 4 for details). The standard metrics used include MAE, relative bias (RB), probability density functions (PDFs), and the correlation coefficient (CC). MAE and RB highlight differences between datasets, whereas CC is a measure of the match between reference data and precipitation estimates. Table 4. List of metrics used to quantify the performance of rainfall products 1 .

Metrics Equation Perfect Score
Mean Absolute Error (MAE) The structure-amplitude-location method (SAL) [79] evaluates precipitation fields by identifying objects in both a predicted and observed field at a given time, and decomposing errors into three components. The power of SAL lies in its ability to discriminate between errors, i.e., structure (S), amplitude (A), and location (L). The structural component S considers the gradient around an object and its size. Component A considers domain-wide accumulation. The L component combines information about the distance between predicted and observed mass centers and the error of a weighted average distance between the precipitation object centers of mass. A perfect forecast is characterized by zero values for all three SAL components.

Standard Evaluation of Extreme Rainfall
Standard comparisons for the 55 cases were performed against the gauge-derived dataset. In general, reanalysis and IMERG-F showed good agreement with the rain gauge data (see S3.1 and S3.5 for a complete comparison). Conversely, the early and late versions of IMERG overestimated very intense precipitation rates (≥100 mm/day). This issue can be clearly observed in Figure 4  Another interesting disagreement is with light rain detection (fourth column). IMERG products had problems in detecting weak precipitation (≤1 mm) because of the sensitivity of the satellite sensors, but this issue is not crucial in the analysis of EPEs [80,81]. For reanalysis data, the sharp gradient between rain and no-rain areas is poorly represented (third and fourth columns). A reason that may explain this issue is that sharp gradients and discontinuities are always difficult to model in systems of partial differential equations [82].
A closer look at other metrics reveal new issues partially hidden in the previous selection. Figure 5 (left) compares the PDF for all cases. There is a problem common to the ERA5-L and IMERG products; both were unable to reproduce the peak of light rain observed by rain gauges. Two more problems were found in the reanalysis data, an overestimation of pixels with moderate rain rate (≥25 mm/day) and a rapid drop in the right tail of the distribution (also seen in the scatter plot). On the other hand, all IMERG versions overestimated very intense precipitation (≥75 mm/day). This problem is clear in the scatter plots Figure   Another interesting disagreement is with light rain detection (fourth column). IMERG products had problems in detecting weak precipitation (≤1 mm) because of the sensitivity of the satellite sensors, but this issue is not crucial in the analysis of EPEs [80,81]. For reanalysis data, the sharp gradient between rain and no-rain areas is poorly represented (third and fourth columns). A reason that may explain this issue is that sharp gradients and discontinuities are always difficult to model in systems of partial differential equations [82].
A closer look at other metrics reveal new issues partially hidden in the previous selection. Figure  5 (left) compares the PDF for all cases. There is a problem common to the ERA5-L and IMERG products; both were unable to reproduce the peak of light rain observed by rain gauges. Two more problems were found in the reanalysis data, an overestimation of pixels with moderate rain rate (≥25 mm/day) and a rapid drop in the right tail of the distribution (also seen in the scatter plot). On the other hand, all IMERG versions overestimated very intense precipitation (≥75 mm/day). This problem is clear in the scatter plots ( Figure 5 right), but the monthly surface gauge information helped to reduce bias in the final version of the product. ERA5-L and IMERG-F gave the best correlations (0.81 and 0.77, respectively).

Evaluation of Rainfall Structure and Intensity
A further step in evaluating the performance of IMERG was an analysis of the spatial structure of the precipitation. Standard grid point quality measurements are appropriate for the verification of fields with synoptic-scale structures but might be less useful for complex structures at scales < 100 km, such as precipitation. A classic example that illustrates the limitation of standard grid point comparisons is the "double penalty problem" [83,84]. This issue can be interpreted in terms of the categorial precipitation verification terminology: A forecast is penalized twice, for not getting the precipitation at the correct location (miss) and forecasting the precipitation at the wrong location (false alarm). Thus, a faithful representation of precipitation in terms of size and intensity may obtain poor scores for the standard metrics if the location is slightly incorrect. With the object-based approach, predicted and observed precipitation areas are reduced to regions of interest that can be compared to one another in a meaningful way [85]. We used the SAL method mentioned above. This method was originally developed to appraise the predictive skills of numerical weather prediction models (NWP) but can be applied to any gridded dataset. SAL provides a three-way assessment of a gridded rainfall field, structure, amplitude, and location. The S component compares the volumes of

Evaluation of Rainfall Structure and Intensity
A further step in evaluating the performance of IMERG was an analysis of the spatial structure of the precipitation. Standard grid point quality measurements are appropriate for the verification of fields with synoptic-scale structures but might be less useful for complex structures at scales < 100 km, such as precipitation. A classic example that illustrates the limitation of standard grid point comparisons is the "double penalty problem" [83,84]. This issue can be interpreted in terms of the categorial precipitation verification terminology: A forecast is penalized twice, for not getting the precipitation at the correct location (miss) and forecasting the precipitation at the wrong location (false alarm). Thus, a faithful representation of precipitation in terms of size and intensity may obtain poor scores for the standard metrics if the location is slightly incorrect. With the object-based approach, predicted and observed precipitation areas are reduced to regions of interest that can be compared to one another in a meaningful way [85]. We used the SAL method mentioned above. This method was originally developed to appraise the predictive skills of numerical weather prediction models (NWP) but can be applied to any gridded dataset. SAL provides a three-way assessment of a gridded rainfall field, structure, amplitude, and location. The S component compares the volumes of the normalized precipitation objects and provides information about their size and shape. The A component represents a normalized difference between the domain-average forecast and observation fields and is the only one that is independent of the identification of features, because it depends on the total precipitation amount. A positive (negative) value indicates overestimation (underestimation) of total precipitation. The L component combines information about the distance between predicted and observed mass centers and the error of a weighted average distance between the precipitation object centers of mass. A and S are in the range [−2,2], with a zero value corresponding to a perfect forecast. For location (L), zero indicates that the two fields' mass centers are identical. Figure 6 shows SAL diagrams for satellite products and reanalysis. The gray-shaded box represents the 25th and 75th percentiles of the A and S components. In general, ERA5-L has a smaller dispersion, with few outliers. However, reanalysis (S > 0.9) was poorer than IMERG-F (S~0.6) in terms of the size and shape of precipitation (vertical dotted line is the median of the distribution). This issue is better shown by the side-by-side histogram. Figure 7 compares histograms of IMERG-F and ERA5-L for the distribution of the S component of SAL. ERA5-L systematically predicted larger and flatter areas (S > 0.9), whereas IMERG-F gave a more realistic representation of the precipitation field structure.   Additionally, IMERG-L and ERA5-L slightly overestimated domain-average precipitation, whereas IMERG-E and IMERG-F obtained a nearly perfect score ( Figure 6, A-Component). The identification of the center of mass (L-Component) was similar between products, with ERA5-L being slightly better. In this case, data assimilation represents a clear advantage for the reanalysis dataset.
Rainfall intensity was also evaluated by comparing the pixels of the maximum rain rate. Figure  8 compares the relative bias of the ERA5-L and IMERG products for those pixels.  Figure 8 shows that ERA5-L systematically underestimated precipitation maxima (dark blue), whereas IMERG-F achieved better results (light blue/light red). IMERG-E and IMERG-L included the largest biases (positive and negative). Conversely, IMERG-E showed good performance for the location of precipitation maxima (16 cases), although ERA5-L was the best option (20 cases).
From Figure 8, we presume that reanalysis is unable to reproduce very high precipitation rates. This problem is also observed in the density plots of precipitation (Figures 5 and 9 and Supplementary Material (S4)). PDFs show that the top end of the distribution in ERA5-L is far from observations. This might be an important issue in the evaluation of EPEs. Two factors could explain this behavior: The coarser spatial resolution to model small-scale process [86] and the lower number of observations assimilated into the model [76]. In the first case, existing studies indicate that very high resolutions up to 4-6 km are necessary to accurately model snow and orographic precipitation Additionally, IMERG-L and ERA5-L slightly overestimated domain-average precipitation, whereas IMERG-E and IMERG-F obtained a nearly perfect score ( Figure 6, A-Component). The identification of the center of mass (L-Component) was similar between products, with ERA5-L being slightly better. In this case, data assimilation represents a clear advantage for the reanalysis dataset.
Rainfall intensity was also evaluated by comparing the pixels of the maximum rain rate. Figure 8 compares the relative bias of the ERA5-L and IMERG products for those pixels. Additionally, IMERG-L and ERA5-L slightly overestimated domain-average precipitation, whereas IMERG-E and IMERG-F obtained a nearly perfect score ( Figure 6, A-Component). The identification of the center of mass (L-Component) was similar between products, with ERA5-L being slightly better. In this case, data assimilation represents a clear advantage for the reanalysis dataset.
Rainfall intensity was also evaluated by comparing the pixels of the maximum rain rate. Figure  8 compares the relative bias of the ERA5-L and IMERG products for those pixels.  Figure 8 shows that ERA5-L systematically underestimated precipitation maxima (dark blue), whereas IMERG-F achieved better results (light blue/light red). IMERG-E and IMERG-L included the largest biases (positive and negative). Conversely, IMERG-E showed good performance for the location of precipitation maxima (16 cases), although ERA5-L was the best option (20 cases).
From Figure 8, we presume that reanalysis is unable to reproduce very high precipitation rates. This problem is also observed in the density plots of precipitation (Figures 5 and 9 and Supplementary Material (S4)). PDFs show that the top end of the distribution in ERA5-L is far from observations. This might be an important issue in the evaluation of EPEs. Two factors could explain this behavior: The coarser spatial resolution to model small-scale process [86] and the lower number of observations assimilated into the model [76]. In the first case, existing studies indicate that very high resolutions up to 4-6 km are necessary to accurately model snow and orographic precipitation  Figure 8 shows that ERA5-L systematically underestimated precipitation maxima (dark blue), whereas IMERG-F achieved better results (light blue/light red). IMERG-E and IMERG-L included the largest biases (positive and negative). Conversely, IMERG-E showed good performance for the location of precipitation maxima (16 cases), although ERA5-L was the best option (20 cases).
From Figure 8, we presume that reanalysis is unable to reproduce very high precipitation rates. This problem is also observed in the density plots of precipitation (Figures 5 and 9 and Supplementary  Material (S4)). PDFs show that the top end of the distribution in ERA5-L is far from observations. This might be an important issue in the evaluation of EPEs. Two factors could explain this behavior: The coarser spatial resolution to model small-scale process [86] and the lower number of observations assimilated into the model [76]. In the first case, existing studies indicate that very high resolutions up to 4-6 km are necessary to accurately model snow and orographic precipitation [87,88]. For the second case, ERA5-L runs without data assimilation and observations indirectly influence the simulation through atmospheric forcing. This issue may be crucial in complex terrains.

Seasonality and Climate Regions
Many areas of the domain are strongly influenced by the seasonal variation of precipitation [89]. Thus, for example, precipitation in the Ebro Basin falls mainly in spring and autumn, with summer and winter generally registering the minimum rainfall. Conversely, precipitation along the southern face of the Pyrenees is homogeneously distributed across all seasons (Figure 2).
The seasonal analysis of extreme precipitation is important not only in terms of occurrence but also for the synoptic conditions that produce them. Most of the extreme events in our study were recorded during the fall (22), followed by winter, spring, and summer (15, 11, and 7 cases, respectively). During spring, summer, and autumn, a 500-hPa trough and a strong insolation are usually responsible for extreme convective precipitation [55]. Conversely, in winter, northwest flow predominates, favoring persistent precipitation in the headwaters of the Pyrenees (see Supplementary Material S3.1 for an overview of precipitation in all cases). Figure 9 (left) compares the seasonal distribution of the PDFs between gauges, ERA5-L, and IMERG products (see S4 for a case-by-case comparison). In general, IMERG-F correctly reproduced the distribution of precipitation in all seasons, especially at the top end of the distribution, where reanalysis underestimated heavy rainfall rates and the early and late products overestimated them. Pixel-by-pixel comparison Figure 9 (right) shows that IMERG-E and IMERG-L yielded the poorest correlations. The bias correction in IMERG-F clearly improved the results of satellite data, obtaining similar (SON) and better (JJA) correlation coefficients relative to ERA5-L. However, reanalysis was still better in MAM and DJF, during which intense convective activity is less common.
Remote Sens. 2020, 12, x FOR PEER REVIEW 13 of 20 [87,88]. For the second case, ERA5-L runs without data assimilation and observations indirectly influence the simulation through atmospheric forcing. This issue may be crucial in complex terrains.

Seasonality and Climate Regions
Many areas of the domain are strongly influenced by the seasonal variation of precipitation [89]. Thus, for example, precipitation in the Ebro Basin falls mainly in spring and autumn, with summer and winter generally registering the minimum rainfall. Conversely, precipitation along the southern face of the Pyrenees is homogeneously distributed across all seasons (Figure 2).
The seasonal analysis of extreme precipitation is important not only in terms of occurrence but also for the synoptic conditions that produce them. Most of the extreme events in our study were recorded during the fall (22), followed by winter, spring, and summer (15, 11, and 7 cases, respectively). During spring, summer, and autumn, a 500-hPa trough and a strong insolation are usually responsible for extreme convective precipitation [55]. Conversely, in winter, northwest flow predominates, favoring persistent precipitation in the headwaters of the Pyrenees (see Supplementary Material S3.1 for an overview of precipitation in all cases). Figure 9 (left) compares the seasonal distribution of the PDFs between gauges, ERA5-L, and IMERG products (see S4 for a case-by-case comparison). In general, IMERG-F correctly reproduced the distribution of precipitation in all seasons, especially at the top end of the distribution, where reanalysis underestimated heavy rainfall rates and the early and late products overestimated them. Pixel-by-pixel comparison (Figure 9 right) shows that IMERG-E and IMERG-L yielded the poorest correlations. The bias correction in IMERG-F clearly improved the results of satellite data, obtaining similar (SON) and better (JJA) correlation coefficients relative to ERA5-L. However, reanalysis was still better in MAM and DJF, during which intense convective activity is less common. Extreme precipitation patterns are diverse, owing to the strong spatial heterogeneity of the study area. Thus, for instance, the Ebro Valley is prone to severe hailstorms caused by deep convection [90]. Conversely, orographic enhancement plays a key role in extreme precipitation over the Pyrenees [91]. Therefore, an accurate evaluation of the product should include climate zones. Figure 10 shows heat maps of the correlation (CC) ordered by season and climate region (cold: Dfb, Dfc; temperate: Cfa, Cfb; Mediterranean: Csa, Csb). In general, IMERG-E obtained the poorest score, followed by IMERG-L. The gauge adjustment of IMERG-F greatly increased the correlation with the reference data. For cold climates (Pyrenees), reanalysis had a stronger correlation than IMERG-F. A likely explanation for the shortcomings of IMERG is the underestimation of precipitation Extreme precipitation patterns are diverse, owing to the strong spatial heterogeneity of the study area. Thus, for instance, the Ebro Valley is prone to severe hailstorms caused by deep convection [90]. Conversely, orographic enhancement plays a key role in extreme precipitation over the Pyrenees [91]. Therefore, an accurate evaluation of the product should include climate zones. Figure 10 shows heat maps of the correlation (CC) ordered by season and climate region (cold: Dfb, Dfc; temperate: Cfa, Cfb; Mediterranean: Csa, Csb). In general, IMERG-E obtained the poorest score, followed by IMERG-L. The gauge adjustment of IMERG-F greatly increased the correlation with the reference data. For cold climates (Pyrenees), reanalysis had a stronger correlation than IMERG-F. A likely explanation for the shortcomings of IMERG is the underestimation of precipitation over the mountains, especially in winter when there is snow or ice on the surface (see cases 5, 10, 37, and 46 of the Supplementary Material S3.1, S3.4, and S3.5). In such circumstances, PMW sensors have difficulty discriminating rain from no rain [12]. This issue is consistent with earlier studies of IMERG [38,80,92]. However, precipitation estimates in complex terrain are also a difficult task for in situ observations. Thus, the use of gauge-derived datasets as ground truth should be taken with caution [2,63,93].
Remote Sens. 2020, 12, x FOR PEER REVIEW 14 of 20 over the mountains, especially in winter when there is snow or ice on the surface (see cases 5, 10, 37, and 46 of the Supplementary Material S3.1, S3.4, and S3.5). In such circumstances, PMW sensors have difficulty discriminating rain from no rain [12]. This issue is consistent with earlier studies of IMERG [38,80,92]. However, precipitation estimates in complex terrain are also a difficult task for in situ observations. Thus, the use of gauge-derived datasets as ground truth should be taken with caution [2,63,93]. Figure 10. Heat maps of the correlation between gauges, IMERG products, and ERA5-L, ordered according to season and climate region. We assumed that the center of EPE includes maximum rainfall pixels and defines the climate region that is most affected by each EPE. N/A means no cases. Climate regions are grouped in three categories: Cold climates (Dfc and Dfb), temperate climates (Cfa, Cfb), and Mediterranean climates (Csa and Csb).
Interestingly, IMERG-F attained a greater correlation than the reanalysis in JJA for temperate and Mediterranean climates, as well as in SON for the latter. There was a strong spatial gradient of precipitation in most of these cases (Figure 4, second to fifth columns). Therefore, we suggest that IMERG-F has better skills in capturing the sharp gradients of severe summer/autumn convective storms (see S3.1, S3.4, S3.5, and S4 for a case-by-case comparison).

Conclusions
Recent satellite-based precipitation products, such as IMERG, are becoming increasingly attractive because of the ability to represent the strong spatiotemporal variability of precipitation. For this reason, they can be an effective alternative to interpolated fields of rain-gauge measurements for validating numerical models, especially in areas poorly covered by rain gauge networks. The objective of the present study was to evaluate the performance of IMERG products in the representation of EPEs over an orographically complex region of the Iberian Peninsula. For this assessment, we used the R99p index, which identified 55 EPEs in the 2008-2018 period. Results were validated against a gridded precipitation dataset derived from a high-density rain gauge network. Figure 10. Heat maps of the correlation between gauges, IMERG products, and ERA5-L, ordered according to season and climate region. We assumed that the center of EPE includes maximum rainfall pixels and defines the climate region that is most affected by each EPE. N/A means no cases. Climate regions are grouped in three categories: Cold climates (Dfc and Dfb), temperate climates (Cfa, Cfb), and Mediterranean climates (Csa and Csb).
Interestingly, IMERG-F attained a greater correlation than the reanalysis in JJA for temperate and Mediterranean climates, as well as in SON for the latter. There was a strong spatial gradient of precipitation in most of these cases (Figure 4, second to fifth columns). Therefore, we suggest that IMERG-F has better skills in capturing the sharp gradients of severe summer/autumn convective storms (see S3.1, S3.4, S3.5, and S4 for a case-by-case comparison).

Conclusions
Recent satellite-based precipitation products, such as IMERG, are becoming increasingly attractive because of the ability to represent the strong spatiotemporal variability of precipitation. For this reason, they can be an effective alternative to interpolated fields of rain-gauge measurements for validating numerical models, especially in areas poorly covered by rain gauge networks. The objective of the present study was to evaluate the performance of IMERG products in the representation of EPEs over an orographically complex region of the Iberian Peninsula. For this assessment, we used the R99p index, which identified 55 EPEs in the 2008-2018 period. Results were validated against a gridded precipitation dataset derived from a high-density rain gauge network.
Results showed that EPEs were well captured by satellite products and reanalysis data. Overall, ERA5-L and IMERG-F obtained the best scores, but IMERG-E provides a better representation of the maximum rainfall pixels (location and intensity). However, IMERG-E and IMERG-L are less useful for a complete evaluation of EPEs. Reanalysis was superior regarding the location and spatial distribution of precipitation but failed in the intensity analysis. In this direction, ERA5-L might be negatively influenced by the coarse resolution of atmospheric forcings provided by ERA5 (~31 km). Conversely, IMERG was more accurate for precipitation maxima detection and the representation of sharp spatial gradients of precipitation. Climate regions and seasonal variability may also be important in the performance of the products. Thus, reanalysis had a greater correlation in winter and cold climate zones, whereas IMERG-F was better in summer and autumn for the temperate and Mediterranean climate zones. These results suggest that ERA5-L is more accurate for dynamically forced precipitation detection while IMERG is the best option for convective activity. Our main recommendation is therefore to select one or the other according to the parameter of interest, type of precipitation, and climate zone to be evaluated. Thus, for example, IMERG gives a more realistic representation of the typical summertime convective storm in the Ebro valley whereas ERA5-L performs the best in the Pyrenees in wintertime. For a general purpose, reanalysis may be a good option.
The application of this research can be useful for several purposes. First, it could help algorithm developers track issues related to EPEs. Second, it would help in determining the strengths and weaknesses of various precipitation products. Third, it might be useful for modelers performing sensitivity tests of extreme precipitation in a more detailed way, revealing potential issues in their dynamical cores and parameterizations. In this sense, state-of-the-art GCMs are now working at a 0.5 • /0.25 • spatial resolution (e.g., HighResMIP GCMs; [94]), facilitating better-resolved processes and fine parameterizations that must be validated with high-quality observational data.