1. Introduction
Evidence of climate change is present globally [
1]. Increases in surface temperature are greatly affecting the hydrological cycle from the local to the regional and global scale, ultimately leading to increased intensity and frequency of precipitation [
2]. This increases the risk of flooding, which is the most frequent type of natural disaster and can substantially damage the affected areas [
3]. The best way to mitigate damages from flooding is through adaptive measures that increase the resilience of current infrastructure to extreme events [
4]. A solid understanding of how different volumes of precipitation will affect different areas, together with a reliable method of predicting floods, is crucial to increasing community resilience to extreme hydroclimatic events. Climate change has rendered the assumption of stationarity moot, so traditional prediction models may no longer be reliable or valid, and existing infrastructure may no longer be resilient to current and future storm events [
1].
Some areas with typically arid climate, such as Central Asia, are getting larger volumes of rain more frequently [
5]. Compared to more temperate areas, the amount of increased rainfall may seem low or insignificant. However, even a small increase in precipitation, especially if received during a limited amount of time, can have catastrophic effects on such regions [
5,
6]. As these areas receive increased precipitation or a higher frequency of extreme precipitation events, there are adverse effects on resource management, infrastructure, and livelihood. Such effects may include destruction (and/or disruption) of housing, roads, and resource management equipment. Other consequences of these changing precipitation patterns include too much water during some parts of the year and too little during others, which can affect the availability of water year-round. In Central Asia, there is a limited rainy season that serves as the primary source of water for the region. So, any changes to the volume and frequency of rainfall will significantly affect those who live there [
5]. This area of Asia relies on a consistent wet season for its water supply, and if precipitation occurs outside of that time window, it causes runoff from snowy mountains to occur sooner than usual, which then shortens the wet season. Shortening the wet season ultimately shortens the growing and harvesting seasons for crops by decreasing the amount and availability of water, thereby limiting the crop yield in the region and impacting food availability for consumption or commerce [
5].
Another arid area that is experiencing increased rainfall is Southern California in the United States. In August 2023, Hurricane Hilary devastated Southern California, which received between 102 mm and 153 mm of rain within three days. This may not seem significant compared to what is typically observed in other, temperate areas of the country, but it exceeded the daily and monthly records for the area and caused significant damage to infrastructure, including buildings, houses, and roads [7]. This area is not accustomed to receiving such large quantities of precipitation in such a short period of time, so its infrastructure is not built to withstand these localized extreme events. This will only get worse as these types of events become more common.
Since so little difference in rainfall can have such a large impact in dry areas, it is even more important to minimize errors in the precipitation estimates used in flood forecasting models in such regions. Accurate flood prediction models are fundamental for engineers and planners when building new infrastructure and planning management actions, and the most critical input to such models is precipitation. Furthermore, precipitation measurements are used for an array of applications, including reservoir operations, land development, prevention of extreme hydroclimatic events (e.g., floods, landslides), weather and climate forecasting, and disease control [8,9]. However, an accurate measure of precipitation is crucial to effectively use such products in the applications listed above [10].
Precipitation is commonly measured by in situ gauges, weather radars, satellites, and re-analysis models. Ground-based instruments, including rain gauges and weather radars, are widely used for measuring precipitation [
11]. Rain gauges provide high temporal frequency but are prone to errors from wind effects and evaporation [
12]. Radar networks provide continuous coverage with high spatial and temporal resolution at regional scales. However, radar-based measurements are affected by errors due to various issues such as surface backscatter contamination, attenuation of the signal, and uncertainty of the reflectivity–rain-rate relationship [13,14,15].
Continuous and near-real-time coverage of the Earth can only be recorded with satellite precipitation sensors. The most accurate satellite precipitation estimates are from a combination of infrared (IR) sensors on geostationary satellites, characterized by high sampling frequency, and passive microwave (PMW) sensors on low-Earth-orbiting satellites with less-frequent sampling [
16]. Unlike PMW sensors, which record emission and scattering signals from raindrops, snow, and ice, IR sensors measure cloud-top temperatures and cloud heights [
17].
Past efforts have evaluated and utilized satellite-based observations in a suite of hydrologic applications [
18,
19,
20,
21,
22]. A few focused on dry climate areas. For example, Morin et al. (2020) analyzed precipitation climatology from satellite observations in dry regions of the world and concluded that these areas are characterized not only by lower annual precipitation and higher variability, but also by fewer rainy days, a more pronounced extreme tail in the precipitation distribution, a smaller proportion of the area experiencing rainfall, and shorter spatial correlation distances [
23]. The study by Serrat-Capdevila et al. (2016) assessed three satellite precipitation products over Africa and found that their performance in dry regions was generally weaker due to infrequent and localized rainfall [
24]. However, after applying a bias correction, their accuracy improved significantly. Nozarpour et al. (2024) performed an assessment of satellite precipitation estimation products over Iran, i.e., Integrated Multi-satellite Retrievals for GPM (IMERG-V6), Multi-Source Weighted-Ensemble Precipitation (MSWEP), Tropical Rainfall Measuring Mission (TRMM) Multi-Satellite Precipitation Analysis (TMPA-3B43V7), and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks—Climate Data Record (PERSIANN-CDR) [
25]. Their study found that all products consistently had fewer errors in regions of Iran with lower precipitation rates. Another study validated remote sensing precipitation products in southern Spain by comparing them to measurements from ground stations [
26]. They also developed a methodology to identify extreme rainfall and drought events over the past 30 years using satellite-derived data. Furthermore, Vernimmen et al. (2012) found that satellite rainfall products underestimate dry season rainfall in Indonesia, with TMPA-3B42 (near real time version) performing better than others [
27]. Another recent work that evaluated three high-resolution satellite-based precipitation products over different Chinese basins showed that, while all perform well for monthly precipitation, their accuracy declines in dry/arid regions, especially when estimating extreme precipitation [
28].
Reanalysis precipitation products are obtained by combining observational data, satellite measurements, and numerical weather prediction models, which are then processed to create a continuous and consistent time series. Reanalysis products typically cover the entire globe or large regional areas and span several decades, often from the mid-20th century to the present, making them particularly valuable for understanding long-term trends, variability, and extremes in precipitation patterns [
18,
29,
30]. Past work has shown that the performance of such products varies by region and precipitation intensity, and that they often struggle to detect daily precipitation events in arid zones, underestimate moderate-to-heavy rainfall, and overestimate light rainfall [
31,
32]. For instance, a study using re-analysis data during 1979–2018 showed that global drylands experienced a significant overall decrease in precipitation, though some regions—southern Africa, Australia, northern Africa, and South Asia—saw increases in summer rainfall [
33]. A work by Lavers et al. [
34] evaluated re-analysis precipitation (ERA5) data against 5637 weather stations worldwide (2001–2020) and found that while ERA5 captures broad spatial patterns and monthly variability well in extratropical regions, it exhibits significant wet bias, low correlations, and large errors in tropical and dryland areas. It also underestimates extreme daily rainfall totals, meaning that precipitation trends, dry spells, and drought frequency in arid regions may be inaccurately represented [
34].
This study investigates the performance of a suite of precipitation products from both satellite retrievals and models in a dry-climate region, where rain events that are more intense than usual may cause significant damage. Specifically, this study analyses four datasets (one satellite and three re-analysis products) and compares them to ground-recorded observations to determine which sources are more accurate and where improvements should be directed. Palm Desert in Southern California, a historically dry climate region, is chosen as the study area from 2000 through 2019. The study answers the following overall research question: What is the performance of different precipitation products in a dry-climate region? More specifically, what is their ability to estimate (1) the magnitude of average precipitation; (2) the magnitude of extreme precipitation events; (3) the occurrence of precipitation overall; and (4) the occurrence of extreme events? The methodological framework used to answer these four questions is presented in
Section 2 and includes a description of the study area, the five datasets adopted in this work, and the statistical analysis. Results are illustrated and discussed in
Section 3, whereas conclusions are drawn in
Section 4.
3. Results
The time series of precipitation events recorded by each dataset during the study period (from 5 June 2000 to 31 December 2019) are illustrated in
Figure 3. The peaks observed in the time series are clearly identifiable across all datasets. These events exhibit strong temporal alignment, indicating consistent detection of rainfall occurrences across the observational and estimation products. However, discrepancies in peak magnitudes are evident, reflecting systematic biases wherein certain estimation products either overestimate or underestimate the actual precipitation intensities associated with these events.
A more detailed investigation of these discrepancies is conducted through quantitative analyses aimed at characterizing the deviations in reported precipitation across datasets. These analyses facilitate a rigorous evaluation of each product’s performance, enabling identification of the temporal and contextual conditions under which discrepancies are most pronounced.
An initial step in diagnosing the discrepancies among estimation products involved isolating two large precipitation events within the time series and studying their evolution in time.
Figure 4 shows time series for two specific events: 29 December 2004 and 14 February 2019. In the first event (
Figure 4a), the rain gauges recorded the rainfall event starting on 28 December 2004, continuing into the next day at a lower rate, and ending by 30 December 2004. IMERG, MERRA2, and WLDAS capture the event timing, although WLDAS underestimates the amount of rainfall on both days of the event. ERA5-MAX and ERA5-MIN show a delay in the event detection, with the peak occurring on the second day rather than the first. ERA5-MAX overestimates the precipitation rate, whereas ERA5-MIN appears to report values much closer to the reference rainfall. In summary, while IMERG, MERRA2, and WLDAS are better at estimating the timing of this event, IMERG, MERRA2, and ERA5-MIN are better at estimating its peak magnitude. The second event (
Figure 4b) shows consistent timing across all estimation products, with a peak on 14 February 2019. However, the magnitude of this peak varies across datasets. Specifically, ERA5-MIN and MERRA2 underestimated the peak magnitude, while ERA5-MAX, WLDAS, and IMERG overestimated it. Across the two events (Figure 4), IMERG and MERRA2 consistently reported the correct timing of precipitation, and ERA5-MIN and MERRA2 reported precipitation magnitudes similar to the reference. This could indicate good performance of IMERG relative to the other products in estimating both the timing and magnitude of precipitation events across the study area, although further investigation is required.
The scatterplots for each estimation product versus the reference dataset are presented in
Figure 5. The linear relationship between each product and the reference product is positive, although not very strong, indicating there may be room for improvement in each product. Broad scatter in the data, often accompanied by low correlation values, is common when comparing ground-based precipitation observations (whether from gauges or weather radar) to satellite-derived or model-based precipitation estimates. This behavior has been widely documented in previous studies, reflecting differences in spatial resolution, sampling strategies, and measurement uncertainties across observing platforms (e.g., [50,51,52,53]).
Six percentiles were computed for each product for precipitation rates greater than 0.1 mm/day (
Table 4). Percentiles of each estimation product were generally similar to those of the reference dataset. ERA5-MAX values are higher than the reference ones, which is expected given that this dataset provides maximum daily precipitation. WLDAS was consistently the closest in value at each percentile, with values that almost matched those of the reference dataset. The CDF plots in
Figure 6 provide a visualization of these percentile values and how they compare to one another.
In presenting and discussing our results, we define extreme precipitation as values exceeding the 90th percentile, with particular attention given to more intense events above the 95th and 99th percentiles. The CDF of the reference dataset illustrates that 90% of the daily precipitation values recorded were 7.17 mm/day or less. IMERG reported only 4.27 mm/day at the 90th percentile. However, it was almost the same value as the reference dataset in the 99th percentile with 27.41 mm/day or less reported (
Table 4). This indicates that the average amount of precipitation being reported by IMERG is less than the actual occurrence, but the amount of precipitation being reported for extreme events is more likely to be accurate. Something similar could be stated for ERA5-MIN and MERRA2, although these percentile values are much closer to the reference data at the 90th percentile, and ERA5-MIN is much closer to the reference dataset value at the 99th percentile than MERRA2. WLDAS is very close to the reference at both the 90th and 99th percentiles, indicating that this estimation product may be accurately reporting precipitation values during both typical and extreme events. ERA5-MAX produces larger estimates of precipitation at all percentiles, which is expected given the nature of this product to estimate higher rainfall.
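The percentile comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the linear-interpolation rule, and the default list of percentiles are assumptions, not the procedure used to build Table 4. Only the 0.1 mm/day wet-day cutoff comes from the text.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) by linear interpolation
    between order statistics (an assumed convention)."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def wet_percentiles(daily_mm, qs=(50, 75, 90, 95, 99), wet_threshold=0.1):
    """Percentiles of daily precipitation restricted to wet days.

    Only values above wet_threshold (mm/day) are kept, mirroring the
    0.1 mm/day cutoff in the text. The default qs are illustrative.
    """
    wet = [v for v in daily_mm if v > wet_threshold]
    return {q: percentile(wet, q) for q in qs}
```

Applying `wet_percentiles` to the reference series and to each product, then comparing the resulting dictionaries, gives the kind of side-by-side view summarized in the percentile table.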
The overall ratio of “wet” days, defined as any day that received more than 0.1 mm of rain in one day, is analyzed in
Figure 7.
Across all products, the proportion of wet days remained within the 10% to 40% range of total annual days, with the reference showing a ratio of 19.5% wet days over the entire study period (number of days that recorded a value larger than 0.1 mm/day divided by the total number of days in the time series). ERA5-MAX reported 37.0% of the total time series as wet days, ERA5-MIN reported 17.3%, MERRA2 reported 26.2%, WLDAS reported 17.7%, and IMERG reported 31.8%. Among these, ERA5-MIN and WLDAS exhibited the highest concordance with the reference dataset, with the closest wet day percentages. The remaining products were still close, all less than 20 percentage points higher. However, given the high sensitivity of the study region to precipitation, even minor deviations in the frequency or detection of wet days among the products may carry significant implications.
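As a minimal sketch of the wet-day ratio defined above (days above 0.1 mm/day divided by the total number of days), the following Python function could be used. The function name and input layout, a plain list of daily totals in mm, are assumptions made for illustration.

```python
def wet_day_fraction(daily_mm, wet_threshold=0.1):
    """Fraction of days whose precipitation exceeds wet_threshold (mm/day)."""
    if not daily_mm:
        raise ValueError("the time series is empty")
    wet_days = sum(1 for value in daily_mm if value > wet_threshold)
    return wet_days / len(daily_mm)
```

The comparison is strict, so a day with exactly 0.1 mm is counted as dry, consistent with "more than 0.1 mm" in the definition.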
3.1. Continuous Statistics
First, the overall bias ratio was computed for each estimated dataset with respect to the reference (
Figure 8a). Then, a relative bias ratio was computed at three different thresholds, i.e., estimation rain rate higher than the 75th, 90th, and 95th percentiles (refer to
Table 4 for percentile values). The bias ratio exhibits a consistent trend across all datasets, deteriorating progressively as the threshold increases. Among the datasets analyzed, ERA5-MAX exhibits the highest bias (greatest deviation from unity), whereas WLDAS and ERA5-MIN show the most favorable bias ratios. MERRA2 and IMERG present similar biases, with a nearly linear increase with increasing threshold. They are similar in value to the biases of WLDAS and ERA5-MIN, although they deviate further from unity as the threshold increases. These insights could inform improvements to rain-rate-dependent bias correction strategies.
The behavior of the mean error closely mirrors that of the bias ratio, reflecting their inherent similarity (
Figure 8b). However, unlike the bias ratio, the mean error quantifies the magnitude of rainfall misestimation, providing a more direct measure of the error in precipitation amounts. The mean error (for all non-zero values) remains near zero across all datasets except for ERA5-MAX, which, once again, is expected given the nature of this dataset. However, as the percentile threshold increases, the ability of the products in capturing rainfall magnitudes observed by the gauges declines notably. For extreme precipitation events (e.g., the 95th percentile), ERA5-MAX exhibits mean errors reaching up to 20 mm/day. MERRA2 and IMERG show errors between 7 mm/day and 14 mm/day for the higher thresholds (90th and 95th percentiles). These error magnitudes are substantial, particularly in the context of the arid climate region examined in this study.
While the bias ratio and mean error discussed above mainly reveal whether each product tends to over- or under-estimate rainfall on average, they can hide large errors because such overestimates and underestimates cancel each other out. RMSE instead reflects both the size and variability of these errors, providing information on how accurately the products capture actual rainfall volumes, especially during heavy-rain events (
Figure 8c). ERA5-MAX still exhibits the largest rainfall estimation errors, but the separation between ERA5-MAX and the IMERG and MERRA2 products is less pronounced in terms of RMSE than it is for bias ratio or mean error. RMSE indicates that ERA5-MAX’s performance, while still the poorest, is more comparable to the other products when considering total error magnitude rather than directional bias alone.
Figure 9 presents Pearson’s correlation coefficient for each dataset. A general decline in correlation is observed as the percentile threshold increases. Notably, ERA5-MAX consistently exhibits the highest correlation values. This indicates that, despite the discrepancies in rainfall amounts discussed above, ERA5-MAX aligns most closely with the temporal pattern of rainfall observed in the reference dataset. In contrast, MERRA2 exhibits the weakest correlation with ground-based observations, which may be attributed to its relatively coarse spatial resolution.
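The four continuous statistics discussed in this subsection can be illustrated with the sketch below. The formulas are assumptions based on common practice: bias ratio as the sum of estimates over the sum of reference values, mean error as the average of estimate minus reference, plus RMSE and Pearson correlation. The study's own definitions are given in its methodology section and may differ in detail. Conditioning on the estimated rain rate exceeding a threshold follows the description in the text; all names are hypothetical.

```python
import math


def continuous_stats(ref, est, est_threshold=0.0):
    """Bias ratio, mean error, RMSE and Pearson correlation for paired days.

    Only days whose ESTIMATED rate exceeds est_threshold (mm/day) are kept.
    The formulas are assumed standard definitions, not taken from the paper.
    """
    pairs = [(r, e) for r, e in zip(ref, est) if e > est_threshold]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired days above the threshold")

    ref_sel = [r for r, _ in pairs]
    est_sel = [e for _, e in pairs]
    sum_ref = sum(ref_sel)

    # Bias ratio: 1 is ideal, values above 1 indicate overestimation.
    bias_ratio = sum(est_sel) / sum_ref if sum_ref > 0 else math.nan

    errors = [e - r for r, e in pairs]
    mean_error = sum(errors) / n
    rmse = math.sqrt(sum(err * err for err in errors) / n)

    mean_ref = sum_ref / n
    mean_est = sum(est_sel) / n
    cov = sum((r - mean_ref) * (e - mean_est) for r, e in pairs)
    var_ref = sum((r - mean_ref) ** 2 for r in ref_sel)
    var_est = sum((e - mean_est) ** 2 for e in est_sel)
    denom = math.sqrt(var_ref * var_est)
    correlation = cov / denom if denom > 0 else math.nan

    return {
        "bias_ratio": bias_ratio,
        "mean_error": mean_error,
        "rmse": rmse,
        "correlation": correlation,
    }
```

Calling the function once with the default threshold and then with a product's own 75th, 90th, and 95th percentile values would reproduce the structure of the threshold-dependent comparison, under the stated assumptions.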
3.2. Contingency Metrics
Figure 10a illustrates the probability of detection relative to the estimated precipitation threshold, computed based on
Table 2. ERA5-MAX exhibits the highest POD, with a probability of detection of 95% or higher when the threshold is 2 mm/day or greater, indicating strong performance in identifying rainfall events. While MERRA2 records a relatively low POD of 70% at the minimal threshold of 0.1 mm/day, it exhibits improved detection capabilities at higher thresholds, supporting its effectiveness in capturing extreme precipitation events in arid regions. In contrast, WLDAS consistently yields lower POD values across all thresholds. Despite its ability to closely replicate the overall distribution of rainfall, WLDAS appears limited in accurately detecting the timing of rainfall events, suggesting a deficiency in temporal precision.
Figure 10b illustrates false alarm ratios calculated based on the contingency matrix shown in
Table 3 as a function of different reference precipitation thresholds. All datasets exhibit a similar decreasing trend in FAR, with ERA5-MIN consistently achieving the lowest values. Conversely, ERA5-MAX shows the highest FAR, indicating a greater tendency to report rainfall when none occurred. These results are expected given that the two products offer a minimum and maximum rainfall estimate during the day.
This is particularly important for the reliability of early warning systems, which depend heavily on accurate rainfall detection to issue timely alerts for potential flooding or other hydrometeorological hazards. A low FAR minimizes the risk of false alarms, which can erode public trust and lead to reduced responsiveness over time. Thus, the consistently low FAR across these datasets enhances their suitability for operational use in early warning and disaster preparedness frameworks, particularly in regions where rainfall is infrequent but can have significant impacts.
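For readers who want to reproduce POD and FAR, a minimal sketch is given below. It assumes the usual 2x2 contingency table (hits, misses, false alarms, correct negatives) and applies the same threshold to both series, which is a simplification; the contingency tables actually used in the study are those referenced in the text. All function names are hypothetical.

```python
import math


def contingency_counts(ref, est, threshold):
    """Count hits, misses, false alarms and correct negatives.

    A day is an event when precipitation exceeds threshold (mm/day).
    Using one threshold for both series is an assumption of this sketch.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for r, e in zip(ref, est):
        ref_event = r > threshold
        est_event = e > threshold
        if ref_event and est_event:
            hits += 1
        elif ref_event:
            misses += 1
        elif est_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def pod(hits, misses):
    """Probability of detection: share of observed events that were detected."""
    total = hits + misses
    return hits / total if total else math.nan


def far(hits, false_alarms):
    """False alarm ratio: share of estimated events that did not occur."""
    total = hits + false_alarms
    return false_alarms / total if total else math.nan
```

Evaluating these functions over a range of thresholds yields curves of the kind plotted against threshold in the POD and FAR panels.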
The analysis of missed precipitation presented in
Figure 11a reveals that all estimation products exhibit a similar decreasing trend as the threshold increases, as expected. WLDAS consistently shows a higher proportion of missed precipitation compared to the other datasets, which are much closer to one another. While previous findings indicated that WLDAS was among the most precise in replicating the overall rainfall distribution, it was also noted that its temporal accuracy—specifically, the correct timing of rainfall events—was likely the poorest. The elevated missed precipitation ratios observed for WLDAS in
Figure 11a corroborate this finding, indicating a consistent underdetection of rainfall events.
Missed precipitation can be particularly problematic in arid and semi-arid regions, even at low rainfall intensities, due to the critical role that every precipitation event plays in these water-scarce environments. In such regions, rainfall events are infrequent and highly variable, and even small amounts can have significant ecological, agricultural, and hydrological impacts. Missing these events can lead to underestimation of available water resources, misinformed drought assessments, and inadequate planning for water supply and agricultural management. Furthermore, missed precipitation can compromise the effectiveness of hydrological models and early warning systems, which rely on accurate detection of rainfall to forecast runoff, soil moisture, and potential flood or drought conditions. Inaccurate representation of precipitation events can thus exacerbate the vulnerability of communities and ecosystems already under stress from limited water availability.
Similarly, the falsely detected precipitation graph in
Figure 11b shows an overall decreasing trend, with WLDAS seemingly set apart from the others. The missed and false precipitation ratio values illustrate that ultimately WLDAS was missing the highest volume of precipitation (although the FAR was relatively low) and reported more precipitation than what was detected by the reference product. As the reference rain threshold increases to represent more extreme rain events, the volume of rain missed or falsely detected decreases.
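Unlike POD and FAR, the missed and falsely detected precipitation ratios are volumetric. The sketch below shows one plausible way to compute them. The normalization, which divides both volumes by the total reference precipitation above the threshold, is an assumption made for illustration and may not match the definition adopted in the study.

```python
import math


def volume_fractions(ref, est, threshold=0.1):
    """Return (missed_fraction, false_fraction) of precipitation volume.

    missed: reference rain on days the product reports no event.
    false:  product rain on days the reference reports no event.
    Both are divided by total reference rain (an assumed normalization).
    """
    missed = sum(r for r, e in zip(ref, est) if r > threshold and e <= threshold)
    false = sum(e for r, e in zip(ref, est) if e > threshold and r <= threshold)
    total_ref = sum(r for r in ref if r > threshold)
    if total_ref == 0:
        return math.nan, math.nan
    return missed / total_ref, false / total_ref
```

Because these ratios weight each day by its rainfall amount, a product can have a modest false alarm ratio (a count-based score) and still show a large falsely detected volume if its few false events are intense.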
However, when looking at ETS, WLDAS is outperformed only by ERA5-MIN, which exhibits the highest score among all products, with values very similar to those of MERRA2 (
Figure 12a). As noted in the methodology, ETS decreases (by definition) as events become rarer—that is, when higher reference precipitation thresholds are used. TSS, shown in
Figure 12b, balances the ability to correctly detect both events and nonevents, making it less sensitive to event frequency and offering a robust assessment of overall forecast skill. The different precipitation products exhibit a wide range of TSS values, with ERA5-MIN presenting the highest scores, followed by MERRA2. This variation indicates that some products consistently capture event occurrences better than others, while some produce more false alarms or miss events, as also shown in
Figure 10 and
Figure 11. The differences in TSS (and the other scores) can also arise from the different spatial and temporal resolutions and the inherently different source of each product (satellite, model, in situ). Consequently, even when overall trends in rainfall are similarly represented, the skill of individual products in detecting specific events, especially extreme or localized rainfall (towards higher values of the x-axis of Figure 12a), can differ. This wide spread in TSS highlights the importance of carefully evaluating and selecting products for operational use or scientific studies, rather than assuming that all datasets perform equivalently across different rainfall intensities and event types.
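The two skill scores can be written compactly in terms of the four contingency counts. The sketch below uses the textbook forms of the equitable threat score and the true skill statistic (also known as the Hanssen-Kuipers discriminant). It is assumed, not confirmed by the text, that the study uses these same forms; the function names are hypothetical.

```python
import math


def ets(hits, misses, false_alarms, correct_negatives):
    """Equitable threat score: threat score corrected for random hits."""
    n = hits + misses + false_alarms + correct_negatives
    if n == 0:
        return math.nan
    # Number of hits expected by chance given the event frequencies.
    hits_random = (hits + misses) * (hits + false_alarms) / n
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom else math.nan


def tss(hits, misses, false_alarms, correct_negatives):
    """True skill statistic: POD minus probability of false detection."""
    events = hits + misses
    non_events = false_alarms + correct_negatives
    if events == 0 or non_events == 0:
        return math.nan
    return hits / events - false_alarms / non_events
```

Both scores equal 1 for a perfect product. The chance-hit correction in ETS and the use of non-events in TSS are what make the two scores respond differently as events become rarer at higher thresholds.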
3.3. Dataset Ranking
To provide a high-level glance at how each estimation product performed and compared to one another, a ranking system was used to assist in answering each research question. The rankings are listed in
Table 5,
Table 6,
Table 7 and
Table 8. The system simply assigns each product a number from 1 (best) to 5 (worst) according to how its results compare with those of the other products.
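A ranking of this kind can be sketched as follows for a single metric. The rule used here, ordering products by their absolute distance from the metric's ideal value, is an assumption; for metrics compared against the reference dataset, the reference value would play the role of the ideal. The product names and scores in the usage note are placeholders, not results from this study.

```python
def rank_products(scores, ideal):
    """Rank products from 1 (best) upward by closeness to the ideal value.

    scores: dict mapping product name to its metric value.
    ideal:  the perfect value of the metric (e.g., 1 for a bias ratio).
    Ties keep the input order, which is an arbitrary choice of this sketch.
    """
    ordered = sorted(scores, key=lambda name: abs(scores[name] - ideal))
    return {name: position + 1 for position, name in enumerate(ordered)}
```

For example, `rank_products({"product_a": 1.2, "product_b": 0.95, "product_c": 2.9}, ideal=1.0)` places `product_b` first and `product_c` last. As the discussion of the rankings notes, such ranks hide how large the gaps between products are, so they should be read alongside the underlying metric values.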
To answer the first research question, i.e., how well different products estimate average precipitation in a dry-climate region, the following metrics were considered: CDF, overall bias ratio, overall mean error, overall RMSE, and overall correlation coefficient. Overall, WLDAS ranks best, with all other products performing similarly to each other. For the CDF, the product whose 50th percentile was closest to that of the reference dataset was ranked first. Although WLDAS and ERA5-MAX rank high for CDF, ERA5-MAX is characterized by a large positive bias. While ERA5-MAX and WLDAS differed from the reference average by 0.10 mm/day or less (about 10% of the reference dataset average), the remaining three estimation products differed by at least 0.30 mm/day (about 30% of the reference dataset average). This emphasizes that the difference in statistical results between products is not necessarily clearly illustrated by the rankings, and the actual results must still be taken into consideration when evaluating a product’s performance. Another difference that is not clear from the rankings is that all bias ratios (and mean errors) were very close to one another, except for ERA5-MAX, which was significantly higher. Furthermore, ERA5-MAX, IMERG, and MERRA2 showed comparable RMSEs, whereas ERA5-MIN and WLDAS are characterized by smaller RMSEs. The correlation coefficient has larger variability and is generally well represented by the ranking.
To answer the second question, i.e., the performance of precipitation products during extreme precipitation events, the same metrics shown in
Table 5 were used, but at the 95th percentile instead (
Table 6). As with the findings for average precipitation, WLDAS ranks the highest among the five products, exhibiting the lowest bias ratio, mean error, and RMSE as well as the second-best 95th percentile (i.e., second closest to that of the ground reference) and correlation coefficient. In this case, however, ERA5-MAX ranks last, with MERRA2 second to last and IMERG third to last. This is because ERA5-MAX is characterized by the largest errors (bias ratio, mean error, and RMSE). The issue is not that ERA5-MAX fails to generate extreme rainfall events; rather, it produces them with excessive magnitude, and the evaluation metrics penalize this overestimation.
At the 95th percentile, cumulative precipitation rates were comparable among the reference dataset, WLDAS, and ERA5-MIN (as shown in
Table 4). In contrast, MERRA2 and IMERG tended to underestimate high rainfall rates, while ERA5-MAX significantly overestimated them. The bias ratio for events at or above the 95th percentile was closest to the ideal value of 1 for both WLDAS and ERA5-MIN. IMERG and MERRA2 both presented bias ratios exceeding 2, while ERA5-MAX had a bias ratio approaching 3. A similar pattern emerged for mean error and RMSE, with WLDAS and ERA5-MIN nearly tied for the most accurate estimates. These were followed by MERRA2 and IMERG, and finally ERA5-MAX, which had the largest deviation from the ideal. Nevertheless, the correlation coefficient told a different story: ERA5-MAX had the highest correlation with the reference data in the case of rare events, followed—though more distantly—by WLDAS and IMERG. This marked a shift from the rankings observed at the 50th percentile, largely due to a notable drop in ERA5-MIN’s correlation for extreme precipitation events. Overall, these results suggest that ERA5-MIN may be more reliable for estimating average precipitation than for capturing extreme rainfall events, a conclusion supported by its performance across both average and high-intensity precipitation metrics.
The third research question focused on the capabilities of the different products to detect overall precipitation. The ranking of the datasets utilized probability of detection, false alarm ratio, missed precipitation fraction, falsely detected precipitation fraction, ETS, and TSS (
Table 7). Similar to research objective 1, the values of these metrics were taken for the overall precipitation (larger than 0.1 mm/day).
For the percentage of wet days, defined as days receiving more than 0.1 mm of rainfall, the ranking was determined based on the overall proportion of such days throughout the study period. WLDAS and ERA5-MIN aligned most closely with the reference dataset, followed, though less closely, by MERRA2, IMERG, and ERA5-MAX. ERA5-MAX recorded the highest average probability of detection, outperforming IMERG and MERRA2 by over 10%. In contrast, ERA5-MIN and WLDAS trailed those two products by an additional 10%. Missed precipitation rates were comparable among ERA5-MIN, IMERG, and MERRA2, with ERA5-MAX performing slightly better and WLDAS showing a higher rate than all others. IMERG, ERA5-MAX, and MERRA2 exhibited very similar values of falsely detected precipitation. ERA5-MIN had a noticeably lower rate, though not to the extent of WLDAS, which again stood out with a substantially higher rate than the rest. ETS and TSS offer a more holistic view of the previously discussed metrics, with ERA5-MIN presenting the highest ETS and TSS. Taken together, these findings suggest that ERA5-MIN performs particularly well, whereas WLDAS and IMERG are the least advisable options when accurate detection of overall precipitation is a key criterion.
The analysis of the capability of the various products to detect extreme precipitation rates is based on the same metrics shown in
Table 7, but using values computed for the 95th percentile (
Table 8). In response to objective 4, ERA5-MIN ranks first, followed by IMERG, whereas WLDAS presents the lowest ranking. The probability of detection for extreme precipitation events increases significantly across all estimation products except WLDAS, which remains below 90% even at thresholds as high as 20 mm/day. This suggests that, although WLDAS performs reasonably well in estimating daily rainfall amounts, it is unreliable in detecting the timing or occurrence of rainfall, particularly during extreme events. The false alarm ratio for extreme events was consistently similar across all products, showing no major outliers. However, both missed precipitation and false precipitation followed a pattern similar to POD: values were comparable among all products except WLDAS, which exhibited significantly higher errors in both categories. In terms of ETS and TSS, ERA5-MIN once again ranks the highest, whereas ERA5-MAX presents the poorest performance. These findings highlight a notable weakness in WLDAS: its limited ability to accurately detect the presence of rainfall, especially during high-intensity events.
4. Conclusions
Climate change is driving increasingly dramatic shifts in weather patterns across the globe. One notable example is the rising frequency, intensity, and duration of precipitation events in the southwestern United States, a region traditionally characterized by arid and semi-arid conditions. These extreme rainfall events, which historically occurred less than once in a century, are now becoming more common. As a result, existing infrastructure—designed for much drier conditions—is often overwhelmed and prone to failure, highlighting the urgent need for climate-resilient planning and adaptation strategies.
This study answered four research questions posed in Section 1:
What is the ability of different precipitation products to estimate the magnitude of average precipitation locally in a dry climate region?
The assessment of average precipitation estimates in Palm Desert, California shows that, although the ranking system provides a general overview, it can obscure important differences among products. For instance, ERA5-MAX appears strong because its 50th percentile aligns closely with the reference dataset, yet it also exhibits a substantial positive bias. WLDAS and ERA5-MAX differ from the reference mean by no more than 10%, whereas the other products deviate by at least 30%. ERA5-MAX, IMERG, and MERRA-2 show comparable RMSEs, while ERA5-MIN and WLDAS are characterized by smaller errors. Correlation coefficients display larger variability and are generally reflected well in the ranking, with ERA5-MAX showing the best alignment with the CPC dataset. However, matching a median or average does not guarantee an accurate representation of the full precipitation distribution, and the inherent smoothing, algorithmic assumptions, and averaging in gridded products are the main causes of biases and errors.
What is the ability of different precipitation products to estimate the magnitude of extreme precipitation events in a dry area?
For rare precipitation events, many of the patterns seen for average precipitation persist, but some important differences emerge. ERA5-MAX shows the largest bias ratio, mean error, and RMSE, not because it fails to generate extreme events, but because it produces them with excessive magnitude. MERRA-2 and IMERG generally underestimate high rainfall rates, while WLDAS and ERA5-MIN yield 95th-percentile values closer to the reference dataset. At the same time, ERA5-MAX correlates more strongly with the reference data for extreme events than the other products do. These results highlight that products may perform differently under extreme versus average conditions and that rankings alone cannot fully represent those distinctions.
Differences in how precipitation products perform under rare, extreme events often stem from the coarse spatial resolution and inherent smoothing of grid-based reanalysis models and satellite observations. Such smoothing tends to blur and dilute intense, localized rainfall peaks, causing coarse products to underestimate extremes. However, when a product instead attempts to compensate, it can generate excessive magnitudes, inflating bias ratio, RMSE, and mean error.
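The dilution effect described above can be illustrated with a minimal sketch in which a fine-resolution rainfall field is block-averaged onto a coarser grid. The grid sizes, the 40 mm/day cell, and the helper `block_average` are hypothetical choices made for illustration; they do not correspond to the resolution or regridding of any product evaluated here.

```python
import numpy as np


def block_average(field, factor):
    """Average non-overlapping factor x factor blocks of a 2-D field.

    Assumes both dimensions of `field` are divisible by `factor`.
    """
    rows, cols = field.shape
    blocks = field.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# Hypothetical 8 x 8 fine grid with one localized convective cell (mm/day)
fine = np.zeros((8, 8))
fine[2:4, 2:4] = 40.0

# The same field seen on a grid four times coarser in each direction
coarse = block_average(fine, 4)

print("fine-grid peak:  ", fine.max())
print("coarse-grid peak:", coarse.max())
print("domain means:    ", fine.mean(), coarse.mean())
```

In this example the domain-mean rainfall is identical on both grids, while the peak value drops by a factor of four, which is consistent with coarse products approximating totals while underestimating localized extremes.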
What is the ability of different precipitation products to estimate the occurrence of overall precipitation in a dry climate region?
The overall ability of the five products to accurately detect the presence of rainfall was assessed using contingency metrics, including probability of detection, false alarm rate, missed precipitation fraction, and false precipitation fraction. Based on these metrics, ERA5-MAX shows the highest probability of detection, IMERG and MERRA-2 follow, and ERA5-MIN and WLDAS detect substantially fewer events. Missed-event rates are similar for ERA5-MIN, IMERG, and MERRA-2, slightly lower for ERA5-MAX, and highest for WLDAS. False-alarm rates cluster closely among IMERG, ERA5-MAX, and MERRA-2; ERA5-MIN is lower, and WLDAS is again noticeably higher. Composite scores (ETS and TSS) favor ERA5-MIN, indicating more balanced detection performance overall, while WLDAS and IMERG perform less reliably.
These differences likely arise from how each product handles low-intensity precipitation. Products such as ERA5-MAX and IMERG tend to generate more light rainfall events, boosting detection but also increasing false alarms. WLDAS appears particularly sensitive in this regard, triggering too many near-threshold events. In contrast, ERA5-MIN appears to apply more conservative thresholding or filtering of light precipitation, leading to fewer false detections and more accurate classification overall. Differences in model physics, sensor noise characteristics, and drizzle-handling schemes likely drive these systematic biases in wet-day identification.
What is the ability of different precipitation products to estimate the occurrence of extreme precipitation in a dry climate region?
When looking at the detection of extreme precipitation events (95th percentile), ERA5-MIN ranks first overall, followed by IMERG; by contrast, WLDAS ranks lowest. For most thresholds, detection increases for all products except WLDAS, which remains below 90% even at high thresholds (e.g., ≥20 mm/day). False-alarm rates are fairly similar among products. However, WLDAS shows significantly higher missed- and false-precipitation rates than the others. Composite skill scores (ETS and TSS) again favor ERA5-MIN, while ERA5-MAX performs worst. WLDAS appears relatively good at estimating average rainfall amounts, but notably weak at reliably detecting when extreme rainfall events occur. As mentioned above, global and/or coarse-grid datasets struggle with representing intense, localized rainfall events. As a result, some products miss the timing of extreme events (as in WLDAS), even if they approximate overall daily rainfall reasonably well.
In summary, some products, like ERA5-MIN, manage to strike a balance between detecting the occurrence of extreme rainfall and estimating its magnitude reasonably well. In contrast, WLDAS—while good at reproducing daily rainfall amounts overall—struggles to reliably capture the timing and occurrence of intense rainfall events. As a result, WLDAS often misses extreme events or records them at the wrong time, which degrades its probability of detection (POD), increases missed-event counts, and lowers skill scores (like ETS and TSS).
Part of the underlying cause is that precipitation extremes are often short-lived, highly localized, convective events. Global-scale reanalysis and coarse-gridded satellite products tend to smooth spatial and temporal variability, which often misses these sharp spikes in rainfall. Meanwhile, even when a product does detect an event (as with IMERG or ERA5-MAX), errors in estimating the intensity—either overestimating or underestimating—degrade the bias ratio, mean error, and RMSE. This can happen if the algorithms or physical parameterizations poorly represent convective processes or sub-grid rainfall dynamics.
Finally, the fact that some products (e.g., ERA5-MAX) still show reasonable correlation with reference data despite large magnitude errors reveals another point: a product may get the timing of events roughly right (wet vs. dry days), even if it fails to reproduce the true rainfall amounts. Correlation reflects temporal alignment more than magnitude accuracy, so a biased product can still score well on correlation even when its intensity errors are large.
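The distinction between correlation and magnitude accuracy can be made concrete with a short sketch. It uses a deliberately extreme, hypothetical case in which a product doubles every reference amount; the series are made up, and the bias ratio is assumed here to be the ratio of total estimated to total reference precipitation, which may differ from the definition used in this study.

```python
import numpy as np

# Made-up daily totals in mm, for illustration only (not study data)
reference = np.array([0.0, 0.0, 5.0, 0.0, 20.0, 1.0, 0.0, 8.0])

# Hypothetical product with perfect timing that doubles every amount
biased = 2.0 * reference

# Pearson correlation is unaffected by a positive rescaling
corr = np.corrcoef(reference, biased)[0, 1]

# Magnitude-based metrics are strongly affected
bias_ratio = biased.sum() / reference.sum()  # assumed definition
rmse = np.sqrt(np.mean((biased - reference) ** 2))

print(f"correlation = {corr:.3f}")
print(f"bias ratio  = {bias_ratio:.2f}")
print(f"RMSE        = {rmse:.2f} mm/day")
```

The correlation stays at its maximum while the bias ratio and RMSE are large, which is why correlation should be read alongside the magnitude-based metrics rather than in place of them.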
In addition to its predominantly dry climate, this region is characterized by highly complex terrain, a factor well known to challenge both the observation and modeling of precipitation. These topographic influences can introduce substantial uncertainty by shaping local atmospheric dynamics and limiting the ability of observational networks and models to accurately resolve spatial variability. When combined with the relatively coarse resolution of several of the datasets used here, these factors likely contribute to the discrepancies identified in our analysis. The influence of terrain–resolution interactions is evident in Figure 2. Higher-resolution datasets capture precipitation enhancement on the windward slopes, reflecting their improved ability to represent orographic processes. In contrast, lower-resolution datasets, unable to adequately resolve the underlying terrain, fail to capture this signal. This mismatch underscores the importance of considering both topographic complexity and dataset resolution when interpreting precipitation estimates in mountainous, arid environments.
Future work should test the conclusions drawn above in other regions with a similar climate in order to generalize the results presented in this study. Additional satellite-based products and reanalysis data should also be assessed, together with ground radar observations where available. The time series should also be extended to cover a longer period.
The impact of bias correction techniques applied to each estimation product could also be considered, as they can significantly influence overall performance. Bias corrections are often implemented to align modeled or satellite-derived precipitation estimates with observed data, improving accuracy in magnitude and distribution. However, these adjustments can also introduce new uncertainties or mask underlying deficiencies in the original datasets. Evaluating how each product’s performance changes before and after bias correction can provide valuable insight into the true capabilities of the raw estimation models versus the effectiveness of the correction methods themselves. This distinction is especially important when comparing products across different climate regimes or event intensities, such as average versus extreme precipitation. In future analyses, incorporating a systematic comparison of bias-corrected versus uncorrected outputs could help clarify whether observed improvements are due to the model’s inherent skill or the strength of the correction algorithm applied.
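A before/after comparison of this kind could be sketched as follows. The example uses linear scaling, one of the simplest bias correction methods, purely as an illustration; it is not the correction applied by any of the products evaluated here, and the function names and sample values are hypothetical.

```python
import numpy as np


def linear_scaling(estimate, reference):
    """Rescale `estimate` so that its mean matches the reference mean.

    Linear scaling is an assumed, illustrative correction method.
    """
    estimate = np.asarray(estimate, dtype=float)
    factor = np.mean(reference) / np.mean(estimate)
    return factor * estimate, factor


def rmse(a, b):
    """Root-mean-square error between two equally long series."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


# Made-up daily totals in mm, for illustration only (not study data)
reference = np.array([0.0, 3.0, 0.0, 10.0, 0.0, 2.0])
raw = np.array([0.5, 6.0, 0.0, 18.0, 1.0, 4.5])

corrected, factor = linear_scaling(raw, reference)

rmse_raw = rmse(raw, reference)
rmse_corrected = rmse(corrected, reference)
corr_raw = np.corrcoef(raw, reference)[0, 1]
corr_corrected = np.corrcoef(corrected, reference)[0, 1]

print(f"scaling factor: {factor:.2f}")
print(f"RMSE raw / corrected: {rmse_raw:.2f} / {rmse_corrected:.2f}")
print(f"corr raw / corrected: {corr_raw:.3f} / {corr_corrected:.3f}")
```

In this sketch the RMSE improves after correction while the correlation is unchanged, so the gain reflects the correction step rather than any change in the product's ability to time events. In practice the scaling factor would be calibrated on one period and evaluated on an independent one to avoid an overly optimistic assessment.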