1. Introduction
Precipitation is one of the most influential hydrological phenomena, significantly affecting terrestrial ecosystems, socio-economic systems, and hydrological processes. Unlike other meteorological variables, precipitation manifests as intermittent events that exhibit spatial and temporal variability [1,2]. Rainfall intensity distributions govern key hydrological processes, including surface runoff generation, groundwater recharge, and river baseflow, thereby influencing water resource management and agricultural productivity. However, when rainfall intensities exceed infiltration and drainage capacities, particularly in arid and semi-arid environments, they can trigger floods and flash floods, causing severe damage to infrastructure, disrupting transportation networks, degrading agricultural land, and threatening human lives [3,4]. Quantitative precipitation estimation plays a crucial role in climate change studies and disaster mitigation efforts, particularly in guiding the safe design of water infrastructure [5]. Accurate data on rainfall distribution at high resolutions are essential for predicting dynamic surface hydrologic states, yet achieving such precision remains a challenge due to the heterogeneous nature of precipitation in space and time [1,6].
Traditionally, rain gauge networks are reliable sources of direct precipitation measurements, providing point-scale observations essential for model calibration, validation, and forecasting [7]. Despite their reliability, rain gauges are subject to errors arising from calibration issues, maintenance challenges, and wind effects, which may lead to underestimation of precipitation [8,9,10,11,12]. Common issues include snow blowing, raindrop splashing, wetting losses, and evaporative losses [13]. Additionally, rain gauges are limited by their spatial interpolation, which depends more on network geometry than on actual precipitation distribution [5]. Sparse networks, particularly in tropical rainforests and mountainous regions, exacerbate these limitations [14,15]. Rain gauges in high-altitude areas often suffer from harsh climatic conditions, resulting in seriously biased measurements [6]. While weather radar systems can provide an overview of rainfall distribution, spatially and temporally, radars are also hindered by challenges such as ground clutter and beam blockage, which could degrade data quality. Moreover, radar systems remain financially and technically inaccessible for many developing nations [16,17].
Satellite-based rainfall estimates have emerged as a promising alternative to ground-based observations, offering extensive spatial coverage and high temporal resolution [18]. Since the 1970s, satellite observations have been utilized to extract global precipitation information, becoming indispensable for advancing hydrological, atmospheric, and climate sciences [19,20]. These products are effective substitutes in regions with sparse or unreliable rain gauges [13]. In particular, satellite precipitation products (SPPs) provide valuable data for flood prediction, drought monitoring, and climate change research [21,22,23].
The Global Precipitation Measurement (GPM) mission, an international collaboration led by NASA and the Japan Aerospace Exploration Agency (JAXA), launched its Core Observatory satellite in 2014 to set a new standard in remotely sensed precipitation measurements [17,24]. The Integrated Multi-Satellite Retrievals for GPM (IMERG) dataset, a key product of this mission, offers global coverage with a high spatial resolution of 0.1° and a temporal resolution of 30 min [24,25]. IMERG provides three levels of products: near-real-time Early and Late-run products and a post-real-time Final product [21]. Unlike earlier satellite missions, such as TRMM, GPM-IMERG employs advanced sensors to detect both light and heavy rainfall as well as snowfall. Due to its high accuracy, IMERG has been widely used in hydrological and meteorological applications [8,26,27,28,29]. Another competitive satellite dataset, CMORPH, was developed by the National Oceanic and Atmospheric Administration (NOAA) Climate Prediction Center and uses infrared (IR) and low-orbit passive microwave (PMW) data to estimate rainfall [30,31]. In addition, as part of the GPM program, the precipitation measurement team at JAXA developed and released the GSMaP product, which offers valuable precipitation estimates [32]. The MSWEP (Multi-Source Weighted-Ensemble Precipitation) dataset, developed and maintained by GloH2O (Global Hydrology and Climate Solutions), offers global precipitation data with a 3-hourly temporal resolution spanning from 1979 to the near-present [33]. MSWEP V2 represents the first fully global precipitation dataset with a spatial resolution of 0.1°, enabling hyper-resolution land-surface modeling worldwide. This dataset stands out by leveraging the complementary strengths of gauge-, satellite-, and reanalysis-based data to provide reliable precipitation estimates [34,35]. Several studies in other regions, including arid regions like Saudi Arabia, have shown that satellite precipitation products may contain errors arising from sensor- or instrument-related discrepancies [36,37,38,39,40,41].
In the United Arab Emirates (UAE), some satellite precipitation products have been evaluated. Four datasets from the PERSIANN family were compared over ten years, from 2011 to 2020, in the UAE. PERSIANN-CDR was identified as the most reliable dataset for capturing extreme rainfall events and spatial distribution over the region [42]. Trend analyses using IMERG v6 and CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) indicated an increase in total precipitation and consecutive wet days across three selected UAE watersheds [43]. In addition, Baig et al. [8] utilized CMORPH, PERSIANN, and IMERG datasets along with rain gauge data to analyze 17-year rainfall trends over the UAE. Other studies validated the applicability of IMERG and CMORPH for understanding rainfall variability, highlighting their potential to complement or replace ground-based measurements in the UAE [26,44].
The four promising satellite-based rainfall products employed in this study, GSMaP, IMERG, CMORPH, and MSWEP, are widely utilized worldwide due to their reliability and global applicability. However, to the best of our knowledge, no previous study has utilized the MSWEP dataset in the context of the UAE. The primary goal of this study is to test and calibrate the newly developed MSWEP dataset over the UAE, a region with highly variable rainfall and critical water resource management needs. Using daily rainfall data from 38 rain gauge stations distributed across the UAE, this study evaluates the performance of MSWEP against three well-established satellite rainfall products: CMORPH, IMERG, and GSMaP. Despite its demonstrated strengths, MSWEP is not without limitations, particularly in regions with sparse or uneven gauge coverage, such as the UAE. Because MSWEP relies partly on gauge-based bias correction to merge satellite and reanalysis inputs, its accuracy can diminish in data-scarce areas where limited ground observations constrain the calibration process. The UAE provides a unique testing ground for satellite precipitation products due to its arid climate and complex topography. The region experiences annual rainfall averaging less than 100 mm in most areas, with highly uneven distribution across seasons and locations [36]. Rainfall events, although infrequent, are crucial for groundwater recharge and vegetation, making accurate rainfall estimation vital for sustainable water resource management and urban planning.
Despite the growing global adoption of MSWEP, its performance has been tested primarily in humid, monsoonal, or temperate contexts, with limited validation in hyper-arid environments. These regions pose unique retrieval challenges due to sparse precipitation, dominance of warm-cloud rainfall, low gauge density, and pronounced orographic influences. Consequently, a critical gap exists in understanding whether MSWEP’s multi-source merging framework is robust under hyper-arid dynamics. This study addresses this gap by providing a comprehensive validation of MSWEP over the UAE.
Although MSWEP has been evaluated at global and continental scales, its behavior in hyper-arid environments remains poorly constrained. Hyper-arid regions are characterized by strongly zero-inflated rainfall distributions, infrequent but intense convective events, rapid evaporation, and sparse gauge networks used for bias correction. These conditions introduce uncertainties in how multi-source precipitation products represent rainfall occurrence, intensity, and extremes. In particular, it is unclear how reliably MSWEP performs under extreme aridity, where precipitation is rare but hydrologically significant. Addressing this gap, the present study provides a comprehensive evaluation of MSWEP over the United Arab Emirates by benchmarking its performance against gauge observations and multiple satellite products across statistical, seasonal, and extreme-event dimensions.
This study addresses a critical research gap concerning the limited evaluation of multi-source precipitation products in hyper-arid environments, where sparse observation networks and localized convective events present substantial challenges for satellite retrieval accuracy. Building on this gap, we hypothesized that MSWEP, despite its advanced multi-source merging framework, would exhibit performance variability across different seasons and physiographic settings of the UAE, and that benchmarking it against CMORPH, IMERG, and GSMaP would reveal systematic strengths and limitations relevant to hydrological applications. By systematically assessing these four datasets using statistical, categorical, and extreme rainfall metrics, the study provides clear, application-oriented insights into their reliability for arid-region hydrology and climate analysis. The results not only contribute to the broader understanding of satellite precipitation skill in dry climates but also support the refinement of retrieval algorithms and inform future efforts in water-resource planning, hazard forecasting, and regional dataset calibration.
Study Area
The United Arab Emirates (UAE) is situated in the southeastern part of the Arabian Peninsula, bordered by Saudi Arabia and Oman (Figure 1). It is characterized by an arid climate with extreme temperatures and scarce precipitation. The country experiences highly variable and spatially heterogeneous rainfall patterns, with annual precipitation largely influenced by convective storms and orographic effects in the eastern mountainous regions. While most of the country receives minimal rainfall, localized areas, particularly in elevated regions, experience relatively higher precipitation due to topographical influences [8,36]. Precipitation monitoring in the UAE is managed by the National Center of Meteorology (NCM), which operates an extensive network of automatic rain gauge stations across the country. These stations are strategically distributed to capture precipitation variability across diverse landscapes, including coastal areas, deserts, and mountainous regions [16]. Rainfall in the UAE is typically low and unevenly distributed both spatially and temporally, with annual averages ranging between 80 mm and 120 mm depending on the location [45]. Seasonal rainfall trends are heavily influenced by the region’s unique geography, including coastal plains, mountainous terrain, and desert landscapes [46]. Rainfall is primarily concentrated during the winter and early spring months, associated with mid-latitude cyclones and frontal systems, while the summer months are largely dry. The spatial distribution of total annual average rainfall (Figure 1c) highlights a clear gradient, with higher precipitation amounts concentrated in the northeast and significantly lower values in the inland desert areas. This variability presents a challenge in hydrological modeling and water resource management, emphasizing the need for accurate and reliable precipitation datasets. Despite its sporadic nature, rainfall is a critical hydrological input for groundwater recharge and ecological processes in the UAE. The availability of high-resolution gauge data allowed for a robust evaluation of satellite rainfall products, providing valuable insights into their ability to capture the unique rainfall patterns of this arid environment.
3. Results
3.1. Statistical Measures Evaluation
Table S1 provides a comprehensive station-level evaluation of the four quantitative precipitation estimate (QPE) datasets (MSWEP, CMORPH, IMERG, and GSMaP) using three key statistical indicators: root mean square error (RMSE), mean absolute error (MAE), and Kling–Gupta efficiency (KGE). This condensed framework captures both the magnitude of errors and the overall agreement between satellite-based estimates and gauge observations, enabling a more focused comparison of product efficiency across the UAE’s diverse topography.
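As an illustration of how the three indicators can be computed for one station, the following is a minimal sketch in plain Python. It assumes paired daily series of gauge and satellite rainfall in mm; the function and variable names are hypothetical, and KGE is written in its commonly used original form (correlation, variability ratio, and bias ratio), which may differ from the exact variant used in this study.

```python
import math

def rmse(obs, sim):
    """Root mean square error between gauge (obs) and product (sim) values."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))

def mae(obs, sim):
    """Mean absolute error between gauge (obs) and product (sim) values."""
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)

def kge(obs, sim):
    """Kling-Gupta efficiency (assumed original form): 1 is a perfect match."""
    n = len(obs)
    mu_o = sum(obs) / n
    mu_s = sum(sim) / n
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / n
    r = cov / (sd_o * sd_s)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Illustrative values only (not data from this study).
gauge = [0, 0, 5, 10, 0]
product = [0, 1, 4, 12, 0]
print(rmse(gauge, product), mae(gauge, product), kge(gauge, product))
```

Note that KGE is undefined when the gauge series has zero mean or zero variance, which can occur at very dry stations over short periods; a practical implementation would need to handle that case explicitly.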
The analysis reveals that GSMaP consistently delivers the most balanced and reliable performance across the majority of stations, as indicated by lower RMSE and MAE values and higher KGE scores. This reflects GSMaP’s robust calibration procedures and dynamic gauge-adjustment algorithms, which effectively reduce bias and improve temporal consistency. Its superior skill is particularly evident in areas with complex topography, such as Masafi, Jabal Hafeet, and Jabal Jais, where microwave-based retrievals can better resolve orographic rainfall mechanisms. The relatively low RMSE and high KGE at these locations suggest that GSMaP captures both the intensity and variability of rainfall events more accurately than the other products. CMORPH also performs favorably, particularly in mountainous and high-rainfall zones. Its morphing technique, which relies on time interpolation between microwave overpasses, appears to enhance temporal coherence. However, CMORPH occasionally underestimates peak intensities and displays higher MAE at coastal and inland stations such as Al Gheweifat and Al Ain, indicating a potential difficulty in representing light or spatially fragmented rainfall.
IMERG, while a globally advanced product incorporating multiple sensors and gauge adjustments, exhibits mixed performance across the study domain. In many inland and coastal locations, IMERG tends to overestimate light to moderate rainfall, resulting in higher RMSE and slightly lower KGE values. This overestimation likely arises from its reliance on infrared retrievals during microwave data gaps, which can misinterpret warm-cloud signatures common in arid and maritime climates as precipitation. Despite these biases, IMERG maintains reasonable performance consistency, suggesting its potential utility after localized bias correction. In contrast, MSWEP shows the weakest performance across most stations, marked by higher RMSE and MAE and lower KGE values. Its coarse spatial resolution and reliance on globally merged reanalysis and gauge data limit its effectiveness in capturing highly localized convective rainfall, which dominates the UAE’s hydroclimate. MSWEP’s static bias correction method may further reduce responsiveness to short-term rainfall variability, leading to underestimation during peak events and misrepresentation of rainfall totals.
Spatially, the statistical measures illustrate a clear gradient in QPE performance from mountainous to inland areas. Stations with complex terrain and higher rainfall variability show better overall agreement between satellite and gauge data, whereas flat, arid inland sites display larger discrepancies. This pattern emphasizes the combined influence of topography, rainfall intensity, and algorithmic design on QPE accuracy. Critically, while GSMaP and CMORPH outperform other datasets, none of the QPEs exhibits universally high accuracy across all stations. This finding underscores the inherent challenge of applying global precipitation products to a hyper-arid region with complex terrain and sporadic rainfall. It also highlights the need for tailored post-processing methods, such as regional bias correction, merging with high-density gauge networks, or hybrid machine learning frameworks, to optimize rainfall estimation for hydrological and climate applications in the UAE.
The regional comparison (Figure 2) provides an integrated view of the mean normalized performance of all four QPE datasets (MSWEP, CMORPH, IMERG, and GSMaP) across three physiographically distinct zones: Mountainous, Coastal, and Inland regions. The evaluation is based on three key metrics: RMSE, MAE, and KGE, where values were normalized so that higher scores indicate better performance. This figure serves to reveal the regional dependency of satellite rainfall retrieval skill in the UAE and highlights the influence of topography and proximity to the coast on QPE accuracy.
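The text does not state which normalization was applied. One plausible reading, sketched below purely as an assumption, is a min-max scaling across products in which error metrics (RMSE, MAE) are inverted so that a higher score always means better performance. The function name and the numbers are hypothetical.

```python
def normalize_scores(values, higher_is_better):
    """Min-max scale a {product: metric} mapping to [0, 1].

    Error metrics (higher_is_better=False) are inverted so that the
    product with the smallest error receives the highest score.
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All products tie; give every product the same score.
        return {name: 1.0 for name in values}
    scaled = {name: (v - lo) / (hi - lo) for name, v in values.items()}
    if higher_is_better:
        return scaled
    return {name: 1.0 - s for name, s in scaled.items()}

# Illustrative regional mean RMSE values (not results from this study).
rmse_by_product = {"A": 2.0, "B": 4.0, "C": 3.0}
print(normalize_scores(rmse_by_product, higher_is_better=False))
```

Under this scheme the scores are relative to the set of products compared, so they indicate ranking within a region rather than absolute skill.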
The Mountainous region exhibits the highest overall performance among all QPEs, particularly for GSMaP and CMORPH, which consistently achieve superior normalized scores across all three metrics. This strong performance can be attributed to the frequent occurrence of orographic precipitation and well-defined convective structures that are better captured by microwave sensors integrated into these algorithms. The lower RMSE and higher KGE values indicate that both GSMaP and CMORPH effectively represent rainfall magnitude and temporal variability in topographically complex environments. In contrast, IMERG, despite its advanced calibration and dense sensor network, tends to overestimate rainfall intensity in these areas, likely due to the dominance of convective echoes misinterpreted by its infrared components.
In the Coastal region, the performance of all QPEs diminishes slightly, but GSMaP remains the most reliable dataset. The reduced accuracy in this zone is likely linked to the influence of marine atmospheric conditions—such as sea-breeze convection and shallow maritime clouds—that often challenge satellite-based retrievals. The lower skill of MSWEP and IMERG in coastal settings reflects their difficulties in resolving light or short-lived rainfall events, which are typical near the coast.
The Inland region exhibits the weakest performance across all QPEs, with the highest normalized errors (RMSE and MAE) and relatively low KGE values. This degradation is expected in hyper-arid areas where rainfall events are infrequent, localized, and often below the sensitivity thresholds of satellite retrieval algorithms. The coarse spatiotemporal resolution of global QPE products limits their ability to detect sporadic convective storms, leading to larger discrepancies compared to gauge observations.
The regional analysis clearly demonstrates that GSMaP consistently outperforms other QPEs in both mountainous and coastal zones, while CMORPH provides competitive accuracy in mountainous terrain. These results emphasize the strong topographic control on QPE reliability and highlight the necessity of regional bias correction or hybrid merging strategies to improve precipitation representation in inland desert areas. The insights from this regional assessment provide an essential foundation for developing localized satellite–gauge merging frameworks tailored to the hydroclimatic diversity of the UAE.
3.2. Rainfall Intensity Frequency Comparison
Figure 3 compares gauge observations with four QPE datasets (MSWEP, IMERG, CMORPH, and GSMaP) across five rainfall intensity categories (Very Light, Light, Moderate, Heavy, and Extreme) during 2004–2020. The gauge data indicate that Very Light and Light rainfall events account for approximately 78% of total occurrences. In comparison, Moderate rainfall accounts for about 15%, and Heavy to Extreme rainfall accounts for less than 7%.
Among the QPEs, MSWEP most closely matches the gauge distribution, capturing 76% of events in the light rainfall range and 17% in the moderate range, showing its strength in reproducing overall event frequency. IMERG performs comparably but tends to overestimate light rainfall (≈82%) and underrepresent heavier intensities (>25 mm). CMORPH exhibits a bias toward underestimating light rain (≈70%), whereas GSMaP shows a flatter distribution, slightly overpredicting heavy rainfall events (>30 mm).
The comparison highlights MSWEP’s relatively balanced detection capability across all intensity categories, while the other satellite products show systematic biases—either underrepresenting or overrepresenting specific rainfall intensities. These differences have implications for hydrological modeling in arid environments, where accurate representation of low- to moderate-intensity rainfall is essential for water resource and flood risk assessments.
3.3. Categorical and Extreme Indices
The comparison of categorical indices highlights the strengths and limitations of each dataset in terms of detection capability, false alarm rates, and overall agreement with ground-based measurements (Figure 4). Each subplot (a–d) corresponds to one of the categorical indices, enabling a direct comparison across the products. In subplot (a), the probability of detection (POD) values indicate the products’ ability to correctly detect precipitation events. MSWEP and IMERG exhibit consistently high POD values, with IMERG showing slightly higher values overall. GSMaP also performs reasonably well, while CMORPH shows a noticeable spread and lower POD values, suggesting it struggles to detect precipitation consistently across the region. Subplot (b) shows the false alarm ratio (FAR) values, which measure the proportion of false precipitation detections; therefore, lower FAR values indicate better performance. A high FAR in arid regions such as the UAE carries particular implications due to the region’s low rainfall frequency and short-lived convective systems. In such environments, even minor overestimations of rainfall frequency can significantly distort perceptions of wet conditions, leading to false flood warnings or misinformed water management decisions. The elevated FAR observed for IMERG and GSMaP in Figure 4b suggests a tendency of these products to misclassify non-rainy pixels as rainfall, likely due to challenges in distinguishing light cloud signatures from actual precipitation under warm, dry atmospheric conditions. Subplot (c) displays the critical success index (CSI) values, which account for hits, misses, and false alarms to assess overall success. MSWEP demonstrates the highest median CSI, followed by GSMaP and IMERG, indicating that these products achieve a better balance between correct detections and false alarms. CMORPH exhibits lower CSI values and greater variability, reflecting its limited ability to accurately capture precipitation events. Figure 4c shows that MSWEP achieves a more balanced detection of rainfall events, minimizing missed detections while avoiding excessive false alarms. This balance makes MSWEP comparatively more reliable for hydrological and operational applications where both overprediction and underprediction can have profound implications. Finally, subplot (d) presents the frequency bias index (FBI) values, which measure the tendency to overestimate or underestimate precipitation events. All four products exhibit FBI values greater than 1, indicating a general tendency to overestimate precipitation. MSWEP shows the least overestimation, with lower variability than the other products. GSMaP and IMERG exhibit higher median FBI values, while CMORPH shows the greatest spread, suggesting significant overestimation in some areas.
MSWEP shows consistent performance across all categorical indices, with balanced detection capabilities (POD = 0.7, FAR = 0.65), better overall success (CSI = 0.35), and the lowest bias among the products (FBI = 2). IMERG also performs well, particularly in POD, but shows a slightly higher FBI, indicating a modest tendency to over-detect rainfall events compared to gauges; however, this difference is within an acceptable range and is not statistically significant. GSMaP demonstrates moderate performance but exhibits higher false alarm ratios and bias. CMORPH, on the other hand, shows the weakest performance, with substantial variability and lower scores across most indices, highlighting its limitations in accurately detecting and quantifying precipitation in the study region.
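The four categorical indices follow from a rain/no-rain contingency table of hits, misses, and false alarms. The sketch below shows the standard definitions in plain Python; the 1 mm/day rain threshold and all names are assumptions made for illustration, since the threshold used in this study is not stated in this section.

```python
def contingency(obs, sim, threshold=1.0):
    """Count hits, misses, and false alarms for paired daily values.

    threshold (mm/day) separates rain from no-rain days; 1.0 is an
    assumed value, not one taken from the study.
    """
    hits = misses = false_alarms = 0
    for o, s in zip(obs, sim):
        obs_wet, sim_wet = o >= threshold, s >= threshold
        if obs_wet and sim_wet:
            hits += 1
        elif obs_wet:
            misses += 1
        elif sim_wet:
            false_alarms += 1
    return hits, misses, false_alarms

def categorical_indices(hits, misses, false_alarms):
    """Standard POD, FAR, CSI, and frequency bias index (FBI)."""
    return {
        "POD": hits / (hits + misses),
        "FAR": false_alarms / (hits + false_alarms),
        "CSI": hits / (hits + misses + false_alarms),
        "FBI": (hits + false_alarms) / (hits + misses),
    }

# Illustrative counts only (not data from this study).
print(categorical_indices(hits=70, misses=30, false_alarms=130))
```

From these definitions, FBI = POD / (1 − FAR), so a POD of 0.7 combined with a FAR of 0.65 implies an FBI of 2, which agrees with the values reported for MSWEP. The same counts give a CSI of about 0.30, slightly below the reported 0.35; such a difference is plausible if the reported values are summaries across stations rather than results from a single contingency table.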
3.4. Extreme Rainfall Indices Evaluation
Consecutive Dry Days (CDD) (Figure 5a) quantifies the length of dry spells. Gauge observations exhibit moderate CDD values, representing typical intermittent rainfall behavior over the UAE. Both MSWEP and IMERG tend to overestimate the duration of dry periods, reflecting a conservative bias toward non-rainy days, while CMORPH underestimates CDD with a narrower range. GSMaP demonstrates the highest variability, indicating inconsistent dry spell detection.
Consecutive Wet Days (CWD) (Figure 5b) represents the persistence of wet periods. MSWEP shows close agreement with gauge data, with only a minor positive bias, suggesting it effectively captures short-duration wet sequences common in the region. CMORPH and GSMaP, however, show a broader spread and higher CWD values, suggesting a tendency to overestimate the continuity of wet events, whereas IMERG maintains a moderate, balanced distribution.
For the Rx1day and Rx5day indices (Figure 5c,d), representing the maximum 1-day and 5-day precipitation totals, respectively, MSWEP captures these extreme rainfall magnitudes relatively well but with a slightly wider interquartile range. CMORPH and IMERG underestimate extreme precipitation intensities, while GSMaP exhibits high variability, reflecting inconsistent detection of short-lived convective storms.
The R10mm and R20mm indices (Figure 5e,f), denoting the number of days exceeding 10 mm and 20 mm thresholds, further demonstrate MSWEP’s reliability in reproducing heavy rainfall frequency. Although it slightly overestimates moderate rainfall events, its distribution remains consistent with gauge observations. In contrast, CMORPH shows weaker correspondence, while IMERG and GSMaP display broader dispersions, reflecting difficulties in accurately representing moderate to heavy rainfall occurrences.
Finally, R30mm, PRCPTOT, and R95P (Figure 5g–i) capture very heavy rainfall days, total precipitation accumulation, and the 95th percentile of daily rainfall, respectively. MSWEP aligns most closely with gauge observations for PRCPTOT and R95P, confirming its ability to represent total and extreme rainfall amounts. A slight positive bias in R30mm suggests a tendency to overrepresent the most intense events, which may stem from its multi-source merging strategy and global bias correction approach.
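To make the index definitions concrete, the following sketch computes most of the indices discussed above from a single daily rainfall series. It assumes a wet-day threshold of 1 mm, as is common for such indices, and counts days at or above the 10, 20, and 30 mm thresholds; both choices and all names are assumptions rather than details confirmed by the text. R95P is omitted because it depends on a percentile definition and base period that are not specified here.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def extreme_indices(daily_mm, wet_threshold=1.0):
    """Extreme rainfall indices for one daily series (values in mm)."""
    wet = [p >= wet_threshold for p in daily_mm]
    # Largest total over any window of five consecutive days.
    n_windows = max(1, len(daily_mm) - 4)
    rx5day = max(sum(daily_mm[i:i + 5]) for i in range(n_windows))
    return {
        "CDD": longest_run([not w for w in wet]),
        "CWD": longest_run(wet),
        "Rx1day": max(daily_mm),
        "Rx5day": rx5day,
        "R10mm": sum(p >= 10 for p in daily_mm),
        "R20mm": sum(p >= 20 for p in daily_mm),
        "R30mm": sum(p >= 30 for p in daily_mm),
        "PRCPTOT": sum(p for p, w in zip(daily_mm, wet) if w),
    }

# Illustrative 10-day series (not data from this study).
series = [0, 0, 12, 25, 0, 0, 0, 3, 0, 35]
print(extreme_indices(series))
```

In practice these indices are computed per station and per year, and the resulting distributions are then compared across products, as in the box plots described above.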
MSWEP exhibits balanced performance across all indices, effectively capturing the spatial and temporal variability of both moderate and extreme rainfall over the UAE. While IMERG performs comparably well in some indices, CMORPH and GSMaP show less consistency, particularly under conditions of localized or convective rainfall typical of arid climates. These results emphasize the potential of MSWEP as a reliable precipitation dataset for hydrological and climate applications, while also reflecting the inherent challenges satellite algorithms face in detecting extreme events in dry regions with high spatial rainfall heterogeneity.
3.5. Qualitative Measures
Figure 6 presents a comparative analysis of precipitation characteristics across gauge observations and four satellite-based precipitation products. Figure 6A shows the percentage of time without precipitation for each dataset, providing insight into the dry periods across the study region. The gauge observations indicate high dry spell frequencies, with most stations showing non-precipitation percentages exceeding 95%. The satellite products generally capture this dry period pattern, but notable differences emerge. CMORPH appears to underestimate dry conditions relative to gauge data, particularly in the southern regions, suggesting potential false precipitation detection. In contrast, MSWEP, IMERG, and GSMaP align more closely with the gauge estimates, though localized discrepancies are evident in coastal and northern areas. Figure 6B displays the 95th percentile precipitation (95PP), representing extreme rainfall thresholds across the datasets. The gauge data exhibit substantial spatial variability, with higher values concentrated in the northeastern coastal and mountainous regions. IMERG and GSMaP exhibit the largest deviations from gauge observations, generally overestimating extreme precipitation events, particularly in the northeastern regions. CMORPH, on the other hand, tends to underestimate 95PP across most stations, indicating a potential bias toward lower rainfall intensities. MSWEP demonstrates relatively balanced performance, capturing the spatial variability of 95PP, with values ranging approximately from 9 to 20 mm day⁻¹ in inland regions to over 25 mm day⁻¹ along the eastern mountains. In contrast, CMORPH and GSMaP show weaker spatial correspondence, with underestimation over the coastal and inland plains where 95PP values drop below 20 mm day⁻¹, indicating that MSWEP more effectively represents the intensity distribution across varying topographic zones. These findings suggest that while satellite products provide valuable insights into extreme precipitation characteristics, their reliability varies across regions and with precipitation dynamics.
Similarly, the annual total rainfall and its deviations from the long-term normal for the gauge observations and satellite precipitation products over the study period from 2004 to 2020 are presented in Figure 7. Panel (a) displays the total annual average rainfall recorded by each dataset, with the horizontal black line representing the long-term mean annual rainfall of 100.6 mm. It highlights interannual variability, with some years exhibiting significant above-average rainfall (e.g., 2006, 2009, and 2019), while others fall well below the historical mean (e.g., 2011, 2014, and 2017). Differences among the satellite products are evident, particularly in years with extreme rainfall events, where GSMaP frequently exhibits higher totals than the other products. Panel (b) presents the rainfall departures from the normal annual average, emphasizing wet and dry years. Positive values indicate years with above-average precipitation, while negative values reflect drier-than-normal conditions. The gauge observations and satellite datasets exhibit similar trends in most years, indicating the ability of these products to capture overall rainfall variability. However, discrepancies are notable, particularly in extreme years where some products overestimate or underestimate departures from normal conditions. GSMaP shows more pronounced peaks in wet years, while MSWEP and CMORPH tend to provide more moderate variations.
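The departures in panel (b) are simply annual totals minus a reference normal. A minimal sketch is given below; the function name and the annual totals are hypothetical, and only the 100.6 mm long-term mean comes from the text. When no normal is supplied, the sketch falls back to the mean of the supplied years, which is an assumption about how a normal might be derived.

```python
def annual_departures(annual_totals, normal=None):
    """Return {year: total - normal} for a {year: total_mm} mapping.

    If normal is None, the mean of the supplied totals is used instead.
    """
    if normal is None:
        normal = sum(annual_totals.values()) / len(annual_totals)
    return {year: total - normal for year, total in annual_totals.items()}

# Illustrative annual totals in mm (not data from this study).
totals = {2004: 80.0, 2005: 120.0}
print(annual_departures(totals))                 # relative to the sample mean
print(annual_departures(totals, normal=100.6))   # relative to the stated normal
```

Positive values mark wetter-than-normal years and negative values mark drier-than-normal years, matching the sign convention described for panel (b).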
3.6. Quantitative Measures
The heatmap (Figure 8) presents an aggregated comparison of the average number of rainfall days classified into three categories: light rainfall (<5 mm), moderate rainfall (5–20 mm), and heavy rainfall (>20 mm) for the gauge data and four satellite products. The color intensity reflects the frequency of rainfall occurrences, with warmer colors indicating a higher number of days and cooler colors representing fewer days. A key observation is the substantial overestimation of light rainfall days by satellite products relative to gauge observations, particularly for MSWEP, which records more than six times as many light rainfall days as the gauge. This overestimation suggests that MSWEP tends to detect low-intensity precipitation more frequently than ground measurements. Similarly, IMERG and GSMaP also show higher counts of light rainfall days than the gauge, while CMORPH approximates more closely but remains overestimated. For moderate rainfall (5–20 mm), IMERG reports nearly three times as many days as the gauge, followed by GSMaP, which also overestimates the frequency. CMORPH, on the other hand, appears to underestimate moderate rainfall events. For heavy rainfall events (>20 mm), all satellite products show lower frequencies than gauge observations, with MSWEP and CMORPH detecting the fewest heavy rainfall days. This underestimation suggests that satellite products struggle to accurately capture extreme precipitation events, which are crucial for hydrological and flood risk assessments. Overall, the heatmap highlights significant discrepancies in the detection of rainfall frequency across different intensity levels, emphasizing the importance of further calibration and validation for improving satellite-based precipitation estimates in the region.
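The three-class counting behind the heatmap can be sketched as follows, using the thresholds given above (light <5 mm, moderate 5–20 mm, heavy >20 mm). The lower cutoff for counting a day as a rainfall day is not stated, so the sketch assumes any amount above 0 mm counts; that cutoff and the function name are hypothetical.

```python
def count_rain_days(daily_mm, min_rain=0.0):
    """Count rainfall days per intensity class for one daily series.

    Days with rainfall <= min_rain (assumed 0 mm) are treated as dry.
    """
    counts = {"light": 0, "moderate": 0, "heavy": 0}
    for p in daily_mm:
        if p <= min_rain:
            continue  # dry day, not counted in any class
        if p < 5:
            counts["light"] += 1
        elif p <= 20:
            counts["moderate"] += 1
        else:
            counts["heavy"] += 1
    return counts

# Illustrative daily values in mm (not data from this study).
print(count_rain_days([0, 0.4, 4.9, 5, 20, 20.1, 50]))
```

The choice of the lower cutoff matters for the light class in particular: a product that reports many very small non-zero values will show a large number of light rainfall days unless a minimum threshold is applied consistently to both gauge and satellite series.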
Similarly, the box plots in Figure 9 illustrate the distribution of rainfall event counts across the precipitation datasets GAUGE, MSWEP, CMORPH, IMERG, and GSMaP. Each box represents the interquartile range (IQR), with the median denoted by the horizontal line within the box, while whiskers extend to the minimum and maximum values within 1.5 times the IQR. Outliers are marked as individual points beyond the whiskers. The GAUGE dataset, which represents ground observations, shows a moderate spread in rainfall events, with the median slightly above 40 events. MSWEP and CMORPH exhibit distributions similar to the gauge data, but with slightly lower medians, indicating that these products capture rainfall event frequencies relatively well but tend to underestimate the extremes. IMERG, on the other hand, displays a significantly higher median and broader spread, suggesting it detects more rainfall events, likely due to overestimation of light precipitation. GSMaP also shows a higher median than GAUGE, but its spread is smaller than IMERG’s.
MSWEP and CMORPH appear to align more closely with the gauge-based event distribution, while IMERG tends to overestimate the number of rainfall events, potentially due to a lower detection threshold. GSMaP also shows a tendency toward higher event counts but with slightly lower variability than IMERG. These variations highlight differences in how each satellite-based dataset detects and records precipitation events in comparison to ground observations.
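The box-plot elements described above (median, IQR, whiskers at 1.5 times the IQR, and outliers) can be reproduced as follows. This is a sketch under stated assumptions: the quartile interpolation method (`"inclusive"`) is a choice made here and may differ from the plotting software used for Figure 9, and the event counts are hypothetical.

```python
from statistics import quantiles


def box_stats(values):
    """Return the quantities drawn in a Tukey-style box plot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence = q1 - 1.5 * iqr
    hi_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Anything beyond the fences is plotted as an individual point.
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }


# Hypothetical rainfall event counts (NOT data from the study).
event_counts = [30, 35, 38, 40, 42, 44, 47, 50, 95]
stats = box_stats(event_counts)
```

Here the value 95 lies above the upper fence (47 + 1.5 × 9 = 60.5), so it is reported as an outlier and the upper whisker stops at 50.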
3.7. Seasonal Analysis
Seasonal analysis is essential for understanding the performance consistency of satellite precipitation products (SPPs) under varying hydroclimatic regimes. In arid and semi-arid regions such as the UAE, rainfall is highly episodic and often concentrated in short winter periods driven by synoptic-scale systems, while the rest of the year remains mostly dry. Evaluating satellite-based quantitative precipitation estimates (QPEs) at an annual scale may therefore obscure seasonal disparities in detection accuracy and bias patterns. A seasonal breakdown into winter (DJF), spring (MAM), summer (JJA), and autumn (SON) helps reveal how retrieval algorithms respond to distinct precipitation mechanisms (e.g., convective vs. stratiform rain) and atmospheric conditions such as humidity, cloud-top temperature, and wind shear, which directly influence microwave and infrared signal retrievals.
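The seasonal breakdown amounts to grouping daily values by calendar month. A minimal sketch, assuming daily `(date, mm)` pairs as input; the sample records are hypothetical, and for simplicity December is pooled with January and February of the same calendar year rather than assigned to the following winter.

```python
from datetime import date

# Month-to-season lookup for the four meteorological seasons.
SEASON_BY_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def seasonal_totals(daily_records):
    """Sum rainfall (mm) per season from an iterable of (date, mm) pairs."""
    totals = {"DJF": 0.0, "MAM": 0.0, "JJA": 0.0, "SON": 0.0}
    for day, mm in daily_records:
        totals[SEASON_BY_MONTH[day.month]] += mm
    return totals


# Hypothetical records (NOT data from the study).
records = [
    (date(2020, 1, 11), 12.0),
    (date(2020, 4, 2), 4.5),
    (date(2020, 8, 14), 0.5),
    (date(2020, 12, 5), 3.0),
]
totals_by_season = seasonal_totals(records)
```

The same grouping can be applied to the gauge series and to each product before computing skill metrics, so that every season is scored on its own sample.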
Figure 10 (Seasonal Skill Metrics) illustrates the performance of MSWEP, CMORPH, IMERG, and GSMaP against the spatially averaged gauge observations across the UAE for each season. The results indicate substantial seasonal variability in accuracy, with notably better skill in the cool and transition seasons (DJF and MAM) than in the hot, dry summer (JJA).
Winter (DJF): CMORPH and GSMaP exhibit higher correspondence with gauge data, reflected by R² values above 0.5 and positive Kling–Gupta Efficiency (KGE) scores, suggesting better representation of stratiform frontal rainfall. IMERG, although showing a relatively high R², demonstrates pronounced positive bias, consistent with its known tendency to overestimate wetness in low-latitude arid regions where light rainfall and shallow cloud systems challenge retrieval accuracy.
Spring (MAM): CMORPH again performs best, achieving the highest R² (~0.74) and KGE (~0.61), with minimal bias (−3.3%). This season marks the transition from frontal to convective rainfall, and CMORPH’s morphing technique appears more adept at capturing spatially shifting convective cells than other datasets.
Summer (JJA): All SPPs exhibit degraded performance, with low R² and negative KGE values, as rainfall events during this period are rare, localized, and often below sensor detection thresholds. The inflated biases are largely due to the low-denominator effect in percentage metrics, which amplifies relative errors when gauge totals are small.
Autumn (SON): GSMaP shows the highest agreement with gauges (R² ≈ 0.79; KGE ≈ 0.84), demonstrating its robust detection of early convective storms and its adaptive gauge calibration scheme that improves retrieval precision during transitional months.
Overall, CMORPH dominates in the cooler half of the year, while GSMaP performs best during the warmer, convective-dominated months. MSWEP, despite being gauge-corrected, performs less consistently, potentially due to the sparse ground network used in its bias correction at coarse spatial scales.
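To make the seasonal metrics concrete, the sketch below computes KGE and percentage bias and illustrates the low-denominator effect noted for summer. It assumes the original KGE formulation (correlation, variability ratio of standard deviations, and bias ratio of means); the study may use a modified variant, and all input values are hypothetical.

```python
import math


def kge(obs, sim):
    """Kling-Gupta Efficiency, original three-component form (assumed)."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    r = cov / (sd_o * sd_s)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def percent_bias(obs, sim):
    """Relative bias in percent: 100 * (sum(sim) - sum(obs)) / sum(obs)."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


# Same absolute error (2 mm) against a wet and a nearly dry seasonal total.
winter_bias = percent_bias([40.0], [42.0])
summer_bias = percent_bias([1.0], [3.0])

# A product that doubles every gauge value: perfect correlation, poor KGE.
kge_doubled = kge([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

A 2 mm error is a 5% bias against a 40 mm winter total but a 200% bias against a 1 mm summer total, which is why summer percentages look inflated even when absolute errors are small. The doubled series also shows why KGE can be negative despite a correlation of 1, since the variability and bias terms are each penalized.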
Similarly, Figure 11 compares GSMaP and IMERG scatter relationships with gauge observations for DJF and MAM. Both datasets show noticeable scatter around the 1:1 reference line, with IMERG generally producing higher rainfall estimates than GSMaP.
In DJF, GSMaP closely follows the 1:1 line at moderate rainfall magnitudes (0–15 mm day⁻¹), indicating its better calibration under stable stratiform rainfall conditions. IMERG, however, overestimates high-intensity events, with points deviating above the reference line. This suggests that the IMERG retrieval algorithm tends to misclassify thin, cold cloud-tops as rain-bearing systems, resulting in spurious precipitation.
In MAM, both products show wider dispersion, but CMORPH (as inferred from the skill metrics) and GSMaP perform more realistically than IMERG, which continues to overestimate totals beyond 20 mm day⁻¹. The overprediction by IMERG may stem from its precipitation-phase detection scheme, which relies heavily on infrared data in regions with limited passive microwave overpasses. GSMaP, by contrast, applies an adaptive Kalman filter with ground-based adjustments, enhancing its ability to match the magnitude and frequency of observed rainfall during transitional seasons.
Figure 12 highlights how QPE algorithms respond to distinct meteorological systems in the UAE, represented by two contrasting rainfall events: (a) 11 January 2020 (DJF frontal system) and (b) 14 August 2013 (JJA convective storm). The three performance indicators—RMSE, MAE, and KGE—clearly distinguish between these rainfall regimes, illustrating both the strengths and inherent weaknesses of satellite precipitation products in arid, topographically complex regions.
During Event (a), a widespread frontal system brought uniform, stratiform rainfall across much of the country. Under such conditions, the four QPEs showed strong convergence toward gauge observations, as reflected in low RMSE and MAE and relatively high KGE. The performance consistency, particularly of GSMaP and CMORPH, can be attributed to their microwave-driven retrieval schemes and effective gauge calibration, which are well suited to detecting large-scale, thermodynamically stable precipitation systems. IMERG also performed satisfactorily but tended to slightly overestimate low-intensity rainfall, a common outcome of its infrared-based gap-filling algorithms. In contrast, MSWEP’s coarser spatial resolution resulted in moderate underestimation, though its overall pattern remained consistent with the gauge network.
By comparison, Event (b), associated with an intense, short-lived convective burst over the Hajar Mountains, revealed the limitations of all QPEs. RMSE and MAE increased sharply while KGE values dropped, showing widespread misrepresentation of the event’s magnitude and distribution. The convective nature of JJA storms—characterized by small spatial footprints, high rainfall intensity, and rapid evolution—poses a fundamental challenge for satellite algorithms with limited temporal revisit frequency. Microwave overpasses often miss the storm core, while IR-based proxies misclassify non-precipitating clouds as rainfall. Even GSMaP and IMERG, which incorporate near-real-time calibration, struggled to track the localized peaks observed in gauges. CMORPH’s smoothing technique further weakened its responsiveness to such transient events.
3.8. Spatial Variations in QPE Performance and Physiographic Controls
A more analytical interpretation of the spatial patterns in QPE performance reveals that physiography and rainfall-generation mechanisms play a central role in shaping retrieval accuracy across the UAE. The mountainous stations, such as Jabal Jais, Jabal Hafeet, Jabal Mebreh, Masafi, and Al-Tawiyen, consistently show larger discrepancies between gauge observations and satellite-based estimates. These regions experience highly localized convective bursts and orographic uplift, which produce sharp spatial rainfall gradients over short distances. Such dynamics challenge satellite retrieval algorithms, particularly those relying on infrared and passive microwave signals, because warm-cloud precipitation and shallow convective towers often lack strong thermal contrast. As a result, CMORPH and IMERG tend to underestimate peak intensities in these high-elevation zones, while GSMaP typically displays greater variability due to its motion-vector tracking and gauge-adjustment processes that become less stable in steep terrain. In inland lowland regions, rainfall intensity is typically weaker and dominated by warm-cloud processes, which leads IMERG to systematically overestimate precipitation due to its sensitivity to thin cloud layers. MSWEP performs better overall but still inherits biases from its coarse global gauge correction framework, which cannot fully capture the hyper-local spatial variability of rainfall in arid environments. Coastal stations, such as Fujairah Airport, Fujairah Port, Hatta, and Dhudna, exhibit comparatively higher agreement among QPEs because rainfall events along the coast are often produced by broader synoptic-scale systems that enhance microwave detectability and spatial consistency across retrieval algorithms. These patterns confirm that QPE accuracy in the UAE is not solely a function of sensor performance; it is strongly modulated by atmospheric dynamics, microphysical characteristics of rainfall, and terrain complexity.
These physiographic controls help explain why certain datasets perform well in one region but poorly in another.
4. Discussion
Accurate estimation of precipitation is crucial for water resource management, flood prediction, and climate change adaptation, particularly in arid environments like the United Arab Emirates (UAE). In this study, we evaluated the performance of the MSWEP precipitation dataset against gauge observations and three commonly used satellite-based precipitation products (CMORPH, IMERG, and GSMaP) using a range of statistical, categorical, and extreme precipitation indices. Our findings provide valuable insights into the ability of these products to capture the spatial and temporal variability of precipitation in a region characterized by erratic and low-intensity rainfall events. The statistical measures, including correlation coefficient (CC), root mean square error (RMSE), mean absolute error (MAE), Kling-Gupta Efficiency (KGE), and Nash-Sutcliffe Efficiency (NSE), provided a comprehensive assessment of the products’ accuracy. The results revealed that MSWEP exhibited a relatively higher correlation with gauge observations compared to CMORPH and GSMaP, but slightly lower than IMERG. However, in terms of RMSE and MAE, MSWEP performed better than IMERG, indicating lower absolute deviations from gauge-based precipitation estimates. KGE values showed that MSWEP had more balanced performance in terms of variability and bias, while NSE values confirmed that MSWEP is a reliable product for representing precipitation variability in the region. These findings align with previous studies that highlighted the ability of high-resolution satellite products such as IMERG to capture precipitation patterns in arid regions but also emphasized their limitations in accurately estimating extreme precipitation events [
8,
16].
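For reference, the error-based statistics named above can be written in a few lines. This is a minimal sketch using the standard textbook definitions of RMSE, MAE, and NSE; the paired daily values are hypothetical and the function names are introduced here for illustration only.

```python
import math


def rmse(obs, sim):
    """Root mean square error between paired observations and estimates."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error between paired observations and estimates."""
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus error variance over observed variance."""
    mean_o = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - num / den


# Hypothetical paired daily values in mm (NOT data from the study).
gauge = [0.0, 2.0, 10.0, 0.0]
product = [1.0, 2.0, 6.0, 1.0]

scores = {
    "RMSE": rmse(gauge, product),
    "MAE": mae(gauge, product),
    "NSE": nse(gauge, product),
}
```

RMSE weights the single 4 mm miss on the wettest day more heavily than MAE does, which is one reason the two metrics can rank products differently when errors are concentrated in a few heavy events.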
The categorical indices further confirmed the strengths and weaknesses of each product. MSWEP demonstrated a higher probability of detection (POD) compared to CMORPH and GSMaP but was slightly outperformed by IMERG, which has been previously recognized for its improved detection capabilities [
29]. However, MSWEP exhibited a higher false alarm ratio (FAR), suggesting that it sometimes overestimates rainfall event occurrence, which could be attributed to its data-merging approach. The critical success index (CSI) showed that MSWEP provided a better balance between detection and false alarms, while the frequency bias index (FBI) indicated a tendency of IMERG and GSMaP to overestimate precipitation events, a pattern observed in previous studies evaluating satellite products over arid regions [
26]. Extreme precipitation indices, such as consecutive dry days (CDD), consecutive wet days (CWD), Rx1 day, Rx5 day, R10 mm, R20 mm, R30 mm, PRCPTOT, and R95P, further highlighted the discrepancies between the satellite products and gauge observations. MSWEP performed reasonably well in capturing dry and wet periods, with results similar to those found for CMORPH and GSMaP, though slight overestimation of wet periods was observed. The 95th percentile precipitation (95PP) analysis indicated that all satellite products struggled to capture extreme rainfall accurately, with GSMaP and IMERG showing the highest overestimations. These findings are consistent with Wehbe et al. [
15], who reported that satellite precipitation products tend to overestimate rainfall in regions with complex terrain and low rainfall frequency.
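The categorical indices discussed above all derive from a rain/no-rain contingency table. The sketch below uses the standard definitions of POD, FAR, CSI, and FBI; the 1.0 mm rain/no-rain threshold is an assumption for illustration (the threshold used in the study is not stated in this section), and the daily series are hypothetical.

```python
def contingency(obs, sim, threshold=1.0):
    """Count hits, misses, and false alarms for paired daily values.

    `threshold` (mm) is an assumed rain/no-rain cutoff.
    """
    hits = misses = false_alarms = 0
    for o, s in zip(obs, sim):
        obs_wet, sim_wet = o >= threshold, s >= threshold
        if obs_wet and sim_wet:
            hits += 1
        elif obs_wet:
            misses += 1
        elif sim_wet:
            false_alarms += 1
    return hits, misses, false_alarms


def categorical_scores(hits, misses, false_alarms):
    """Standard detection scores; assumes the denominators are non-zero."""
    return {
        "POD": hits / (hits + misses),
        "FAR": false_alarms / (hits + false_alarms),
        "CSI": hits / (hits + misses + false_alarms),
        "FBI": (hits + false_alarms) / (hits + misses),
    }


# Hypothetical daily values in mm (NOT data from the study).
gauge = [0.0, 5.0, 0.0, 12.0, 0.0, 3.0, 0.0, 0.0]
product = [2.0, 4.0, 0.0, 0.0, 1.5, 6.0, 0.0, 3.0]

table = contingency(gauge, product)
scores = categorical_scores(*table)
```

In this toy case the product detects two of three gauge wet days (POD ≈ 0.67) but raises three false alarms (FAR = 0.6), so FBI exceeds 1. This illustrates how a product can combine good detection with frequent overprediction of rain occurrence.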
While gauge observations serve as the benchmark for validating satellite precipitation products, they are not entirely error-free, particularly in arid regions like the UAE, where sparse networks and localized rainfall can limit spatial representativeness. Measurement uncertainties arising from wind undercatch, evaporation losses, or sensor calibration may partly contribute to discrepancies between gauge and satellite estimates. Despite applying rigorous quality control and data verification, these inherent limitations should be acknowledged as a source of uncertainty when interpreting QPE performance.
This seasonal evaluation yields several essential insights into how satellite precipitation products behave under varying hydroclimatic conditions. The performance of these datasets is influenced by the type of rainfall and the prevailing atmospheric processes that dominate each season. The analysis clearly shows that satellite retrieval accuracy is not constant throughout the year; instead, it fluctuates in response to changes in cloud structure, rainfall intensity, and background temperature. CMORPH and GSMaP tend to perform better because their algorithms are better suited to stratiform and mixed precipitation systems that dominate the cooler months, whereas IMERG often struggles with light rainfall or warm-cloud events typical of the UAE’s arid climate.
From a hydrological modeling perspective, these seasonal variations are crucial. Using uncorrected IMERG data, for instance, could lead to overestimated runoff or unrealistic hydrological responses during the winter and spring seasons due to its tendency to overpredict rainfall intensity. In contrast, GSMaP and CMORPH provide more balanced rainfall estimates that better match the observed gauge data, making them more suitable for rainfall–runoff modeling and flash flood prediction in data-scarce environments. The relatively moderate performance of MSWEP highlights another limitation—the reliance on coarse global gauge datasets for bias correction, which may not capture the localized rainfall patterns of the UAE’s highly variable terrain. Integrating regional rain gauge networks into such global products would likely enhance their representativeness and reliability. This aligns with [
8], who emphasized the importance of using multiple hydro-climatological measures to assess satellite precipitation accuracy. The seasonal bias trends further indicated that MSWEP generally performed well across all seasons, with biases more pronounced in winter and spring, similar to the patterns observed in previous assessments of satellite-based products in the UAE [
41].
In broader evaluations of arid and semi-arid regions, the performance of the MSWEP dataset has shown comparable behavior to our findings in the UAE. For example, ref. [
49] reported that MSWEP v2.2 achieved one of the highest Kling-Gupta Efficiency (KGE) values across West Africa, outperforming most satellite-only estimates, though its performance varied seasonally depending on gauge density. Similarly, ref. [
50] found that in the semi-arid tropics of Tamil Nadu, India, MSWEP provided consistent climatological accuracy but tended to underestimate high-intensity convective rainfall, mirroring our observations of underrepresentation of extremes. Furthermore, ref. [
51] demonstrated that over the complex terrain of the Hengduan Mountains in China, MSWEP effectively captured the spatial distribution of mean precipitation but struggled to represent extreme events accurately, consistent with the overestimation tendencies of other satellite datasets such as IMERG and GSMaP observed in our study. These regional comparisons reinforce that while MSWEP’s multi-source merging framework ensures relatively stable performance for light to moderate rainfall across diverse dry and topographically variable environments, it remains sensitive to extreme rainfall detection. It would benefit from region-specific bias correction and gauge integration to improve hydrological applicability in arid zones. In summary, while MSWEP exhibited commendable performance in capturing precipitation characteristics over the UAE, certain biases remain, particularly regarding the frequency and magnitude of extreme rainfall events. Compared to CMORPH and GSMaP, MSWEP demonstrated a better correlation with gauge observations, although IMERG outperformed it in some statistical and categorical measures. The findings of this study reaffirm the need for regional calibration of satellite precipitation products and highlight the importance of integrating multiple datasets to enhance precipitation estimates in arid regions. Future research should focus on bias correction techniques and multi-source precipitation merging to improve the reliability of precipitation estimates for hydrological applications in the UAE.
The performance of MSWEP over the UAE also aligns with findings from recent evaluations conducted in other arid and semi-arid regions, highlighting both its strengths and limitations in data-sparse environments. Ref. [
52] demonstrated that MSWEP V2 performs competitively in hydrological simulations across China’s diverse climate zones, ranking second only to GPCC for extreme precipitation indices. Importantly, their study showed that all precipitation datasets, including MSWEP, tend to underestimate annual maximum 1-day and 5-day precipitation while overestimating R95p in dry northwestern China, a pattern consistent with the underestimation and spread we observed for Rx1 day, Rx5 day, and R95P in the UAE. Similarly, ref. [
53] reported that MSWEP v2.8 achieved strong performance in Pakistan’s low-elevation arid regions, but its accuracy deteriorated sharply in high-elevation mountainous areas where all global precipitation products exhibited large overestimations. This behavior mirrors our findings that MSWEP and other QPEs struggle over the UAE’s mountainous terrain due to highly localized convective rainfall and sparse gauge coverage. In the humid-to-semi-arid Huaihe River Basin, ref. [
54] showed that MSWEP V2.1/V2.2 provides excellent temporal accuracy but weaker representation of spatial rainfall patterns and extreme daily events, particularly ≥100 mm/day, which again reflects the tendency of MSWEP in our study to capture seasonal variability well but underestimate spatial heterogeneity and peak extremes. Collectively, these studies confirm that while MSWEP is a reliable multi-source product for representing precipitation climatology and seasonal dynamics in arid settings, its performance for short-duration extremes and in heterogeneous terrain remains constrained, reinforcing the need for regional calibration, supplemental gauge networks, and multi-product integration for risk-sensitive hydrological applications in the UAE.
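The extreme indices compared across these studies (Rx1 day, Rx5 day, and related spell and threshold counts) follow simple rules on a daily series. The sketch below assumes the commonly used ETCCDI conventions, in particular a 1 mm wet-day threshold; whether the study applies exactly these conventions is an assumption, and the ten-day series is hypothetical.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def extreme_indices(daily_mm, wet_day=1.0):
    """Selected extreme indices for one period of daily totals (mm)."""
    n_windows = max(1, len(daily_mm) - 4)
    return {
        "CDD": longest_run(p < wet_day for p in daily_mm),    # longest dry spell
        "CWD": longest_run(p >= wet_day for p in daily_mm),   # longest wet spell
        "Rx1day": max(daily_mm),                              # max 1-day total
        "Rx5day": max(sum(daily_mm[i:i + 5]) for i in range(n_windows)),
        "R10mm": sum(1 for p in daily_mm if p >= 10.0),       # days >= 10 mm
        "R20mm": sum(1 for p in daily_mm if p >= 20.0),       # days >= 20 mm
        "PRCPTOT": sum(p for p in daily_mm if p >= wet_day),  # wet-day total
    }


# Hypothetical ten-day series in mm/day (NOT data from the study).
series = [0.0, 0.0, 3.0, 12.0, 25.0, 0.5, 0.0, 0.0, 0.0, 8.0]
indices = extreme_indices(series)
```

Because Rx1 day and Rx5 day depend on one or a few days, a product that smooths a single intense storm can reproduce PRCPTOT reasonably while still underestimating both maxima, which is the contrast highlighted in the comparisons above.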
5. Conclusions
The accurate estimation of precipitation is vital for effective water resource management, climate studies, and hydrological modeling, particularly in arid environments like the UAE. This study comprehensively evaluated the performance of MSWEP against gauge observations and three widely used satellite-based precipitation products (CMORPH, IMERG, and GSMaP) using a suite of statistical, categorical, and extreme precipitation indices. The findings indicate that MSWEP performance over the UAE should be interpreted in a conditional manner. While it shows higher RMSE and MAE and lower KGE than GSMaP and CMORPH at most stations, indicating weaker skill in reproducing event-scale rainfall magnitude, it exhibits comparatively balanced rainfall occurrence characteristics, including high POD (~0.7), moderate FAR (~0.65), the highest median CSI (~0.35), and lower variability in FBI. MSWEP also reproduces rainfall intensity–frequency distributions, PRCPTOT, and R95P reasonably well, supporting its suitability for seasonal and climatological analyses. However, systematic underestimation of Rx1 day, Rx5 day, and heavy rainfall days (>20 mm) limits its applicability for short-duration extreme events and hazard-focused applications. Clear distinction between skill in seasonal accumulation, rainfall detection, and extreme magnitude is therefore essential when interpreting MSWEP reliability in hyper-arid environments.
Despite its overall effectiveness, MSWEP, like other satellite-based products, faces limitations in capturing extreme precipitation events, as seen in the analysis of the 95th percentile precipitation and extreme precipitation indices. The underestimation of heavy rainfall and overestimation of light rainfall remain challenges that must be addressed. Additionally, while MSWEP provides a reasonable representation of dry and wet spells, its performance varies across different locations, emphasizing the importance of localized validation efforts. The need for enhanced bias correction techniques and integration with ground-based observations remains a crucial step in improving satellite precipitation estimates in arid regions.
Looking forward, future research should focus on refining bias correction algorithms for MSWEP and other satellite-based products to improve their reliability in arid environments. The integration of high-resolution regional models, reanalysis datasets, and machine learning techniques could further enhance precipitation estimation accuracy. Additionally, multi-source data fusion approaches, combining gauge, radar, and satellite data, may provide more comprehensive insights into rainfall dynamics. Given the increasing importance of remote sensing for climate monitoring and water resource planning, continued validation and improvement of satellite precipitation products will be essential for addressing the challenges of precipitation estimation in data-scarce regions like the UAE.
This study contributes to the growing body of research assessing the reliability of satellite precipitation products in arid climates, underscoring the strengths and limitations of MSWEP in comparison to established datasets. The findings provide valuable insights for hydrologists, meteorologists, and policymakers working toward improved precipitation monitoring and water management strategies in the UAE and similar regions. In operational contexts, the evaluated products, particularly MSWEP, can play a vital role in supporting water resource planning, flood forecasting, and drought monitoring in the UAE. The high temporal resolution and spatial completeness of MSWEP make it particularly useful for driving hydrological and climate models in data-scarce regions with limited gauge coverage. When bias-corrected and regionally calibrated, MSWEP can serve as a reliable input for early warning systems and long-term water management strategies, enhancing preparedness and resilience to hydroclimatic extremes in arid environments. Future advancements in satellite retrieval algorithms and the incorporation of advanced statistical corrections using machine learning will be key to further enhancing the applicability of satellite precipitation data for hydrological and climatological studies. Moreover, integrating MSWEP with local radar data may improve accuracy in capturing extreme events in the country.