Evaluation of the New CHIRPS-v3 Dataset for Regional Rainfall Estimation: A Case Study in Southern Italy

Clemente, Emanuele; Roseto, Rodolfo; Capolongo, Domenico

doi:10.3390/rs18132090


by Emanuele Clemente ¹,*, Rodolfo Roseto ² and Domenico Capolongo ³

¹ Department of Economy and Finance, University of Bari Aldo Moro, 70124 Bari, Italy
² UOS Bari, Institute for Electromagnetic Sensing of the Environment (IREA), National Research Council of Italy (CNR), 70126 Bari, Italy
³ Department of Earth and Geoenvironmental Sciences, University of Bari Aldo Moro, 70125 Bari, Italy
* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(13), 2090; https://doi.org/10.3390/rs18132090 (registering DOI)

Submission received: 4 May 2026 / Revised: 10 June 2026 / Accepted: 19 June 2026 / Published: 26 June 2026

(This article belongs to the Special Issue Satellite Remote Sensing of Weather, Water and Climate Couplings and Phenomena (Second Edition))


Highlights

What are the main findings?

CHIRPS-v3 improves systematically over CHIRPS-v2 in several rainfall-validation diagnostics over Apulia.
ERA5 shows the strongest overall agreement for continuous metrics, but ERA5 and IMERG over-detect light rainfall and shorten dry spell estimates.

What are the implications of the main findings?

Dense regional gauge networks reveal sub-regional errors that national-scale validations can miss, especially in topographically complex areas.
Dataset choice should be application-specific: ERA5 is preferable for broad daily rainfall agreement, whereas CHIRPS-v3 is promising for drought and agro-climatic applications.

Abstract

Reliable rainfall information is fundamental for climate-risk analysis and operational monitoring in Mediterranean regions such as Apulia (Southern Italy), one of the areas most affected by climate change-driven shifts in rainfall patterns. Recent evaluations across Italy and comparable Mediterranean settings consistently show that gridded precipitation performance is highly dependent on orography and dataset typology: reanalyses often provide the best overall agreement with gauges, while satellite and blended products can exhibit larger biases, with persistent challenges in complex terrain and for high-intensity events. In this context—and given the documented spatial heterogeneity of rainfall extremes within Apulia—validation of such gridded datasets with respect to ground observations remains essential for early warning and climatological applications. In the present work, we evaluate four widely used precipitation products—CHIRPS-v2, the newly released CHIRPS-v3, IMERG, and ERA5—benchmarking them against the Apulia region Civil Protection rain-gauge network. We provide diagnostics aligned with early warning and climate monitoring: bias and error statistics, rainfall intensity distributions, and dry spell duration. A key contribution is, to our knowledge, the first dedicated validation of CHIRPS-v3 in Apulia, which is timely given that CHIRPS-v3 was explicitly developed to address shortcomings such as underestimated temporal variance and to leverage expanded station inputs. The results indicate that CHIRPS-v3 yields systematic improvements over CHIRPS-v2 across multiple metrics, while ERA5 generally shows the strongest overall agreement with gauges—consistent with broader Italian evidence.

Keywords:

weather; precipitation; remote sensing; Apulia; CHIRPS-v3; gridded precipitation; rainfall validation; Mediterranean climate; Italy

1. Introduction

Accurate rainfall information is crucial for a wide range of climate services and environmental applications, from drought early warning and water resource planning to hydrological monitoring and agro-climatic characterization. In Mediterranean regions, this need is particularly acute: rainfall is strongly seasonal, highly intermittent, and spatially heterogeneous, with localized convective events occurring alongside multi-week dry spells [1,2]. These characteristics could be further exacerbated by climate change-induced shifts in variability and intensity [3]. Precipitation is therefore among the most challenging variables to monitor reliably: even modest differences in measurement, or in how gridded products represent wet day frequency, intensity distributions, and dry spell persistence, can translate into different inferences about anomalies, drought onset, and climatic baselines. This challenge is global and has been extensively documented in various climatic settings, from the complex terrains of South America [4] to the diverse rainfall regimes of East Asia [5], emphasizing the need for local-scale validation.

In practice, many operational workflows and research studies rely on gridded precipitation products because they provide spatial completeness, consistent temporal coverage, and immediate usability for mapping and modeling [6,7]. Yet these products are not interchangeable, and the validation of these products against reliable ground data is a fundamental prerequisite for their use.

For drought monitoring and early warning, differences in “event behavior” are as important as differences in aggregate error. Categorical performance, especially the ability to discriminate wet from dry days under explicit thresholds, directly affects deficit persistence metrics, dry-spell characterization, and derived indicators, such as the fraction of rainy days. Recent studies across the Mediterranean basin have highlighted that, while gridded products capture monthly or seasonal cycles, their ability to reproduce daily sequence and extreme event frequency remains highly variable [8,9].

A comprehensive study on Southern Italy by Cammalleri et al. [10] indicated that the reliability of gridded precipitation products is not uniform across the territory; while these datasets generally capture inter-annual variability and major rainfall trends, their performance remains highly region-specific, showing better agreement with ground observations in Apulia and Sicily than in other Mediterranean sectors. Comparable findings in other application settings emphasize that small biases—especially around light rainfall—can materially affect drought-relevant diagnostics and water resource indicators [11]. At the regional scale in Southern Italy, specific validations have confirmed these challenges: in Sicily, recent assessments found that gridded products effectively captured general precipitation patterns but struggled to reproduce high-intensity events [12], while in Campania, the work of Shazil et al. [13] underscores systematic overestimations in high-elevation areas, where orographic effects cause higher precipitation levels.

A recent comprehensive evaluation over the entire Italian territory by Moccia et al. [14] highlighted that product skill varies systematically with climate regime and orography. While that study provides a vital national benchmark, it relies on the SCIA (National System for the Collection, Elaboration and Diffusion of Climatological Data) database. As noted by the authors, the SCIA network exhibits significant limitations in specific areas: stations are particularly sparse and characterized by short or fragmented records in Southern Italy, and this is certainly true of the Apulia region. Hence, while national-scale assessments offer general insights, they may not fully capture the performance of gridded products in regions like Apulia, where the gauge density in national databases is insufficient to resolve sub-regional heterogeneity. Consequently, there is a critical need for high-resolution validation using dense, locally managed networks to overcome the limitations of sparse national datasets. This necessity is further corroborated by recent global assessments [15], which demonstrate the importance of dense gauge network availability that allows this type of comparison and enhances the reliability of climate change impact assessments, as well as the accuracy of regional water resource estimates.

Within this context, the present study provides a systematic performance evaluation of four widely used precipitation datasets over Apulia—CHIRPS-v2, the newly released CHIRPS-v3, IMERG, and ERA5—using the Apulia regional Civil Protection (henceforth CP) rain-gauge network as the ground benchmark. The analysis is structured around two complementary aims. First, we quantify the goodness of gridded estimates relative to in situ observations using a set of continuous and categorical evaluation metrics that capture both overall accuracy and event detection behavior. Second, we translate product differences into rainfall-derived indicators used in operational and climatological practice—such as total rainfall, rainfall variability (e.g., standard deviation), the number of rainy days, and dry spell frequency and duration—to provide decision-relevant insights beyond generic performance rankings.

A central contribution is that, to our knowledge, this is the first dedicated validation of the just-released CHIRPS-v3 against station observations in the Apulia region, implemented in a framework directly comparable to the long-standing CHIRPS-v2 benchmark and to widely used alternatives (IMERG and ERA5). This is timely because CHIRPS-v3 was explicitly developed to address known limitations of previous generations (including underestimated temporal variability) while incorporating methodological updates and expanded station inputs [16]. By combining standard performance metrics with operationally interpretable rainfall-derived indicators across Apulia’s internal heterogeneity, the study aims at supporting more defensible dataset selection for drought early warning and climatological applications in Southern Italy.

This paper is organized as follows. In Section 2, we describe the study area, the Apulia regional CP reference observations, the gridded products (CHIRPS-v2, CHIRPS-v3, IMERG, and ERA5), and the validation methodology. In Section 3, we report the validation results, highlighting where CHIRPS-v3 improves relative to CHIRPS-v2 and how all products compare with gauge observations. In Section 4, we discuss implications for agro-climatological applications, as well as limitations and recommendations. In Section 5, we present the conclusions.

2. Materials and Methods

2.1. Study Area

The study area includes the whole Apulia region and the upper basin of the Ofanto River, which falls into the adjacent regions of Campania and Basilicata. The Apulia region stretches in a northwest–southeast direction for roughly 350 km, encompassing a variety of geomorphological settings (Figure 1). In the northern sector, three main geomorphological units can be identified: the Subappennino Dauno mountains, the Tavoliere plain, and the Gargano promontory. The Subappennino Dauno, where the highest peak of the region is located (Mount Cornacchia, 1152 m), borders Apulia along the boundary with Campania. The Tavoliere plain lies between the Subappennino Dauno and the Gargano, a prominent mountainous block projecting into the Adriatic Sea. The central part of the region is characterized by the Murge highland, a wide karst plateau that gently descends toward the Adriatic coast and extends southeastward to the Taranto–Brindisi alignment. The southern portion, corresponding to the Salento peninsula, separates the Adriatic and Ionian seas. This area is predominantly flat, with the exception of a series of low ridges known as the Serre.

One distinctive feature of the area is the absence of a well-developed river network. The principal river is the Ofanto River, which is approximately 170 km long, springing in the Campania region and crossing the Apulia region for around 50 km before flowing into the Adriatic Sea [17]. The Apulia region is characterized by a typical Mediterranean climate, marked by hot, arid summers and mild winters. Average annual temperatures generally fall between 15 and 16 °C, while rainfall reaches its maximum during the autumn months. Over the 1930–2020 period, mean yearly precipitation amounted to about 685 mm, with notable spatial variability: the Tavoliere plain recorded the lowest totals (around 450 mm), whereas the Subappennino Dauno and Gargano sectors received up to 1100 mm [18].

2.2. Reference Observations: Apulia Regional Civil Protection Rain Gauges

Ground observations were downloaded from the Apulia CP regional center platform (https://reteidrometeo.protezionecivile.puglia.it/polarisopen/gis/map?guest=pubblico, last accessed 20 March 2026). A total of 143 pluviometric gauge stations were selected for the consistency of their time series for the period 1 January 2014 to 31 December 2020 (Figure 1). This specific seven-year timeframe was selected to ensure the simultaneous and continuous availability of high-quality records across the entire high-density network, minimizing gaps that could affect the comparative statistical metrics.

The daily time series underwent a quality control check. Specifically, we checked the completeness of the time series for each station; if a particular day in a station was missing the rainfall amount, that day was excluded from the analysis. The total number of days excluded from the analysis for such a reason is 4023, which is 1.09% of the total readings. Furthermore, we checked—within each station and across the entire time series—for outliers in the daily rainfall values. The procedure by which outliers are detected is described in [19] and involves two steps. First, the distribution of the station-specific daily rainfall values is transformed to approach a standard normal distribution. To do so, rainfall values are normalized, then standardized (a z-score of the normalized daily rainfall values is computed). Second, a threshold is applied to the transformed daily rainfall values to set the bounds of an outlier detection region (that is, the tails of the transformed distribution); the threshold is set at a conventional value of 3. As a result, 193 values (0.05% of the total readings) were detected and excluded from the analysis.
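The two-step outlier screening described above can be illustrated with a minimal sketch. It is not the procedure of [19]: the normalizing transform is not specified in the text, so a cube-root transform is assumed here purely for illustration, and the function name `flag_outliers` is hypothetical. Only the z-score step and the threshold of 3 come from the description above.

```python
import statistics

def flag_outliers(daily_mm, threshold=3.0, power=1.0 / 3.0):
    """Return indices of daily values whose transformed z-score exceeds the threshold.

    Assumption: a cube-root transform stands in for the normalization step;
    the transform actually used in [19] may differ.
    """
    # Step 1: transform the skewed rainfall values toward a more symmetric shape.
    transformed = [value ** power for value in daily_mm]
    mean = statistics.fmean(transformed)
    sd = statistics.pstdev(transformed)
    if sd == 0:
        return []  # constant series: nothing can be flagged
    # Step 2: standardize and flag values in the tails beyond the threshold.
    return [i for i, t in enumerate(transformed)
            if abs((t - mean) / sd) > threshold]

# Hypothetical station series: mostly dry or light rain, one implausible reading.
series = [0.0] * 50 + [1.0] * 30 + [2.0] * 19 + [500.0]
print(flag_outliers(series))
```

Flagged indices would then be excluded from the paired comparison in the same way as missing days.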

2.3. Gridded Precipitation Products

We evaluate four widely used gridded precipitation datasets that are commonly adopted in drought/climate monitoring and environmental applications, but which differ substantially in retrieval/assimilation approach, spatial resolution, and temporal structure (Table 1):

CHIRPS-v2 is a blended satellite–gauge product that combines thermal infrared imagery, a high-resolution climatology, and in situ observations to generate quasi-global precipitation estimates at 0.05° resolution, distributed at a daily scale (via temporal disaggregation/aggregation from its native processing) [20].
CHIRPS-v3 is the newly released generation of CHIRPS. It explicitly targets a key shortcoming identified in CHIRPS-v2—underestimation of temporal precipitation variance—and introduces methodological updates (including an updated climatology), expanded spatial coverage, and the inclusion of thousands of additional time-varying stations [16].
IMERG (GPM) is a multi-sensor satellite retrieval that merges passive microwave estimates with infrared/radar information to provide detailed precipitation fields at 0.1° and 30 min resolution (commonly aggregated to daily totals for applications and evaluation) [21]. Although IMERG provides a native 30 min resolution, which is highly advantageous for monitoring sub-daily convective extremes and flash floods, the data were aggregated to daily totals in this study to ensure a consistent comparison with the other datasets. It should be noted, however, that such temporal aggregation might mask the intrinsic strengths of IMERG in capturing high-frequency precipitation events compared to CHIRPS or ERA5.
ERA5 is a global reanalysis product provided at 0.25° and hourly resolution, widely used as a reference-grade gridded dataset in hydrometeorological applications and intercomparisons [22]. The extraction of daily data was performed via a point-to-pixel approach using Google Earth Engine. This methodology involves inherent uncertainties related to the spatial mismatch error; specifically, comparing point-based rain-gauge measurements with area-averaged grid cells—ranging from approximately 5 km for CHIRPS to 31 km for ERA5—may introduce a representativeness bias. Such a discrepancy intrinsically penalizes coarser resolution datasets like ERA5, particularly in the detection of localized convective phenomena.

At each site where an in situ station is located, daily rainfall data were extracted from the abovementioned gridded datasets via Google Earth Engine, using a point-to-pixel approach and bilinear mean value extraction. This allowed for the construction of an integrated dataset in which daily measurements from each in situ station are paired with remote sensing (RS) data collected at the same location and timestamp.

The validation of the remote sensing (RS) rainfall datasets against the benchmark reference (the CP automatic weather stations) was conducted at the station level, using data from the 143 in situ station sites, with RS data extracted at the geographic coordinates of each site as described above. To ensure comparability across the data sources, a data baselining process was performed, which included harmonizing the temporal resolution to a common daily interval. The sub-daily (30 min) IMERG dataset was hence aggregated into 24 h windows. Dataset validation was implemented using the statistical software STATA 17 and R 5.2.
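The temporal harmonization and pairing step can be sketched as follows. This is an illustrative outline only, under two assumptions not stated in the text: the sub-daily values are rain rates in mm/h on 30 min steps, and the 24 h windows coincide with calendar days of the timestamps. The function names `daily_totals` and `pair_with_gauge` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def daily_totals(half_hourly):
    """Sum 30 min rain rates (mm/h) into daily totals (mm), keyed by date.

    Assumption: each record is (timestamp, rate in mm/h) and covers 0.5 h.
    """
    totals = defaultdict(float)
    for timestamp, rate_mm_per_h in half_hourly:
        totals[timestamp.date()] += rate_mm_per_h * 0.5
    return dict(totals)

def pair_with_gauge(gauge_daily, product_daily):
    """Pair gauge and product totals on dates present in both series.

    Days with a missing gauge reading (None) are dropped from the comparison.
    """
    return {day: (gauge_daily[day], product_daily[day])
            for day in sorted(gauge_daily)
            if gauge_daily[day] is not None and day in product_daily}

# Example: one day of steady 1 mm/h rain (48 half-hour steps).
example = [(datetime(2014, 1, 1, i // 2, 30 * (i % 2)), 1.0) for i in range(48)]
print(daily_totals(example))
```

Products that are already daily (CHIRPS) would skip the first function and enter the pairing step directly.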

2.4. Performance Metrics for Gridded Precipitation-Product Evaluation

Following established approaches in hydrometeorological validation [11,23], we first computed a suite of continuous and categorical performance metrics to assess the agreement between Apulia in situ rain-gauge observations and the gridded rainfall products under evaluation (CHIRPS-v2, CHIRPS-v3, IMERG, and ERA5). These metrics, reported in Table 2, were derived at each CP weather station site and were intended to capture both agreement in rainfall magnitude and skill in rainfall occurrence detection. The continuous metrics include root mean square error (RMSE), Pearson’s correlation coefficient (R), and the coefficient of variation ratio (CV). The categorical metrics were derived from the contingency-table outcomes reported in Table 3, based on whether the evaluated product and the in situ benchmark jointly identified rainfall occurrence or non-occurrence at the daily scale. Specifically, we distinguish (A) hits, where both the gridded product and the gauge record rainfall; (B) false alarms, where rainfall is detected by the gridded product but not by the gauge; (C) misses, where rainfall is observed by the gauge but not by the gridded product; and (D) correct negatives, where both sources indicate no rainfall. These four outcomes were then used to derive the standard categorical metrics of probability of detection (POD), false alarm ratio (FAR), frequency bias index (FBI), and critical success index (CSI), thereby providing a complementary assessment of rainfall event detection alongside the continuous evaluation of rainfall amounts.
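As an illustration of how these metrics follow from paired daily series, the sketch below computes RMSE, Pearson's R, a CV ratio, and the four categorical scores from the contingency counts A to D. The categorical formulas are the standard ones (POD = A/(A+C), FAR = B/(A+B), FBI = (A+B)/(A+C), CSI = A/(A+B+C)). Two points are assumptions rather than details taken from Table 2: the CV ratio is taken as product CV divided by gauge CV, and a single 1 mm/day wet-day threshold is used instead of the intensity classes. Function names are hypothetical.

```python
import math

def continuous_metrics(gauge, product):
    """RMSE, Pearson R and CV ratio for paired daily series (mm/day).

    Assumes non-constant series with positive means.
    """
    n = len(gauge)
    mean_g = sum(gauge) / n
    mean_p = sum(product) / n
    var_g = sum((g - mean_g) ** 2 for g in gauge) / n
    var_p = sum((p - mean_p) ** 2 for p in product) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gauge, product)) / n
    rmse = math.sqrt(sum((p - g) ** 2 for g, p in zip(gauge, product)) / n)
    r = cov / math.sqrt(var_g * var_p)
    # Assumed definition: product CV over gauge CV (1 means equal relative variability).
    cv_ratio = (math.sqrt(var_p) / mean_p) / (math.sqrt(var_g) / mean_g)
    return {"RMSE": rmse, "R": r, "CV": cv_ratio}

def categorical_metrics(gauge, product, threshold=1.0):
    """POD, FAR, FBI and CSI from the daily wet/dry contingency table."""
    hits = false_alarms = misses = correct_negatives = 0
    for g, p in zip(gauge, product):
        gauge_wet, product_wet = g >= threshold, p >= threshold
        if gauge_wet and product_wet:
            hits += 1                # (A) both record rainfall
        elif product_wet:
            false_alarms += 1        # (B) product only
        elif gauge_wet:
            misses += 1              # (C) gauge only
        else:
            correct_negatives += 1   # (D) neither

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "FBI": ratio(hits + false_alarms, hits + misses),
        "CSI": ratio(hits, hits + false_alarms + misses),
    }
```

An FBI below 1 indicates that the product reports fewer wet days than the gauge, and a value above 1 indicates over-detection.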

Furthermore, to provide finer-scale information, the analysis of performance metrics was conducted both for the full sample of stations and in stratified form across elevation classes (flatland, hill, and mountain) and physiographic areas of Apulia (Murgia highland, Taranto Ionian area, Gargano highland, Salento peninsula, Subappennino Dauno, Ofanto basin, and Tavoliere plain). This allows us to detect potential variations in the products’ performance across different topographical and climatic conditions. Differences in metric values between sources were examined through Wilcoxon tests, while kernel density plots were used to compare the distributional shape of selected metrics across products. In addition, inverse distance weighting (IDW) interpolation was applied to the selected metrics (IDW with power parameter p = 2, computed on a 5 km grid in the projected UTM 33N reference system), in order to map their spatial variability and highlight localized zones of stronger or weaker agreement with gauge observations.
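A minimal sketch of IDW interpolation with p = 2 on a 5 km grid is given below. It assumes station coordinates are already projected to metres (for example UTM 33N) and uses all stations for every grid node. The text does not state whether a search radius or neighbour limit was applied, so none is used here. The function names are hypothetical.

```python
import math

def idw_point(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations: list of (x, y, value) tuples in projected metres.
    """
    weighted_sum = weight_total = 0.0
    for sx, sy, value in stations:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return value  # node coincides with a station: return its value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

def idw_grid(stations, xmin, ymin, xmax, ymax, cell=5000.0, power=2.0):
    """Evaluate IDW on a regular grid of nodes spaced `cell` metres apart."""
    ncols = int((xmax - xmin) // cell) + 1
    nrows = int((ymax - ymin) // cell) + 1
    return [[idw_point(xmin + c * cell, ymin + r * cell, stations, power)
             for c in range(ncols)]
            for r in range(nrows)]
```

With p = 2, a station at one third of the distance of another receives nine times its weight, which is why the resulting maps emphasize local pockets around individual gauges.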

2.5. Methodology for Rainfall-Derived Climatological Indicators

Beyond standard agreement metrics, the evaluation was extended to a set of rainfall-derived indicators intended to capture the aspects of precipitation most relevant for climatological interpretation and operational use. In particular, for each in situ station and for each matched product series (CHIRPS-v2, CHIRPS-v3, IMERG, and ERA5), we computed total annual rainfall, standard deviation, mean daily rainfall, median daily rainfall, longest dry spell duration, and number of dry days. To quantify dry spells, a daily rainfall threshold of less than 1 mm/day was selected to define a ‘dry day’. This definition follows the international guidelines established by the Expert Team on Climate Change Detection and Indices (ETCCDI) for the calculation of Consecutive Dry Days (CDD) [24]. In Mediterranean ecosystems, utilizing a 1 mm threshold is standard practice because lower trace amounts are typically lost immediately to atmospheric evaporative demand and canopy interception, failing to effectively recharge soil moisture or contribute to hydrological runoff [25,26]. This second layer of analysis was designed to complement the performance-metric framework by assessing whether products that perform similarly in terms of correlation or error still differ in their representation of rainfall regime, intermittency, and persistence, all of which are central to water resource management. Differences between CP’s in situ observations and each gridded product were first examined through two-sample Wilcoxon tests on station-level indicators. To support the interpretation of the distributional behavior of each source, kernel density plots were also produced for selected variables. In addition, inverse distance weighting (IDW) interpolation was used to map the spatial variability of these indicators and to visualize regional patterns that may not emerge from tabular summaries alone. 
Given the topographic heterogeneity of Apulia, the assessment was carried out both for the full sample and in stratified form, distinguishing major altitude classes (flatland, hill, and mountain) as well as the aforementioned physiographic areas.
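The dry-day indicators can be sketched as a single pass over a daily series, using the 1 mm/day threshold stated above (a day with less than 1 mm counts as dry). The function name and the assumption of a gap-free input series are illustrative; how missing days interact with spell counting is not specified in the text.

```python
def dry_spell_indicators(daily_mm, dry_threshold=1.0):
    """Count dry days, rainy days and the longest run of consecutive dry days.

    Assumption: `daily_mm` is a gap-free sequence of daily totals in mm.
    """
    longest = current = dry_days = 0
    for value in daily_mm:
        if value < dry_threshold:
            dry_days += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a wet day ends the current dry spell
    return {
        "dry_days": dry_days,
        "rainy_days": len(daily_mm) - dry_days,
        "longest_dry_spell": longest,
    }

# Hypothetical eight-day series (mm/day).
print(dry_spell_indicators([0.0, 0.5, 3.0, 0.0, 0.0, 0.0, 1.0, 0.9]))
```

Because a single day at or above the threshold resets the counter, the longest dry spell is sensitive to isolated light-rain detections in a product.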

3. Results

This section reports the performance metrics outlined in the methodological section. We first examine the continuous performance metrics; we then present the categorical performance metrics; finally, we present the rainfall-derived indicators, which are used to assess whether differences between gridded products result in different climatological diagnostics.

3.1. Continuous Performance Metrics

Figure 2 summarizes the continuous performance metrics of CHIRPS-v2, CHIRPS-v3, ERA5, and IMERG across all stations using the coefficient of variation ratio (CV), Pearson’s correlation coefficient (R), and root mean square error (RMSE). ERA5 shows the strongest agreement with the in situ benchmark, with both the highest correlation and the lowest RMSE; IMERG performs similarly well, though with slightly larger errors. CHIRPS-v3 occupies an intermediate position and improves consistently over CHIRPS-v2 in correlation and RMSE. CHIRPS-v2 remains the weakest of the four products in overall error terms. The three metrics also show that product evaluation cannot be reduced to a single statistic. CV ratios for the two CHIRPS versions remain closer to the benchmark value of 1 than those of ERA5 and IMERG, suggesting a more comparable aggregate representation of relative variability. However, this does not imply better daily agreement or lower absolute error. The products that more closely reproduce relative variability are not necessarily those that best reproduce rainfall magnitude or temporal correspondence.

The spatial pattern of each product’s performance is illustrated in Figure 3, which maps RMSE over the whole study area through IDW interpolation; performance is not geographically uniform across Apulia. CHIRPS-v2 shows the broadest and most persistent high-error areas, whereas CHIRPS-v3 attenuates part of this signal and produces a more spatially coherent pattern of agreement with gauges. ERA5 and IMERG show more extensive low-error areas, although localized pockets of weaker performance remain visible, especially in the northern sector of the region. The gridded rainfall products perform worst in mountainous areas where convective events are more frequent, namely, the Gargano and the Subappennino Dauno. In Bari, CHIRPS-v2 shows particularly high levels of RMSE, which are no longer present with CHIRPS-v3; this suggests a better integration of Bari weather station data into the CHIRPS algorithm. The Salento peninsula is also an area where the gridded datasets’ agreement with in situ stations is rather low; interestingly, however, in this area, CHIRPS-v3 seems to have lower RMSE than ERA5 and IMERG.

3.2. Categorical Performance Metrics

Figure 4 reports the categorical skill scores—probability of detection (POD), false alarm ratio (FAR), frequency bias index (FBI), and critical success index (CSI)—across rainfall intensity classes. These metrics complement the continuous analysis by evaluating how well each product identifies rainfall occurrence at different precipitation intensities. This distinction is especially important in a Mediterranean setting such as Apulia, where the alternation between wet and dry days directly affects drought-relevant diagnostics, including rainy day frequency and dry spell length.

All products perform best in the lightest rainfall class. Detection skill is relatively high for very small daily totals, but it deteriorates sharply once rainfall enters the 1–5 mm/day class, and it remains limited for heavier events. POD and CSI both decline with increasing rainfall intensity, while FAR increases substantially outside the lightest class. In practice, this means that the products are more reliable at identifying light rainfall than at detecting moderate or intense daily accumulations. Furthermore, CHIRPS-v3 generally improves on CHIRPS-v2, particularly in terms of event detection, but ERA5 and IMERG often perform at least as well as, and in some classes slightly better than, the satellite-blended alternatives. At the highest rainfall intensities, however, all products show weak detection skill and clear evidence of underdetection, as indicated by FBI values below 1. Figure A4 further shows that this decline in detection skill with increasing rainfall intensity (particularly for events above 20 mm/day) is consistent across flatland, hill, and mountain areas. While some differences by altitude emerge, the dominant pattern is common to all topographic settings: rainfall products detect light events relatively well, but their POD decreases substantially for moderate and especially heavy daily rainfall.

False alarms (FAR) remain high across much of the intensity spectrum, implying that even products with acceptable continuous performance may still misclassify rainfall occurrence on a substantial number of days. In this respect, the categorical results slightly qualify the ranking suggested by the continuous metrics. ERA5 remains the strongest product overall, and CHIRPS-v3 improves visibly over CHIRPS-v2, but no dataset performs especially well for heavier rainfall classes.

3.3. Rainfall-Derived Climatological Indicators

To assess whether the aforementioned differences determine different rainfall indicators, as shown in Figure 5, we refine the comparison by placing the in situ benchmark alongside the median paired difference for each remote sensing product, so that both the reference level and the direction of deviation are immediately visible. This part of the analysis moves beyond generic validation metrics and focuses on variables directly relevant for operational climatology, including rainfall totals, mean and median daily rainfall, variability (captured via average station-level standard deviation of daily precipitation), rainy day frequency, and dry spell persistence.
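The median paired difference used in this comparison can be sketched as follows: for each station, the product indicator minus the gauge indicator is computed, and the median of those station-level differences is reported, so that negative values indicate underestimation by the product. The station identifiers and values below are hypothetical, and the accompanying significance test is not reproduced here.

```python
import statistics

def median_paired_difference(gauge_by_station, product_by_station):
    """Median of (product - gauge) over stations present in both inputs."""
    differences = [product_by_station[station] - gauge_by_station[station]
                   for station in gauge_by_station
                   if station in product_by_station]
    return statistics.median(differences)

# Hypothetical annual totals (mm/year) at three stations.
gauge = {"ST01": 600.0, "ST02": 700.0, "ST03": 650.0}
product = {"ST01": 550.0, "ST02": 690.0, "ST03": 600.0}
print(median_paired_difference(gauge, product))
```

Pairing by station keeps the comparison insensitive to the large between-station spread in rainfall across the region.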

For total annual rainfall, the in situ benchmark stands at 672 mm/year, a value consistent with the long-term climatological totals of 685 mm/year reported in the literature for this region [18]. All products except ERA5 diverge significantly from this benchmark. CHIRPS-v2 shows the largest underestimation (−83.44 mm, p < 0.01), followed by CHIRPS-v3 (−47.60 mm, p < 0.01). ERA5 is the closest to the benchmark, displaying a non-significant overestimation of 19.68 mm, suggesting broad consistency with the gauged total at the regional scale. IMERG is the only product to significantly overestimate total annual rainfall (+80.49 mm, p < 0.01).

For average daily rainfall, the gauge benchmark indicates a value of 1.88 mm. Relative to this, CHIRPS-v2 and CHIRPS-v3 both show significant underestimation, with median paired differences of −0.23 mm (p < 0.01) and −0.13 mm (p < 0.01), respectively, whereas ERA5 is statistically indistinguishable from the in situ benchmark on this metric (0.06 mm). IMERG, by contrast, shows a pronounced and statistically significant overestimation of average daily rainfall (0.22 mm, p < 0.01).

For median daily rainfall, the gauge benchmark is 0.01 mm. Both CHIRPS products match this median value (0.00 mm), although their paired differences are statistically significant (p < 0.01), while ERA5 and IMERG display larger, statistically significant positive deviations of 0.14 mm and 0.19 mm (p < 0.01), respectively. This indicates that the CHIRPS products reproduce the median closely, whereas ERA5 and IMERG show appreciable positive deviations even at the median.

For the representation of rainfall variability, we look at the median value of the standard deviation in daily rainfall, captured at an aggregated temporal and spatial level; here we find a stronger and more systematic discrepancy between in situ ground data and gridded products. The in situ benchmark reports a median daily rainfall standard deviation of 5.78, while all four products fall below this value, with median paired differences of −1.06 (p < 0.01) for CHIRPS-v2, −1.12 (p < 0.01) for CHIRPS-v3, −1.43 (p < 0.01) for ERA5, and −0.98 (p < 0.01) for IMERG. IMERG is the closest to the benchmark, showing the smallest underestimation (−0.98), although, as for the other products, this deviation is statistically significant; ERA5 shows the largest underestimation. All products therefore share a systematic underestimation of daily variability. A different pattern emerges for the non-parametric skewness of daily rainfall. The CP’s stations have an average non-parametric skewness equal to 0.33. All products slightly overestimate skewness: CHIRPS-v2 is the closest to the benchmark (+0.02, p < 0.01), followed by CHIRPS-v3 (+0.04, p < 0.01), while ERA5 and IMERG overestimate it more, by +0.08 (p < 0.01) and +0.07 (p < 0.01), respectively. Although these skewness differences are not large in absolute terms, they indicate that all products (especially ERA5 and IMERG) move the distribution toward a moderately heavier right tail, with the CHIRPS products remaining closest to the observed shape.

When considering the temporal sequence of wet and dry conditions, the contrast between in situ stations and gridded products becomes stronger. The CP’s benchmark weather stations record an average of 75.17 rainy days per year; for this indicator, CHIRPS-v3 is again the closest product, overestimating by only 3.43 days (p < 0.01). CHIRPS-v2 instead underestimates rainy day frequency by 10.14 days (p < 0.01), while ERA5 and IMERG substantially overestimate it, by 30.29 and 34.86 days (p < 0.01), respectively. The picture is mirrored in the duration of the yearly longest dry spell, for which the gauge benchmark is 32.47 days/year. CHIRPS-v2 shows a small, non-significant overestimation of 1.71 days (ns), while CHIRPS-v3 shows an even smaller, non-significant overestimation of 0.57 days (ns), making it the closest product on this metric. ERA5 and IMERG diverge more strongly, significantly underestimating the longest dry spell by 4.71 and 6.43 days (p < 0.01), respectively.

To establish whether these annual biases are seasonally uniform or concentrated in particular periods, Figure A3 disaggregates the three most operationally relevant indicators (total rainfall, rainy days, and longest dry spell) by meteorological season (DJF, MAM, JJA, and SON). Because regional rainfall is concentrated in autumn and winter, the CHIRPS dry bias in total rainfall is most evident in the wet seasons (SON and DJF), when the majority of the annual total accumulates, while the ERA5 and IMERG over-detection of rainy days is most pronounced in the JJA season, indicating spurious light rain days rather than a uniform year-round offset. The longest dry spell signal is essentially a summer (JJA) phenomenon, since the annual maximum dry spell in this Mediterranean setting almost invariably falls within the dry season; Figure A3 confirms that the annual shortening of dry spells by ERA5 and IMERG, and the close agreement of CHIRPS-v3, are driven by JJA.
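A seasonal disaggregation of this kind can be sketched as a simple grouping of daily records by meteorological season. The snippet below is a minimal illustration with hypothetical records; it groups by month only and does not attach December to the following year's winter, which a full analysis would need to handle.

```python
from collections import defaultdict
from datetime import date

# Meteorological seasons: DJF, MAM, JJA, SON.
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def seasonal_totals(records):
    """Sum rainfall (mm) by season; records is an iterable of (date, mm)."""
    totals = defaultdict(float)
    for day, mm in records:
        totals[SEASON_OF_MONTH[day.month]] += mm
    return dict(totals)


# Hypothetical daily records.
sample = [
    (date(2020, 1, 15), 10.0),
    (date(2020, 7, 1), 2.0),
    (date(2020, 10, 20), 20.0),
    (date(2020, 12, 3), 5.5),
]

print(seasonal_totals(sample))
```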

Furthermore, Figure 6 makes these contrasts spatially visible through the daily average rainfall maps. The in situ station-based map shows a clear pattern, with wetter conditions in the Gargano, Subappennino Dauno, and Salento peninsula areas and drier conditions in inland areas. CHIRPS-v2 reproduces the overall gradient, but much less precisely, especially in areas with higher elevation, such as Gargano and Subappennino Dauno. In turn, CHIRPS-v3 preserves more of the observed spatial structure and appears visually closer to the weather station values, although it remains drier in the northern parts of our study area. ERA5 and IMERG capture the broad rainfall distribution across the study area, but shift the overall precipitation pattern toward wetter conditions, consistent with the positive annual differences reported in Figure 5 (though for ERA5, this annual volume overestimation remains statistically non-significant).

Overall, the rainfall indicator analysis shows that the ranking of products depends on the diagnostic of interest. ERA5 reproduces the broad spatial pattern convincingly, but it tends to generate wetter annual conditions and too many rainy days. CHIRPS-v3 improves on CHIRPS-v2 in several respects and is particularly close to the gauges in the partition between wet and dry days; it is also the closest product to the benchmark for the longest dry spell duration, which it reproduces with only a small, non-significant overestimation. CHIRPS-v2 performs less well on annual totals and mean values and shows a slightly larger (though still non-significant) overestimation of the longest dry spell. The difference between the two versions suggests that the increased sensitivity of the CHIRPS-v3 algorithm in capturing rainfall may introduce light rainfall false alarms that split runs of consecutive dry days, thereby shortening its estimated longest dry spells relative to its predecessor. IMERG behaves similarly to ERA5, with a tendency toward wetter annual conditions and more frequent rain day occurrence.

Product performance also depends on the topography of the area of interest, since disagreement between in situ weather stations and gridded products tends to increase with altitude and orographic complexity. For instance, Figure 7 clearly shows that the distribution of mean daily rainfall differs not only across products but also across altitude classes. In flatland areas, all gridded datasets underestimate the benchmark distribution, although CHIRPS-v3 is the closest to the in situ distribution, while the CHIRPS-v2 curve is centered furthest toward lower rainfall amounts; in turn, the ERA5 and IMERG distributions are more shifted towards overestimation. In hill areas, the separation becomes more pronounced: CHIRPS-v2 and CHIRPS-v3 remain concentrated at lower values than the gauge benchmark, whereas ERA5 and especially IMERG are shifted toward wetter conditions, indicating an even stronger tendency to over-represent average daily rainfall at intermediate elevations. The contrast is greatest in mountainous areas, where the in situ distribution is centered at distinctly higher values and only partly approached by ERA5 and IMERG; both CHIRPS products remain clearly left-shifted, suggesting persistent underestimation in higher-altitude environments. Taken together, these patterns indicate a broad tendency for CHIRPS-v2 to be systematically too dry, for CHIRPS-v3 to reduce but not eliminate that bias, and for ERA5 and IMERG to align more closely with the central mass of the distribution, albeit with a tendency toward wetter values.

4. Discussion

The results of this study confirm that, while global gridded precipitation products offer a valuable alternative to ground-based observations, their performance in a complex Mediterranean context like Apulia is highly variable. By using the high-density network of Apulia regional CP in situ stations, our analysis reveals sub-regional nuances that the national-scale validations of previous studies could not resolve.

A primary objective was the evaluation of the newly released CHIRPS-v3. Our findings demonstrate that CHIRPS-v3 represents a substantial improvement over its predecessor, CHIRPS-v2. Specifically, the tendency of v2 to underestimate temporal variability and the intensity tail of rainfall events appears to be partially mitigated in v3. For drought monitoring, this is crucial: v3 provides a more realistic representation of dry spell duration and wet day frequency, which are often distorted in earlier versions of satellite-blended products. This result is consistent with other CHIRPS-v3 validation studies carried out in Latin America [27].

The performance of ERA5 as the most consistent product in terms of continuous metrics (RMSE and correlation) aligns with findings from other Mediterranean studies [10,12,28]. However, the overestimation of light rain days remains a persistent challenge for ERA5 and IMERG. This effect has also been observed in recent high-resolution gridded assessments across the Mediterranean basin, where the systematic overestimation of wet day frequency can degrade the reliability of climatic baselines [29]. In contrast, CHIRPS products, by design, tend to be more conservative in rainfall detection, which can be advantageous for identifying clear-cut dry periods but may lead to missing localized, high-intensity convective events.

The stratification by altitude and physiographic area (e.g., Tavoliere vs. Gargano) confirms that complex topography remains the main weakness of gridded datasets. Products generally perform better in the flatlands of Salento and Tavoliere, while error metrics increase significantly in the mountainous areas of Subappennino Dauno (Monti Dauni) and Gargano, where orographic lifting triggers localized rainfall that global models struggle to resolve at current resolutions, a limitation documented in the recent literature for topographically diverse sectors of Southern Italy. This spatial performance gradient, characterized by a marked accuracy loss in mountainous terrain compared to coastal plains, is consistent with recent findings in other topographically diverse Mediterranean settings [30].

From an operational perspective, the choice of dataset depends on the specific application. For agro-climatic monitoring, where the persistence of dry spells is more critical than the exact intensity of a single event, CHIRPS-v3 emerges as a strong candidate. For flash-flood early warning, however, the systematic biases in peak intensity found in all products suggest that they should be used as complements to, rather than substitutes for, the local rain-gauge network. Indeed, the tendency of satellite-based and reanalysis products to exhibit negative biases during heavy rainfall events—as recently highlighted in coastal Mediterranean regions [31]—confirms the structural difficulty of capturing extreme convective events.

5. Conclusions

This study provides a systematic, metric-based performance assessment of four major gridded precipitation products across the Apulia region. The main conclusions can be summarized as follows:

National-scale assessments may underestimate product skill or miss regional biases due to sparse station coverage, while the use of a dense regional network (143 gauges) proved essential for a robust validation in Apulia.
CHIRPS-v3 shows clear improvements over CHIRPS-v2 in capturing rainfall variability and intensity distributions, making it better suited to long-term climatological studies and drought monitoring in the region.
ERA5 remains the most reliable product for overall mean statistics and temporal correlation, although it tends to overestimate the frequency of light rainfall events.
Product performance is not uniform across Apulia. Performance degrades in mountainous and high-altitude areas, highlighting the need for caution when using gridded data for hydrological modeling in orographically complex areas.

In conclusion, CHIRPS-v3 represents a clear improvement over CHIRPS-v2, particularly in the representation of wet/dry day occurrence and several drought-relevant indicators. ERA5 provides the strongest overall agreement in terms of continuous metrics, but its tendency to overestimate rainy day frequency can shorten estimated dry spells and reduce its reliability for drought-sequence applications. Finally, all products perform less reliably in topographically complex areas, especially in the Gargano and Subappennino Dauno sectors, confirming that orographic rainfall remains a major challenge for gridded datasets.

These findings suggest that dataset choice should be application-specific. ERA5 is suitable when the objective is broad agreement with daily rainfall amounts and temporal variability. CHIRPS-v3 is promising for drought monitoring and agro-climatic applications requiring a realistic distinction between wet and dry days. IMERG and ERA5 may be useful for broader rainfall monitoring, but their tendency to generate excessive light rainfall should be considered when deriving dry spell metrics. For heavy rainfall detection and high-elevation areas, none of the products should be treated as a substitute for local gauge observations.

Overall, this study highlights that bridging the gap between global models and local ground truth requires addressing the structural underestimation of peak rainfall intensities and orographic effects. Future research should therefore focus on refining the representation of these processes within gridded datasets, leveraging physically consistent models that better capture the spatiotemporal heterogeneity of Mediterranean precipitation.

Author Contributions

Conceptualization, E.C., R.R., and D.C.; methodology, E.C., R.R., and D.C.; software, E.C.; validation, E.C. and R.R.; formal analysis, E.C.; investigation, E.C., R.R., and D.C.; resources, R.R. and D.C.; data curation, E.C. and R.R.; writing—original draft preparation, E.C.; writing—review and editing, R.R. and D.C.; visualization, E.C.; supervision, D.C.; project administration, E.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the RETURN Extended Partnership and received funding from the European Union Next-GenerationEU (National Recovery and Resilience Plan, NRRP; Mission 4, Component 2, Investment 1.3; D.D. 1243 2/8/2022, PE000005).

Data Availability Statement

The Apulia Civil Protection rain-gauge observations are available from the Apulia Civil Protection regional platform. The gridded precipitation products analyzed in this study are available from their respective data providers and were extracted using Google Earth Engine. Processed data and code supporting the findings are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CHIRPS	Climate Hazards Group InfraRed Precipitation with Station data
CSI	Critical success index
CV	Coefficient of variation
ERA5	Fifth-generation ECMWF atmospheric reanalysis
FAR	False alarm ratio
FBI	Frequency bias index
GEE	Google Earth Engine
GPM	Global Precipitation Measurement
IDW	Inverse distance weighting
IMERG	Integrated Multi-satellitE Retrievals for GPM
POD	Probability of detection
RMSE	Root mean square error

Appendix A

Figure A1. Spatial distribution of Pearson’s correlation coefficient between daily rainfall recorded by in situ stations and the corresponding gridded products, interpolated using inverse distance weighting, for (a) CHIRPS-v2, (b) CHIRPS-v3, (c) ERA5, and (d) IMERG. The maps indicate that temporal agreement is spatially uneven, with stronger correspondence in lower-relief areas and weaker performance in topographically complex zones.

Figure A2. Kernel density distributions of the annual number of rainy days by altitude class, comparing in situ observations with CHIRPS-v2, CHIRPS-v3, ERA5, and IMERG. Differences in rainy day frequency become more pronounced at higher elevations, confirming that wet day representation is sensitive to altitude and can materially affect drought-related indicators.

Figure A3. Median paired differences between each gridded rainfall product and the in situ benchmark for seven rainfall-derived climatological indicators, for each season of the year. The left column reports the cross-station median of the in situ benchmark. The four heatmap columns report the median of paired differences computed as product minus in situ. Color intensity is normalized within each row to the largest absolute difference in that row; red indicates underestimation and blue indicates overestimation relative to the in situ benchmark. Statistical significance refers to the Wilcoxon signed-rank test for paired differences: *** p < 0.01; ** p < 0.05; ns = not significant.
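The caption refers to the Wilcoxon signed-rank test for paired differences. A minimal sketch of the rank sums underlying that test follows, assuming that zero differences are discarded and that tied absolute differences share their average rank. The p-value step is omitted, the function name and sample values are hypothetical, and in practice a statistics library would normally be used instead.

```python
def signed_rank_sums(differences):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    nonzero = [d for d in differences if d != 0]  # zero differences carry no sign
    ordered = sorted(nonzero, key=abs)            # rank by absolute size
    w_plus = w_minus = 0.0
    i = 0
    while i < len(ordered):
        # Find the block of tied absolute values starting at position i.
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for d in ordered[i:j]:
            if d > 0:
                w_plus += avg_rank
            else:
                w_minus += avg_rank
        i = j
    return w_plus, w_minus


# Hypothetical paired differences (product minus in situ) at five stations.
diffs = [1.0, -2.0, 3.0, 4.0, 0.0]
print(signed_rank_sums(diffs))
```

When the two rank sums are very unequal, the differences are predominantly of one sign, which is what the significance markers in the heatmap summarize.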

Figure A4. Probability of detection (POD) by altitude class for CHIRPS-v2, CHIRPS-v3, ERA5, and IMERG across rainfall intensity classes (<1, [1, 5), [5, 20), [20, 40), and ≥40 mm/day). Across all altitude classes, detection skill declines as rainfall intensity increases, while the deterioration is generally stronger in hill and mountain areas, confirming that both intensity and topography condition rainfall event detection.
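The intensity classes named in the caption can be assigned with a simple bin lookup. The sketch below only illustrates the class boundaries (left-closed, right-open intervals, as written in the caption); the names and sample values are hypothetical, and how each class enters the POD calculation is not specified here.

```python
import bisect
from collections import Counter

# Class boundaries in mm/day, matching <1, [1, 5), [5, 20), [20, 40), >=40.
CLASS_EDGES = [1.0, 5.0, 20.0, 40.0]
CLASS_LABELS = ["<1", "[1, 5)", "[5, 20)", "[20, 40)", ">=40"]


def intensity_class(mm):
    """Return the intensity class label for one daily rainfall value."""
    # bisect_right places a value equal to an edge in the upper class,
    # so 1.0 falls in [1, 5) and 40.0 falls in >=40.
    return CLASS_LABELS[bisect.bisect_right(CLASS_EDGES, mm)]


# Hypothetical daily gauge values (mm/day).
gauge_mm = [0.0, 0.4, 1.0, 4.9, 5.0, 19.9, 20.0, 40.0, 63.5]
counts = Counter(intensity_class(v) for v in gauge_mm)
print(counts)
```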

References

1. Longobardi, A.; Villani, P. Trend analysis of annual and seasonal rainfall time series in the Mediterranean area. Int. J. Climatol. 2010, 30, 1538–1546.
2. Planton, S.; Lionello, P.; Artale, V.; Aznar, R.; Carrillo, A.; Colin, J.; Congedi, L.; Dubois, C.; Elizalde, A.; Gualdi, S.; et al. The climate of the Mediterranean region in future climate projections. In The Climate of the Mediterranean Region; Elsevier: Amsterdam, The Netherlands, 2012.
3. Giorgi, F.; Lionello, P. Climate change projections for the Mediterranean region. Glob. Planet. Change 2008, 63, 90–104.
4. Zambrano, F.; Wardlow, B.; Tadesse, T.; Lillo-Saavedra, M.; Lagos, O. Evaluating satellite-derived long-term historical precipitation datasets for drought monitoring in Chile. Atmos. Res. 2017, 186, 26–42.
5. Bai, L.; Shi, C.; Li, L.; Yang, Y.; Wu, J. Accuracy of CHIRPS satellite-rainfall products over mainland China. Remote Sens. 2018, 10, 362.
6. Levizzani, V.; Cattani, E. Satellite remote sensing of precipitation and the terrestrial water cycle in a changing climate. Remote Sens. 2019, 11, 2301.
7. Sun, Q.; Miao, C.; Duan, Q.; Ashouri, H.; Sorooshian, S.; Hsu, K.L. A review of global precipitation data sets: Data sources, estimation, and intercomparisons. Rev. Geophys. 2018, 56, 79–107.
8. Gadouali, F.; Benabdelouahab, T.; Boudhar, A.; Hadria, R.; Semane, N.; Fadil, A.; Elrhaz, K. Bias correction of regional climate model simulation for hydrological climate change over Bouregrag watershed in Morocco. Int. J. Hydrol. Sci. Technol. 2024, 18, 125–139.
9. Sharifi, E.; Steinacker, R.; Saghafian, B. Assessment of GPM-IMERG and other precipitation products against gauge data under different topographic and climatic conditions in Iran: Preliminary results. Remote Sens. 2016, 8, 135.
10. Cammalleri, C.; Sarwar, A.N.; Avino, A.; Nikravesh, G.; Bonaccorso, B.; Mendicino, G.; Senatore, A.; Manfreda, S. Testing trends in gridded rainfall datasets at relevant hydrological scales: A comparative study with regional ground observations in Southern Italy. J. Hydrol. Reg. Stud. 2024, 55, 101950.
11. Wodebo, D.Y.; Melesse, A.M.; Woldesenbet, T.A.; Mekonnen, K.; Amdihun, A.; Korecha, D.; Tedla, H.Z.; Corzo, G.; Teshome, A. Comprehensive performance evaluation of satellite-based and reanalysis rainfall estimate products in Ethiopia: For drought, flood, and water resources applications. J. Hydrol. Reg. Stud. 2025, 57, 102150.
12. Yıldız, M.B.; Di Nunno, F.; de Marinis, G.; Granata, F. Evaluating global precipitation datasets over Sicily: From daily estimates to extreme events. J. Hydrol. Reg. Stud. 2026, 63, 103062.
13. Shazil, M.S.; Aleem, M.; Ahmad, S.; Abdullah, A.; Greco, R. Assessing the Accuracy of Gridded Precipitation Products in the Campania Region, Italy. Water 2025, 17, 2585.
14. Moccia, B.; Buonora, L.; Bertini, C.; Ridolfi, E.; Russo, F.; Napolitano, F. What is our pick? Assessment of satellite and reanalysis precipitation datasets over Italy. J. Hydrol. Reg. Stud. 2025, 60, 102487.
15. Su, J.; Miao, C.; Zwiers, F.; Beck, H.; Jones, P.; Sun, Q.; Slater, L.J.; Berghuijs, W.R.; Wada, Y.; Rosenfeld, D.; et al. Precipitation observing network gaps limit climate change impact assessment. Nature 2026, 652, 119–125.
16. Funk, C.; Peterson, P.; Harrison, L.; Saldivar, R.; Landsfeld, M.; Pedreros, D.; Shukla, S.; Fink, A.H.; Davenport, F.; Peterson, S.; et al. The Climate Hazards Center Infrared Precipitation with Stations, Version 3. Sci. Data 2026, 13, 718.
17. De Santis, V.; Caldara, M.; Marsico, A.; Capolongo, D.; Pennetta, L. Evolution of the Ofanto River delta from the ‘Little Ice Age’ to modern times: Implications of large-scale synoptic patterns. Holocene 2018, 28, 1948–1967.
18. Roseto, R.; Dellino, P.; Capolongo, D. Spatial distribution and trend analysis of extreme rainfall time series in Apulia region (Italy). Phys. Geogr. Quat. Dyn. 2024, 46, 163–177.
19. Belotti, F.; Mancini, G.; Vecchi, G. Outlier Detection for Welfare Analysis; Policy Research Working Paper 10231; World Bank: Washington, DC, USA, 2022.
20. Funk, C.; Peterson, P.; Landsfeld, M.; Pedreros, D.; Verdin, J.; Shukla, S.; Husak, G.; Rowland, J.; Harrison, L.; Hoell, A.; et al. The climate hazards infrared precipitation with stations—A new environmental record for monitoring extremes. Sci. Data 2015, 2, 150066.
21. Huffman, G.J.; Stocker, E.F.; Bolvin, D.T.; Nelkin, E.J.; Tan, J. GPM IMERG Final Precipitation L3 Half Hourly 0.1° × 0.1° V06; Goddard Earth Sciences Data and Information Services Center (GES DISC): Greenbelt, MD, USA, 2019.
22. Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horányi, A.; Muñoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 2020, 146, 1999–2049.
23. Ahmed, J.S.; Buizza, R.; Dell’Acqua, M.; Demissie, T.; Pè, M.E. Evaluation of ERA5 and CHIRPS rainfall estimates against observations across Ethiopia. Meteorol. Atmos. Phys. 2024, 136, 17.
24. Zhang, X.; Alexander, L.; Hegerl, G.C.; Jones, P.; Tank, A.K.; Peterson, T.C.; Trewin, B.; Zwiers, F.W. Indices for monitoring changes in extremes based on daily temperature and precipitation data. WIREs Clim. Change 2011, 2, 851–870.
25. Polade, S.D.; Pierce, D.W.; Cayan, D.R.; Gershunov, A.; Dettinger, M.D. The key role of dry days in changing regional climate and precipitation regimes. Sci. Rep. 2014, 4, 4364.
26. Rivoire, P.; Tramblay, Y.; Neppel, L.; Hertig, E.; Vicente-Serrano, S.M. Impact of the dry-day definition on Mediterranean extreme dry-spell analysis. Nat. Hazards Earth Syst. Sci. 2019, 19, 1629–1638.
27. Valencia, S.; Marín, D.E.; Gómez, D.; Echavarría-Porras, V.; Mejía-Sepúlveda, J.; Husic, A.; Sullivan, S.; Hoyos, N.; Villegas, J.C.; Harrison, L.; et al. Improvements and limitations of the new Climate Hazards Center Infrared Precipitation with Stations (CHIRPSv3) dataset: Insights from multiple spatio-temporal scales in Colombia. Atmos. Res. 2026, 338, 108971.
28. Abu Arra, A.; Birpınar, M.E.; Şişman, E. Evaluating ERA5-LAND and IMERG-NASA Products for Drought Analysis: Implications for Sustainable Water Resource Management. Sustainability 2025, 17, 7529.
29. Varotsos, K.V.; Katavoutas, G.; Kitsara, G.; Karali, A.; Lemesios, I.; Patlakas, P.; Giannakopoulos, C. CLIMADAT-GRid: A high-resolution daily gridded precipitation and temperature dataset for Greece. Earth Syst. Sci. Data 2025, 17, 4455–4477.
30. Oukaddour, K.; Fakir, Y.; Le Page, M. Assessment of five global gridded precipitation estimates over a southern Mediterranean basin (Tensift, Morocco). Geomat. Nat. Hazards Risk 2025, 16, 2468850.
31. Peinó, E.; Petracca, M.; Polls, F.; Udina, M.; Bech, J. Intercomparison of H SAF and IMERG heavy rainfall retrievals over a Mediterranean coastal region. Atmos. Res. 2025, 327, 108311.

Figure 1. Spatial distribution of the Apulia Civil Protection rain-gauge stations and physiographic areas of the study region. The dense regional network provides the spatial coverage needed to evaluate product performance across Apulia’s internal heterogeneity, including lowland, coastal, and orographically complex sectors.

Figure 2. Continuous performance metrics for CHIRPS-v2, CHIRPS-v3, ERA5, and IMERG against the in situ benchmark. Each bar shows the average value and 95% confidence interval for each product and metric. The comparison shows that ERA5 achieves the strongest overall agreement in terms of correlation and RMSE, while CHIRPS-v3 improves consistently over CHIRPS-v2; however, the CV ratio metric inverts the ranking.

Figure 3. Spatial distribution of root mean squared error (RMSE) for (a) CHIRPS-v2, (b) CHIRPS-v3, (c) ERA5, and (d) IMERG, interpolated across the study area using inverse distance weighting. The maps show that error patterns are not spatially uniform: CHIRPS-v3 reduces several high-error areas visible in CHIRPS-v2, while all products retain localized weaknesses in more complex northern sectors, particularly around the Gargano and Subappennino Dauno areas.
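The inverse distance weighting used to produce such maps can be sketched as follows. This is a minimal illustration assuming planar coordinates and a distance exponent of 2; the exponent, the function name, and the station values are assumptions for the example and are not taken from the study.

```python
import math


def idw(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at target.

    stations: list of (x, y, value) tuples in planar coordinates.
    target: (x, y) location at which to interpolate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value  # the target coincides with a station
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical station RMSE values (mm/day) at planar positions (km).
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 4.0, 30.0)]

print(idw(stations, (1.0, 0.0)))
print(idw(stations, (0.0, 0.0)))
```

Because weights decay with distance, each interpolated cell is dominated by its nearest stations, so isolated high-error stations appear as localized patches on the map.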

Figure 4. Categorical performance metrics for the gridded products across rainfall intensity classes: (a) probability of detection (POD), (b) false alarm ratio (FAR), (c) frequency bias index (FBI), and (d) critical success index (CSI). The figure shows that rainfall detection skill declines as daily rainfall intensity increases, indicating that all products are more reliable for light rainfall occurrence than for moderate and heavy rainfall events.

Figure 5. Median paired differences between each gridded rainfall product and the in situ benchmark for seven rainfall-derived climatological indicators. The left column reports the cross-station mean of the in situ benchmark. The four heatmap columns report the median of paired differences computed as product minus in situ. Color intensity is normalized within each row to the largest absolute difference in that row; red indicates underestimation and blue indicates overestimation relative to the in situ benchmark. Statistical significance refers to the Wilcoxon signed-rank test for paired differences: *** p < 0.01; ns = not significant.
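The two operations described in the caption, the median of paired differences and the within-row color normalization, can be sketched as follows. The function names and the station values are hypothetical; the rainy-day differences in the second part are the values reported in Section 3.3.

```python
import statistics


def median_paired_difference(product_values, gauge_values):
    """Median over stations of (product minus in situ)."""
    diffs = [p - g for p, g in zip(product_values, gauge_values)]
    return statistics.median(diffs)


def normalize_row(row):
    """Divide each entry by the largest absolute difference in the row."""
    largest = max(abs(v) for v in row.values())
    if largest == 0:
        return {name: 0.0 for name in row}
    return {name: v / largest for name, v in row.items()}


# Hypothetical rainy days per year at five stations for one product.
gauge = [70.0, 75.0, 80.0, 78.0, 72.0]
product = [74.0, 77.0, 85.0, 80.0, 75.0]
print(median_paired_difference(product, gauge))

# Rainy-day differences reported in the text (days per year).
rainy_day_row = {
    "CHIRPS-v2": -10.14,
    "CHIRPS-v3": 3.43,
    "ERA5": 30.29,
    "IMERG": 34.86,
}
print(normalize_row(rainy_day_row))
```

With this normalization the product with the largest absolute difference in a row always maps to ±1, so color intensity is comparable within a row but not between rows.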

Figure 6. Spatial distribution of mean daily rainfall (mm/day) across the study area as derived from (a) in situ observations, (b) CHIRPS-v2, (c) CHIRPS-v3, (d) ERA5, and (e) IMERG. The comparison shows that all products reproduce the broad spatial rainfall gradient but differ in magnitude: CHIRPS-v2 remains systematically drier, CHIRPS-v3 better preserves the observed spatial pattern, and ERA5 and IMERG tend to shift the values toward wetter conditions.

Figure 7. Kernel density distributions of mean daily rainfall derived from in situ observations and the gridded precipitation products, shown separately for flatland, hill, mountain, and pooled stations. The distributions indicate that product disagreement increases with altitude: CHIRPS-v2 and CHIRPS-v3 tend to underestimate rainfall in higher-elevation areas, while ERA5 and IMERG more closely approach the central distribution but often shift toward wetter values.

Table 1. Comparative summary of the gridded precipitation products evaluated against the Apulia Civil Protection rain-gauge network. The table highlights the main technical differences among satellite-blended, multi-sensor satellite, and reanalysis products, emphasizing that differences in spatial resolution, temporal resolution, and retrieval approach may affect rainfall event detection and dry spell diagnostics.

Product	Type	Spatial Resolution	Temporal Resolution	Source	Main Feature
CHIRPS-v2	Blended (satellite + gauges)	0.05° (~5 km)	Daily	Climate Hazards Center (UCSB)	Long-standing drought/climate-service benchmark; combines TIR imagery + climatology + stations.
CHIRPS-v3	Blended (satellite + expanded gauges)	0.05° (~5 km)	Daily	Climate Hazards Center (UCSB)	Newly released; designed to reduce underestimation of temporal variance; updated climatology and expanded station inputs.
IMERG (GPM)	Satellite multi-sensor retrieval	0.1° (~10 km)	30 min (aggregated to daily)	NASA/JAXA GPM	High-frequency satellite retrieval; often advantageous for event-scale monitoring; evaluated here at daily scale vs. gauges.
ERA5	Reanalysis	0.25° (~31 km)	Hourly (aggregated to daily)	ECMWF/Copernicus	Physically consistent reanalysis; typically strong overall agreement but coarser spatial detail than satellite/blended products.

Table 2. Continuous and categorical performance metrics used for product validation. Continuous metrics assess agreement in rainfall magnitude, variability, and temporal correspondence, whereas categorical metrics evaluate the ability of each product to reproduce rainfall occurrence. Perfect-match values indicate the benchmark against which each product is interpreted.

Metric Class	Statistical Metric	Formula	Perfect Match
Continuous	Root Mean Square Error (RMSE)	$RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(P_{i} - O_{i})}^{2}}$	0
	Correlation Coefficient (R)	$R = \frac{\sum_{i = 1}^{n} (O_{i} - \bar{O}) (P_{i} - \bar{P})}{\sqrt{\sum_{i = 1}^{n} {(O_{i} - \bar{O})}^{2} \cdot \sum_{i = 1}^{n} {(P_{i} - \bar{P})}^{2}}}$	1
	Coefficient of Variation ratio (CV)	$CV = \frac{\frac{\sigma_{P}}{\bar{P}}}{\frac{\sigma_{O}}{\bar{O}}}$	1
Categorical	Probability of Detection (POD)	$POD = \frac{A}{A + C}$	1
	False Alarm Ratio (FAR)	$FAR = \frac{B}{A + B}$	0
	Frequency Bias Index (FBI)	$FBI = \frac{A + B}{A + C}$	1
	Critical Success Index (CSI)	$CSI = \frac{A}{A + B + C}$	1

Notes: In the formulas, O and P denote the in situ (gauge) and gridded-product daily rainfall, indexed by day i = 1, …, n (with n the number of paired days); an overbar denotes the time mean and σ the standard deviation of each series. A, B, and C are the hits, false alarms, and misses defined in the contingency table (Table 3).
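The continuous metrics in Table 2 translate directly into code. The sketch below follows the table's formulas, with O the gauge series and P the product series; it assumes a population standard deviation for the CV ratio, and the function names and sample values are hypothetical.

```python
import math
import statistics


def rmse(obs, prod):
    """Root mean square error between product and gauge series."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, prod)) / len(obs))


def pearson_r(obs, prod):
    """Pearson correlation coefficient between the two series."""
    o_bar, p_bar = statistics.fmean(obs), statistics.fmean(prod)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, prod))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in prod)
    return cov / math.sqrt(var_o * var_p)


def cv_ratio(obs, prod):
    """Ratio of the product's coefficient of variation to the gauge's."""
    cv_p = statistics.pstdev(prod) / statistics.fmean(prod)
    cv_o = statistics.pstdev(obs) / statistics.fmean(obs)
    return cv_p / cv_o


# Hypothetical series (mm/day): the product is the gauge plus 1 mm every day.
obs = [0.0, 2.0, 4.0, 6.0]
prod = [1.0, 3.0, 5.0, 7.0]

print(rmse(obs, prod), pearson_r(obs, prod), cv_ratio(obs, prod))
```

In this toy case a constant wet offset leaves the correlation at its perfect-match value of 1 while the RMSE is 1 mm/day and the CV ratio drops to 0.75, which illustrates why the three metrics can rank products differently.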

Table 3. Contingency-table structure used to derive categorical rainfall-detection metrics. Hits, false alarms, misses, and correct negatives distinguish whether each gridded product correctly reproduces wet and dry days observed by the in situ benchmark, providing the basis for POD, FAR, FBI, and CSI. In this table, “Yes” indicates that the rainfall event/threshold condition was detected, while “No” indicates that it was not detected. Therefore, a hit occurs when both the gridded product and the gauge record the event; a false alarm occurs when the product records the event, but the gauge does not; a miss occurs when the gauge records the event, but the product does not; and a correct negative occurs when neither records the event.

	Benchmark (In Situ)
Evaluated Product	Yes	No	Total
Yes	Hit (A)	False Alarm (B)	A + B
No	Miss (C)	Correct Negative (D)	C + D
Total	A + C	B + D	A + B + C + D
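The contingency counts in Table 3 and the categorical scores in Table 2 can be computed as in the following sketch. It assumes a single event threshold of 1 mm/day for illustration; the threshold, the function names, and the sample values are hypothetical.

```python
def contingency(obs, prod, threshold=1.0):
    """Return (A, B, C, D): hits, false alarms, misses, correct negatives."""
    a = b = c = d = 0
    for o, p in zip(obs, prod):
        obs_yes = o >= threshold
        prod_yes = p >= threshold
        if prod_yes and obs_yes:
            a += 1  # hit
        elif prod_yes:
            b += 1  # false alarm
        elif obs_yes:
            c += 1  # miss
        else:
            d += 1  # correct negative
    return a, b, c, d


def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def categorical_scores(a, b, c):
    """POD, FAR, FBI, and CSI as defined in Table 2."""
    return {
        "POD": _ratio(a, a + c),
        "FAR": _ratio(b, a + b),
        "FBI": _ratio(a + b, a + c),
        "CSI": _ratio(a, a + b + c),
    }


# Hypothetical six-day series (mm/day).
obs = [0.0, 5.0, 0.0, 12.0, 3.0, 0.0]
prod = [2.0, 4.0, 0.0, 0.0, 6.0, 0.0]

a, b, c, d = contingency(obs, prod)
print((a, b, c, d))
print(categorical_scores(a, b, c))
```

Here one false alarm and one miss cancel in the frequency bias index (FBI = 1) even though POD and CSI are below 1, so FBI alone does not indicate that individual events are matched.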
