In this section, corresponding daily precipitation values from the GPPs (target precipitation) and synoptic stations (reference precipitation) were compared. The values were first compared on a daily basis over the whole study period and then for each month and each year from 2000 to 2020. Both continuous and categorical metrics were employed.
5.1.1. Daily Time Scale
All corresponding daily precipitation values (i.e., nearly 1.7 million observations) were compared to obtain a general overview of the performance of the selected GPPs (see Figure 3). This comparison ignores spatial and temporal characteristics and pools all precipitation values. It therefore provides only a broad depiction of the GPPs’ performance over the last two decades. The large scatter indicates imperfect agreement between target and reference precipitation.
Figure 3 shows that the precipitation values were mostly low, typically below 20 mm/day, as expected given the daily precipitation regime of Iran. Visually, the TMPA precipitation values (Figure 3a) had the least variation (i.e., standard deviation), while the IMERG values (Figure 3e) had the highest. Overall, Figure 3 shows that target and reference precipitation values lay closest to the 1:1 line for the ERA5 and GSMaP GPPs. In particular, ERA5 (Figure 3f) obtained the highest CC of 0.623, closely followed by GSMaP (CC = 0.599), with IMERG (CC = 0.483), PERSIANN (CC = 0.356), CHIRPS (CC = 0.353), and TMPA (CC = 0.330) in the subsequent ranks. Moreover, Figure 3a–d illustrates that TMPA and GSMaP tended to underestimate precipitation, in agreement with their MBE values of −0.574 mm and −0.132 mm, respectively. In contrast, IMERG, ERA5, CHIRPS, and PERSIANN overestimated precipitation, with MBE values of 1.001 mm, 0.268 mm, 0.152 mm, and 0.094 mm, respectively.
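As a minimal sketch of how continuous metrics like CC, MBE, MAE, and RMSE can be computed from paired daily values, the snippet below assumes NumPy, that `target` and `reference` are aligned 1-D arrays in mm/day with missing days already removed, and that MBE is defined as mean(target − reference), so that positive values indicate overestimation (the sign convention used in this section). The function name `continuous_metrics` is hypothetical.

```python
import numpy as np

def continuous_metrics(target, reference):
    """Return CC, MBE, MAE and RMSE for paired daily precipitation (mm/day).

    Assumes both inputs are 1-D, equally long, and already paired by
    station and day (missing values removed beforehand).
    """
    t = np.asarray(target, dtype=float)
    r = np.asarray(reference, dtype=float)
    if t.shape != r.shape or t.ndim != 1:
        raise ValueError("target and reference must be 1-D arrays of equal length")
    diff = t - r  # positive values mean the GPP overestimates
    return {
        "CC": float(np.corrcoef(t, r)[0, 1]),        # Pearson correlation
        "MBE": float(diff.mean()),                   # mean bias error
        "MAE": float(np.abs(diff).mean()),           # mean absolute error
        "RMSE": float(np.sqrt((diff ** 2).mean())),  # root mean square error
    }

# Illustrative made-up values, not data from this study
m = continuous_metrics([0.0, 2.0, 5.0, 1.0], [0.0, 1.0, 4.0, 3.0])
print(m)
```

Pooling all stations and days into one pair of arrays reproduces the "whole study period" comparison; the same function can be applied to any subset (one station, one month, one year).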
Table 2 summarizes the continuous and categorical metrics that quantify the observations drawn from Figure 3. GSMaP and TMPA achieved the lowest and second-lowest MAE values (0.834 mm and 0.913 mm, respectively), whereas the GPPs that overestimated precipitation had higher MAE values, with the highest (1.791 mm) for IMERG. In terms of RMSE, GSMaP obtained the lowest value (3.48 mm), followed by ERA5 (3.59 mm). IMERG and CHIRPS, on the other hand, were less accurate, with RMSE values of 6.89 mm and 4.68 mm, respectively.
Furthermore, the POD values indicated that ERA5 (POD = 0.867) performed best, followed by GSMaP (POD = 0.784), whereas the FAR metric suggested the opposite: GSMaP and ERA5 obtained FAR values of 0.480 and 0.568, respectively, making GSMaP the best of all GPPs in terms of FAR. The POD values indicated the weakest performance for TMPA and CHIRPS, whereas in terms of FAR their performance was second only to GSMaP and close to that of ERA5. With respect to CSI, GSMaP was the most accurate GPP in identifying precipitation occurrence, with a CSI value of 0.454, followed by ERA5, IMERG, PERSIANN, CHIRPS, and TMPA. These findings indicate that GSMaP outperformed the other GPPs, and that TMPA and CHIRPS were the least suitable GPPs over the last two decades.
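The categorical scores can be sketched from a 2 × 2 contingency table of wet and dry days. This section does not state the rain/no-rain threshold used, so the default `threshold=1.0` mm/day below is an assumption for illustration only, as is the function name `categorical_metrics`.

```python
import numpy as np

def categorical_metrics(target, reference, threshold=1.0):
    """POD, FAR and CSI for daily precipitation occurrence.

    A day counts as 'wet' when precipitation >= threshold (mm/day);
    the default threshold is an illustrative assumption.
    """
    t_wet = np.asarray(target, dtype=float) >= threshold
    r_wet = np.asarray(reference, dtype=float) >= threshold
    hits = int(np.sum(t_wet & r_wet))           # both wet
    misses = int(np.sum(~t_wet & r_wet))        # station wet, GPP dry
    false_alarms = int(np.sum(t_wet & ~r_wet))  # GPP wet, station dry
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    denom = hits + misses + false_alarms
    csi = hits / denom if denom else nan
    return {"POD": pod, "FAR": far, "CSI": csi}

# Illustrative made-up values: 2 hits, 1 miss, 1 false alarm
scores = categorical_metrics([0.0, 3.2, 1.5, 0.0, 6.0], [0.0, 2.0, 0.0, 4.1, 5.5])
print(scores)
```

Correct negatives (both dry) do not enter any of the three scores, which is why POD, FAR, and CSI focus on how well precipitation occurrence, rather than its absence, is captured.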
The daily comparisons were also performed for each synoptic station individually. Boxplots of six metrics (CC, RMSE, MBE, POD, FAR, and CSI) are presented in Figure 4. Figure 4a shows that GSMaP and ERA5 outperformed the other GPPs, generally obtaining higher CC values at each station, while TMPA performed worst. Moreover, Figure 4b clearly shows that IMERG had higher RMSE values at each station, consistent with the results in Table 2. The other GPPs performed relatively similarly in terms of station-level RMSE, with GSMaP best, followed by ERA5, TMPA, PERSIANN, and CHIRPS. Likewise, Figure 4c shows that all GPPs overestimated precipitation at some stations and underestimated it at others, except TMPA, which underestimated at almost all stations. Meanwhile, IMERG shows clearly larger variation and magnitudes of MBE across stations, in agreement with the result obtained from all corresponding precipitation values (MBE = 1.001 mm). Figure 4d shows that PERSIANN, GSMaP, IMERG, and ERA5 behaved relatively similarly in terms of station-level POD, with ERA5 and GSMaP performing best, in that order, while TMPA and CHIRPS performed poorly. On the other hand, the FAR boxplot (Figure 4e) shows fairly similar performance for all GPPs, with the best and worst performance for GSMaP and PERSIANN, respectively. Finally, the CSI metric showed that GSMaP performed best, followed by ERA5, IMERG, and PERSIANN, while the two other GPPs performed similarly poorly (see Figure 4f).
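Station-level boxplots like those in Figure 4 amount to computing each metric separately for every station and then summarizing the resulting distribution. The sketch below assumes the paired records are available as `(station_id, target, reference)` tuples; this record layout and the function names are hypothetical, and only RMSE and MBE are shown for brevity.

```python
from collections import defaultdict

import numpy as np

def per_station_metrics(records):
    """Group (station_id, target, reference) records by station and
    return {station_id: {"RMSE": ..., "MBE": ...}}."""
    grouped = defaultdict(lambda: ([], []))
    for station, t, r in records:
        grouped[station][0].append(t)
        grouped[station][1].append(r)
    out = {}
    for station, (t, r) in grouped.items():
        diff = np.asarray(t, dtype=float) - np.asarray(r, dtype=float)
        out[station] = {
            "RMSE": float(np.sqrt(np.mean(diff ** 2))),
            "MBE": float(np.mean(diff)),
        }
    return out

def box_summary(values):
    """Five-number summary (min, Q1, median, Q3, max) of station values.

    Note: plotting libraries often draw whiskers at 1.5 x IQR rather than
    at the min/max, so the whisker ends may differ from this summary.
    """
    return tuple(float(v) for v in np.percentile(values, [0, 25, 50, 75, 100]))

# Hypothetical records for two stations, "A" and "B"
records = [("A", 2.0, 1.0), ("A", 0.0, 0.0), ("B", 4.0, 6.0), ("B", 1.0, 1.0)]
stats = per_station_metrics(records)
rmse_box = box_summary([s["RMSE"] for s in stats.values()])
print(stats, rmse_box)
```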
In addition to the station-level boxplots, the statistical metrics are also mapped at the station locations, which allows a closer investigation of the spatial variability of GPP performance across Iran. Figure 5 shows the continuous metrics, which describe the accuracy of precipitation intensity at each station. The boundaries of the country’s second-level basins (black lines) are also overlaid for readers working on smaller-scale studies (i.e., one or more basins). In general, all GPPs performed best in the western part of the country; however, GSMaP, ERA5, and IMERG performed considerably better throughout Iran. In terms of CC (Figure 5, left column), TMPA’s best values, in the range of 0.4–0.6, were observed at 81 (33%) stations. The remaining 67% of stations, located mostly in the northern parts of the country (with higher precipitation rates) and in the central regions, fell in lower CC ranges. Relatively similar patterns, with slightly better performance, were observed for CHIRPS and PERSIANN: 38% and 40% of their stations, respectively, fell in the same CC range, and very few stations obtained higher CC values. The three other GPPs, i.e., GSMaP, ERA5, and IMERG, performed considerably better, with many stations in higher CC ranges. In particular, ERA5 had 127 (52%) stations in the CC range of 0.6–0.8, followed by GSMaP and IMERG with 106 (43%) and 55 (22%) stations, respectively, in the same range. Likewise, Figure 5 (middle column) illustrates the spatial pattern of the RMSE values at each station. Since higher RMSE values indicate poorer performance, IMERG performed poorly, with 161 (65%) stations having RMSE values above 4 mm. In contrast, GSMaP provided more precise precipitation estimates, with 194 (79%) stations having RMSE values below 4 mm. GSMaP was followed by ERA5, TMPA, PERSIANN, and CHIRPS. Further investigation revealed that stations with higher RMSE values were mainly located in regions with higher precipitation rates.
Figure 5 (right column) shows the spatial variability of the average bias (MBE) at each station. Overall, all GPPs showed mixed behavior, overestimating at some stations and underestimating at others, except TMPA and IMERG, which predominantly underestimated and overestimated precipitation, respectively. TMPA underestimated at almost all stations, with MBE values mostly between −1 and 0 mm, indicating low bias throughout the country. Conversely, IMERG exhibited considerably higher MBE values, with overestimation at 241 (98%) stations, most of which fell in the MBE ranges of 0–1 mm (50%) and 1–2 mm (46%). As Figure 5 (right column) also shows, CHIRPS, PERSIANN, and ERA5 presented relatively similar patterns, overestimating at 167 (69%), 181 (74%), and 188 (77%) stations, respectively, and underestimating at the remainder. For GSMaP, most stations fell in the MBE range of −1 to 1 mm, with 158 (65%) underestimating and 77 (31%) overestimating, giving GSMaP the best overall bias performance among the GPPs.
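The station counts and percentages quoted per range (e.g., the number of stations with CC in 0.6–0.8) amount to binning the per-station metric values. A minimal sketch follows; the bin edges are illustrative, the half-open bin convention is an assumption (the exact edge handling used for Figures 5 and 6 is not stated here), and the function name `range_shares` is hypothetical.

```python
import numpy as np

def range_shares(values, edges):
    """Count station values per bin and return (lower, upper, count, percent).

    np.histogram uses half-open bins [edges[i], edges[i+1]) except for the
    last bin, which is closed on the right. Values outside the edges are not
    counted, but percentages are still taken over all stations.
    """
    values = np.asarray(values, dtype=float)
    counts, _ = np.histogram(values, bins=edges)
    total = values.size
    return [
        (edges[i], edges[i + 1], int(c), 100.0 * c / total)
        for i, c in enumerate(counts)
    ]

# Hypothetical per-station CC values for one GPP
cc_example = [0.35, 0.45, 0.55, 0.62, 0.71, 0.79, 0.81]
edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
shares = range_shares(cc_example, edges)
for lower, upper, count, pct in shares:
    print(f"{lower:.1f}-{upper:.1f}: {count} stations ({pct:.0f}%)")
```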
To complement the spatial evaluation of the selected GPPs based on continuous metrics, Figure 6 presents the CSI values at each station, computed from two decades of daily observations. At a glance, Figure 6 shows that all GPPs performed worst in the central parts of the country. The results also indicate the superior performance of GSMaP compared with the other GPPs. In particular, for GSMaP, over 34% and 61% of stations fell in the CSI ranges of [0.2–0.4) and (0.4–0.6], respectively, demonstrating its capability to capture precipitation occurrence. ERA5 was the second most accurate GPP, with approximately 50% and 47% of stations in the CSI ranges of [0.2–0.4) and (0.4–0.6], respectively. For IMERG, the third most accurate GPP in detecting precipitation occurrence, most stations (86%) had CSI values in the range of [0.2–0.4), and the few remaining stations had higher or lower values. For the other three GPPs, 77% (TMPA), 83% (CHIRPS), and 85% (PERSIANN) of stations had CSI values between 0.2 and 0.4, and almost all other stations had lower CSI values. Based on these results, the three GPPs rank as PERSIANN, CHIRPS, and TMPA in decreasing order of accuracy in detecting precipitation occurrence at the station level.
5.1.2. Monthly Time Scale
In this subsection, the daily observations (i.e., target and reference precipitation) are compared on a monthly time scale. In other words, the continuous and categorical metrics are computed for each calendar month, including dry and wet months, using the daily observations. In addition to characterizing the temporal accuracy of the GPPs, the findings may benefit researchers whose work focuses on particular months. The computed metrics are presented using the Taylor Diagram [86] and the Performance Diagram [87], which allow the GPPs to be compared on several metrics simultaneously.
The Taylor Diagram [86] focuses on metrics of precipitation intensity, i.e., RMSE (in its centered, bias-removed form), CC, and standard deviation (SD), and the associated results are illustrated in Figure 7. In these diagrams, points representing the target precipitation of the GPPs that lie closer to the “Reference” point (representing the reference precipitation from synoptic stations) on the horizontal axis indicate better performance. In other words, points closer to the “Reference” have lower RMSE, higher CC, and SD values more similar to that of the reference precipitation. In general, Figure 7 illustrates that in the drier months in Iran (i.e., June, July, August, and September), when less precipitation occurs, all GPPs lie relatively closer to the “Reference” point, indicating better performance. This is because, in the drier months, all GPPs had lower RMSE values and SD values closer to those of the reference precipitation. However, all GPPs had their lowest CC values in the drier months.
Figure 7 shows that TMPA and GSMaP always had lower SD values than the reference precipitation, whereas IMERG always had higher SD values; the other GPPs showed no consistent pattern. According to Figure 7, IMERG had the greatest distance from the “Reference” point, mainly because of its higher RMSE and SD values compared with the other GPPs. Nevertheless, its CC values were mostly between 0.4 and 0.6 in wet months and below 0.4 in drier months, making it the third most accurate GPP. GSMaP and ERA5 performed comparably: GSMaP was more accurate in some months and ERA5 in the others, with no clear monthly pattern. For these two GPPs, CC values were between 0.4 and 0.7, RMSE values were relatively low, and SD values, especially for ERA5, were much closer to those of the reference precipitation. The remaining GPPs, i.e., TMPA, CHIRPS, and PERSIANN, performed worst in nearly all months, with low CC, relatively higher RMSE, and disparate SD values.
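The geometry of a Taylor Diagram rests on the identity E′² = σ_t² + σ_r² − 2σ_tσ_rR, where σ_t and σ_r are the SDs of the target and reference series, R is their correlation, and E′ is the centered (bias-removed) RMS difference; the full RMSE additionally satisfies RMSE² = MBE² + E′². The sketch below computes these quantities and the point's position on the diagram's polar axes (radius = SD, angle = arccos R). It assumes population SDs (ddof = 0), and the function name `taylor_stats` is hypothetical.

```python
import math

import numpy as np

def taylor_stats(target, reference):
    """SDs, CC, centered RMS difference, and Taylor-diagram x/y position."""
    t = np.asarray(target, dtype=float)
    r = np.asarray(reference, dtype=float)
    sd_t, sd_r = float(t.std()), float(r.std())  # population SDs (ddof=0)
    cc = float(np.corrcoef(t, r)[0, 1])
    centered = (t - t.mean()) - (r - r.mean())
    crmsd = float(np.sqrt(np.mean(centered ** 2)))  # centered RMS difference
    theta = math.acos(max(-1.0, min(1.0, cc)))      # angle from horizontal axis
    return {
        "SD": sd_t, "SD_ref": sd_r, "CC": cc, "CRMSD": crmsd,
        "x": sd_t * math.cos(theta), "y": sd_t * math.sin(theta),
    }

# Illustrative made-up values; the "Reference" point sits at (SD_ref, 0)
s = taylor_stats([0.0, 2.0, 5.0, 1.0], [0.0, 1.0, 4.0, 3.0])
print(s)
```

By the law of cosines, the distance from the plotted point to the “Reference” point equals the centered RMS difference, which is why proximity to that point summarizes SD, CC, and (centered) RMSE at once.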
The Performance Diagram [87] focuses on metrics of precipitation occurrence, i.e., POD, success ratio (1 − FAR), CSI, and frequency bias, and the related results are presented in Figure 8. In these diagrams, points representing the target precipitation of the GPPs that lie closer to the upper-right corner indicate better detection of precipitation occurrence. In other words, points closer to the upper-right corner have higher POD and CSI values, lower FAR values, and a frequency bias closer to 1.
Figure 8 demonstrates that in nearly all months PERSIANN had the highest bias, except for two months in which ERA5 and IMERG had higher values. Conversely, TMPA, followed by CHIRPS, had the lowest bias in all months. Additionally, ERA5 achieved the highest POD values on the monthly time scale, most often followed by GSMaP; however, in a few months, IMERG and PERSIANN recorded higher POD values than GSMaP. The Performance Diagrams indicate that in dry months CHIRPS achieved higher success ratios (lower FAR values), while GSMaP ranked first in the wet months. More generally, the performance of all GPPs declined in the drier months compared with months with higher precipitation rates and more frequent precipitation. Concerning CSI, GSMaP performed best, with ERA5 and IMERG ranked second and third. TMPA and CHIRPS had the lowest CSI values in five and seven months, respectively, suggesting the weakest performance of TMPA. Taking all the metrics of the Performance Diagram into account, GSMaP remained the most accurate GPP in all months, followed by ERA5, IMERG, PERSIANN, CHIRPS, and TMPA.
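Each point on a Performance Diagram can be derived from POD and FAR alone: the horizontal coordinate is the success ratio SR = 1 − FAR, the vertical coordinate is POD, the frequency bias is POD/SR (equivalently (hits + false alarms)/(hits + misses), drawn as straight lines through the origin), and CSI follows from 1/CSI = 1/POD + 1/SR − 1 (the curved contours). A perfect product sits in the upper-right corner, where POD = SR = CSI = bias = 1. A minimal sketch; the function name `performance_point` is hypothetical.

```python
def performance_point(pod, far):
    """Coordinates and derived scores for one point on a Performance Diagram."""
    if not (0.0 < pod <= 1.0 and 0.0 <= far < 1.0):
        raise ValueError("POD must be in (0, 1] and FAR in [0, 1)")
    sr = 1.0 - far                            # success ratio (x-axis)
    bias = pod / sr                           # frequency bias; 1 means unbiased
    csi = 1.0 / (1.0 / pod + 1.0 / sr - 1.0)  # critical success index
    return {"SR": sr, "POD": pod, "bias": bias, "CSI": csi}

# Illustrative input: 2 hits, 1 miss, 1 false alarm -> POD = 2/3, FAR = 1/3
point = performance_point(2 / 3, 1 / 3)
print(point)
```

Because bias and CSI are fully determined by POD and SR, a product can have a high POD yet a poor CSI if its success ratio is low, which is the trade-off the POD and FAR rankings in this section reflect.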
5.1.3. Yearly Time Scale
To complement the previous two subsections, the daily observations of each year were also used to validate the GPPs based on CC, RMSE, MBE, and CSI (see Figure 9). This analysis provides supporting information on the temporal characteristics of the selected GPPs, revealing whether their performance was stable from year to year over the last two decades.
Figure 9a displays the yearly CC values of the GPPs, clearly indicating that before 2011 either GSMaP or ERA5 had the highest CC values, whereas afterwards GSMaP was consistently the most accurate GPP. The results also suggested that IMERG performed better than TMPA, CHIRPS, and PERSIANN, making it the third most accurate dataset after ERA5 and GSMaP. Moreover, the yearly CC values of the GPPs remained stable, with no considerable or significant upward or downward trend, and their SDs ranged from 0.026 (ERA5) to 0.055 (PERSIANN). Likewise, Figure 9b illustrates the RMSE dynamics of the GPPs, showing the higher RMSE values of IMERG. In comparison, the other GPPs had lower RMSE values with relatively steady trends over the last two decades, and GSMaP had the lowest RMSE in almost every year, closely followed by ERA5. The trends in MBE were also investigated (Figure 9c); the results show that TMPA and IMERG behaved consistently throughout the study period, with the largest underestimation and overestimation, respectively. The MBE values of three other GPPs, i.e., CHIRPS, PERSIANN, and ERA5, were stable between 2000 and 2020, ranging between 0.0 and 0.5 mm, indicating consistent overestimation by these GPPs. However, the GSMaP MBE values showed a significant trend (p-value < 0.05), which was statistically confirmed by a randomness test. This indicates an improvement in the average bias of GSMaP, from roughly −0.5 mm to 0.04 mm over the last two decades. Finally, the yearly detectability of precipitation occurrence was examined using the CSI metric. The results indicated stable trends with low variation for all GPPs, with GSMaP and ERA5 as the first and second most accurate GPPs, respectively.
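This section does not name the trend test applied to the yearly GSMaP MBE series. As one common option for short hydro-climatic series, the sketch below implements the two-sided Mann–Kendall test using only the standard library. It is an assumption for illustration, not necessarily the test used in this study; it omits tie and autocorrelation corrections, and the function name `mann_kendall` is hypothetical.

```python
import math

def mann_kendall(series):
    """Two-sided Mann-Kendall trend test (no tie or autocorrelation correction).

    Returns (S, Z, p_value); p_value < 0.05 suggests a monotonic trend.
    """
    x = list(series)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 values")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)   # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p

# Hypothetical yearly MBE series (mm) for 2000-2020, not data from this study
hypothetical_mbe = [-0.5 + 0.027 * k for k in range(21)]
s_stat, z_stat, p_value = mann_kendall(hypothetical_mbe)
print(s_stat, round(z_stat, 2), p_value < 0.05)
```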