Assessment of 13 Gridded Precipitation Datasets for Hydrological Modeling in a Mountainous Basin

Abstract: Precipitation measurement with high spatial and temporal resolution over the highly elevated and complex terrain of eastern Turkey is essential for managing water structures optimally. The objective of this study is to evaluate the consistency and hydrologic utility of 13 Gridded Precipitation Datasets (GPDs) (CPCv1, MSWEPv2.8, ERA5, CHIRPSv2.0, CHIRPv2.0, IMERGHHFv06, IMERGHHEv06, IMERGHHLv06, TMPA-3B42v7, TMPA-3B42RTv7, PERSIANN-CDR, PERSIANN-CCS, and PERSIANN) over a mountainous test basin (Karasu) at a daily time step. The Kling-Gupta Efficiency (KGE), including its three components (correlation, bias, and variability ratio), and the Nash-Sutcliffe Efficiency (NSE) are used for GPD evaluation. Moreover, the Hanssen-Kuiper (HK) score is used to evaluate how well the selected GPDs detect different precipitation events. Precipitation frequencies are evaluated using the Probability Density Function (PDF). Daily precipitation data from 23 meteorological stations serve as the reference for the period 2015–2019. The TUW model is used for hydrological simulations, which are evaluated against observed discharge at the outlet of the basin. The model is calibrated in two ways: with observed precipitation only, and with each GPD individually. Overall, CPCv1 shows the highest performance (median KGE; 0.46) over time and space. MSWEPv2.8 and CHIRPSv2.0 deliver the best performance among multi-source merging datasets, followed by CHIRPv2.0, whereas IMERGHHFv06, PERSIANN-CDR, and TMPA-3B42v7 show poor performance. IMERGHHLv06 presents the best performance (median KGE; 0.17) compared to the other satellite-based GPDs (PERSIANN-CCS, PERSIANN, IMERGHHEv06, and TMPA-3B42RTv7). ERA5 performs well in both spatial and temporal validation compared to satellite-based GPDs, though it shows low performance in producing a streamflow simulation. Overall, all gridded precipitation datasets show better performance in generating streamflow when the model is calibrated by each GPD separately.


Introduction
Precipitation data with high spatial and temporal resolution are one of the key components of hydrological modeling [1,2]. However, the spatial inconsistency of precipitation and the scarcity of ground-based gauge observations, especially over basins with complex topography, significantly affect the rainfall-runoff simulation process [3,4]. Various methods have been used to estimate precipitation, and each of them has its pros and cons. Ground-based gauge networks measure precipitation directly, but how well they capture the spatial variability of precipitation depends heavily on the density of the gauge network [5,6]. Moreover, technical issues in developing countries and problems such as information exchange and data sharing for transboundary river basins make gauge precipitation data difficult to obtain [7]. Ground weather radars are able to provide precipitation estimates with high spatial and temporal resolution. However, the complexity of the terrain, the limited spatial range, and the indirect measurement of precipitation introduce several errors and uncertainties into radar-based precipitation estimates [8][9][10]. In recent decades, remote sensing and data assimilation technology have developed rapidly. Space-borne sensors and numerical weather prediction models are able to provide a variety of Gridded Precipitation Datasets (GPDs) at different spatial and temporal resolutions with nearly global coverage [11,12]. Satellite-based and numerical weather prediction model precipitation estimates can be considered as an alternative to fill the spatio-temporal gaps of ground-based networks, especially over complex topography where gauge networks are sparse. However, the high bias present in some of these gridded precipitation datasets is one of the factors that limit their application in hydrological modeling [13].
Hydrological models are used to simplify real-world problems related to the water cycle and water resources management, or to consider only relevant information instead of simulating every realistic scenario, for a better understanding of basin characteristics [14,15]. Therefore, the number and complexity of hydrological models have increased to support various decisions regarding management policies related to water resources management, climate change, and land use. Simple models need fewer input data, are easier to calibrate, and run quickly, whereas more complex models require more input information but allow diverse issues in water resources management to be considered [16,17]. River discharge is one of the important hydrologic components, and its estimation at a particular location may be necessary for the design and management of different water resource structures [18,19]. Rainfall-runoff models make it possible to simulate streamflow using observed precipitation or GPDs as meteorological forcing, and the simulated streamflow can be compared with observed discharge to evaluate the hydrologic utility of the forcing data [20]. In this study, we performed both meteorological and hydrological evaluation of several GPDs and assumed that the observed streamflow data are the best available estimates.
Generally, two types of validation methodologies are used to quantify the performance of GPDs: (1) direct comparison of precipitation datasets with a ground-based gauge network or radar-based precipitation estimates (meteorological evaluation); and (2) validation of Gridded Precipitation Datasets (GPDs) using hydrological models to evaluate their strength in streamflow prediction (hydrological evaluation) [21].
Many authors have reported the validation of different gridded precipitation datasets over various regions [1,22–30]. However, the validity and consistency of GPDs established for one area may not carry over to others, so a separate assessment is necessary to address their reliability over each particular region.
There are also a number of studies evaluating the performance of certain GPDs over all or selected regions of Turkey [31][32][33][34][35][36][37][38][39]. However, these investigations tested either a limited number of GPDs or older product versions, and they mainly considered meteorological performance at a coarse monthly time step instead of a daily one. Thus, we see the necessity of a more comprehensive study that includes both the meteorological and hydrological performance of GPDs at a finer (daily) time step and considers seasonal effects.
This study aims to evaluate the spatio-temporal consistency of 13 GPDs by considering the entire period and seasonal variability of precipitation as well as testing the hydrological utility based on two different scenarios (Scheme-1 and Scheme-2) over a complex topography used for different scientific research projects in Turkey. The structure of this paper is as follows: Section 1 presents a comprehensive introduction to GPDs. Section 2 gives information on materials and methods. Section 3 displays results and detailed discussions, and conclusions are presented in Section 4.

Study Area
The Karasu basin (38°58′ E to 41°39′ E and 39°23′ N to 40°25′ N), located in the eastern part of Turkey and controlled by the Kemah (E21A019) hydrological station (Figure 1), is defined as the study area. The basin has a drainage area of around 10,250 km² and its elevation ranges from 1130 m to 3500 m. Owing to the terrain complexity and mountainous climate regime, most precipitation falls as snow and is retained on the ground for almost half a year, contributing to streamflow when temperatures increase. The Karasu basin is one of the major tributaries of the Euphrates River, the longest transboundary river in southwest Asia. The Euphrates is also the largest river basin in Turkey (127,300 km²), holds 17% of the country's total water potential, and embraces large man-made reservoirs used for irrigation, water supply, hydropower, and flood control purposes. Hence, evaluating the hydrologic response of different GPDs over an important headwater catchment is crucial for the optimal management of water resources.

Hydro-Meteorological Data
In this study, daily observed precipitation and temperature data from 23 ground-based stations (independent stations whose data are not shared and evaluated with worldwide research centers for bias correction of gauge corrected GPDs) are used to validate 13 selected GPDs in and around the Karasu basin. Moreover, daily streamflow data collected at the outlet of the basin is utilized to assess the hydrologic performance of GPDs. All the above-mentioned data are evaluated for five recent water years from October 2014 to September 2019.
In this context, the study uses daily precipitation data obtained from the 13 selected GPDs. Considering input and methodology, the selected GPDs can be categorized into four groups: (1) products that take advantage of spatial information from ground-based gauge precipitation data (CPCv1); (2) reanalysis data from numerical weather prediction model outputs (ERA5); (3) products that use satellite Passive Microwave (PMW) and Infrared (IR) sensor data (IMERGHHEv06, IMERGHHLv06, TMPA-3B42RTv7, PERSIANN, and PERSIANN-CCS); and (4) multi-source merging precipitation datasets, either gauge uncorrected (CHIRPv2.0) or gauge corrected (MSWEPv2.8, CHIRPSv2.0, IMERGHHFv06, TMPA-3B42v7, and PERSIANN-CDR). Furthermore, gauge corrected multi-source merging precipitation datasets use gauge-based precipitation measurements from various sources having different spatial and temporal resolutions. For example, TMPA-3B42v7 and IMERGHHFv06 use monthly Global Precipitation Climatology Centre (GPCC) data with 1° spatial resolution [45,46], and PERSIANN-CDR utilizes monthly Global Precipitation Climatology Project (GPCP) datasets with a 2.5° spatial resolution [47]. Moreover, CHIRPSv2.0 includes pentadal precipitation estimates from the Climate Hazards group Precipitation climatology (CHPclim) datasets and daily precipitation data from other national meteorological agencies and private streams [44]. MSWEPv2.8 includes WorldClim 2 datasets with 1 km spatial resolution and uses monthly GPCC, Global Historical Climatology Network-Daily (GHCN-D), Global Summary of the Day (GSOD), and other gauge observations [42]. GPDs can also be differentiated by their spatial and temporal resolution and coverage. For example, PERSIANN-CCS has the highest spatial resolution (0.04°), with hourly precipitation data available from 2003 to near real-time and coverage of 60° N/S. On the other hand, CPCv1 has a coarser spatial resolution (0.50°) with global coverage, providing daily precipitation from 1997 to the present. Moreover, the delay with which a dataset is released for public use is an important factor for hydro-climatological applications such as flood forecasting and early warning. For example, IMERGHHEv06 is released 4 h after real time, while its final research product (IMERGHHFv06) is released for public use after 3.5 months.

Methodology
The Kling-Gupta Efficiency (KGE) [50,51] objective function is used to assess the performance of GPDs. KGE has three components: the Pearson correlation coefficient (r), which represents the temporal dynamics of precipitation, and the bias (β) and variability ratio (γ), which determine the volume distribution of precipitation. The Nash-Sutcliffe Efficiency (NSE) and KGE are utilized to evaluate the strength of the 13 GPDs in reproducing streamflow time series. Moreover, the Hanssen-Kuiper (HK) score is used to show the strength of a GPD in distinguishing between occurrences and non-occurrences of a certain event. Finally, the probability density function (PDF) is exploited to classify the rainfall intensity occurrences of GPDs and observed gauge precipitation [52,53]. Table 2 shows the properties of the selected evaluation metrics, whereby the optimal value is unity for each of them.

Table 2. Properties of performance indices for evaluation of GPDs.
Indicator | Mathematical Statement | Explanation
Kling-Gupta Efficiency and its components | $\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\beta-1)^2 + (\gamma-1)^2}$, with $\beta = \mu_s/\mu_o$ and $\gamma = (\delta_s/\mu_s)/(\delta_o/\mu_o)$ | r is the Pearson correlation coefficient, β (Bias) is the ratio of estimated and observed mean, γ (Variability Ratio) is the ratio of estimated and observed coefficients of variation, µ and δ are the distribution mean and standard deviation, where s and o indicate estimated and observed.
Hanssen-Kuiper score | $\mathrm{HK} = \frac{H}{H+M} - \frac{F}{F+CN}$ | M (Miss): the observed precipitation is not detected. F (False): precipitation is detected but not observed. H (Hit): the observed precipitation is correctly detected. CN (Correct Negative): the absence of precipitation is correctly detected.
Nash-Sutcliffe Efficiency | $\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}(Q_{o,i}-Q_{s,i})^2}{\sum_{i=1}^{n}(Q_{o,i}-\bar{Q}_o)^2}$ | n is the sample size of the observed or calculated streamflow, Q.

The daily precipitation events from the gauges and GPDs are discretized into five classes following the World Meteorological Organization [54] standard for rainfall intensity classification, later modified by Zambrano-Bigiarini [55]. The five classes are no precipitation (less than 1 mm/day), light precipitation (1-5 mm/day), moderate precipitation (5-20 mm/day), heavy precipitation (20-40 mm/day), and violent precipitation (more than 40 mm/day). This categorization is important for hydrological studies because different intensity classes may produce a distinct hydrologic response over the basin. A point to grid approach is selected for the comparison of GPDs with gauge precipitation data, in which the value of each grid box at the station location is extracted by linear interpolation [56]. Finally, based on the temporal availability of the observed data and GPDs, the evaluation period is selected as October 2014 to September 2019.
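The indices in Table 2 and the five intensity classes can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the use of the population standard deviation, and the treatment of class boundaries (lower edge inclusive, so that exactly 1 mm/day counts as light precipitation) are all choices made here for the example.

```python
import numpy as np

# Assumed class edges in mm/day; the lower edge of each class is inclusive.
# 0: no precipitation, 1: light, 2: moderate, 3: heavy, 4: violent
EDGES = np.array([1.0, 5.0, 20.0, 40.0])


def kge_components(sim, obs):
    """Return (KGE, r, beta, gamma) for two equal-length series."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]          # Pearson correlation
    beta = sim.mean() / obs.mean()           # ratio of means
    # ratio of coefficients of variation (std / mean)
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return kge, r, beta, gamma


def nse(sim, obs):
    """Nash-Sutcliffe Efficiency of simulated against observed values."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def intensity_class(precip):
    """Map daily precipitation (mm/day) to class indices 0..4."""
    return np.digitize(np.asarray(precip, dtype=float), EDGES)


def hk_score(sim, obs, k):
    """Hanssen-Kuiper score for intensity class k (0..4)."""
    s = intensity_class(sim) == k
    o = intensity_class(obs) == k
    hits = np.sum(s & o)
    misses = np.sum(~s & o)
    false_alarms = np.sum(s & ~o)
    correct_neg = np.sum(~s & ~o)
    return hits / (hits + misses) - false_alarms / (false_alarms + correct_neg)
```

A dataset that doubles every observed value keeps r and γ at 1 but has β = 2, so its KGE drops to 0 even though the timing is perfect. Note that `hk_score` returns NaN when a class never occurs in the observations, which can happen for the rare heavy and violent classes in short records.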
The conceptual TUW hydrological model, successfully tested in several studies [57][58][59][60][61][62], is utilized in this work. The TUW model has a structure similar to that of the widely recognized Hydrologiska Byråns Vattenbalansavdelning (HBV) model [63,64] and operates on a daily time step. Its inputs are total precipitation (mm), mean air temperature (°C), and potential evapotranspiration (mm), and it includes 15 model parameters (Table 3) to calibrate the snow, soil moisture, and runoff routines. The hydroPSO R package, which includes the particle swarm global optimization algorithm [65,66], is used to calibrate the TUW model parameters. Generally, two types of hydrologic simulation scenarios are widely used to assess the hydrologic utility of GPDs, depending on the level of information provided. In the first type (a), model parameters are fitted to the observed streamflow time series using observed precipitation data as input, and afterward the observed precipitation is replaced by GPDs for validation. This method is more efficient for gauged basins. In the second type (b), model parameters are calibrated and validated with streamflow data using each GPD independently as model input. This method is recommended for ungauged basins where only observed streamflow and GPDs are available [4,67]. Hence, in this study, both schemes are considered for the hydrologic response of the basin based on observed and GPD input.

Spatial and Temporal Evaluation of Daily Precipitation
Figure 2 shows the spatial distribution of the mean daily precipitation derived from observed gauges and 13 GPDs, including their bias, at the corresponding ground station location for the selected period (2015-2019). Considering the observed data, the mean daily precipitation increases from the north (1-1.5 mm) to the south (2.5-3 mm) of the basin. CPCv1, the only selected GPD developed from information collected by ground station networks, is able to reproduce a mean daily precipitation of around 1-2 mm/day inside the basin. However, this dataset overestimates precipitation in the northeast and underestimates it in the southwest part of the basin, with a mean daily precipitation bias varying from −0.5 to 0.5 mm. ERA5 presents higher mean daily precipitation (1.5-2.5 mm) than observed, with an overestimating bias in and around the study area. Among the GPDs that combine different sources, both CHIRPSv2.0 and CHIRPv2.0 reproduce the mean daily precipitation quite well (1-1.5 mm), especially within the basin, and their bias varies from 0 to 0.5 mm, while MSWEPv2.8 estimates higher precipitation (1.5-2 mm) and a slightly larger bias.
Overall, GPDs that combine only ground and satellite data (G, S) seem to map mean daily precipitation poorly compared to other multi-source products. IMERGHHFv06 represents mean daily precipitation slightly better (1.5-2 mm) than TMPA-3B42v7 (1.5-2.5 mm), while PERSIANN-CDR presents higher daily precipitation (2-3 mm) and a comparatively large precipitation bias of 1-2 mm. Among GPDs which only use satellite data, IMERGHHEv06 and IMERGHHLv06 reproduce mean daily precipitation well (1-2 mm) compared to the adjusted product (IMERGHHFv06) and other satellite-based GPDs such as TMPA-3B42RTv7 (1.5-2.5 mm). In the same way, PERSIANN-CCS presents higher mean daily precipitation (2-3 mm) and a precipitation bias close to that of PERSIANN-CDR (1-2 mm), while PERSIANN always shows mean daily precipitation of less than 1 mm, with its bias growing from the north (−0.5 to 0 mm) to the south (−1 mm to −2 mm).
Figure 3 shows the mean daily precipitation and its estimated bias for the selected GPDs at the regional scale considering the entire period and four seasons (spring, summer, autumn, and winter). According to the observed mean daily precipitation, the region receives the most precipitation during spring (2.24 mm), followed by winter (1.78 mm), while summer (0.74 mm) and autumn (1.2 mm) show less precipitation; the mean daily precipitation over the Karasu basin for the entire period is 1.5 mm. CPCv1 presents mean precipitation that is close to the observed for the entire period and all seasons, and its mean daily precipitation bias does not exceed ±0.3 mm. Among multi-source merging GPDs, both CHIRPv2.0 and CHIRPSv2.0 reproduce mean daily precipitation well and only underestimate precipitation during the winter season (bias; −0.22 mm for CHIRPSv2.0 and −0.43 mm for CHIRPv2.0). MSWEPv2.8 gives values close to the observed during winter and overestimates precipitation for the rest of the seasons and the entire period, with the largest overestimation in spring (0.5 mm). ERA5 estimates more precipitation than observed during the spring season (3.3 mm, bias; 1.1 mm) and reproduces autumn precipitation (1.5 mm, bias; 0.3 mm) reasonably well. Among GPDs which combine only ground and satellite data (G, S), IMERGHHFv06 produces good results for mean daily precipitation over the entire period (1.8 mm), spring (2 mm), and autumn (1.5 mm), whereas TMPA-3B42v7 performs better during the summer (0.8 mm) and winter (2.1 mm) seasons. PERSIANN-CDR always overestimates mean daily precipitation and shows a comparatively higher bias during winter (bias; 1.8 mm). Among satellite-based GPDs, both IMERGHHEv06 and IMERGHHLv06 reproduce mean daily precipitation close to the observed, while TMPA-3B42RTv7 and PERSIANN-CCS overestimate and PERSIANN significantly underestimates it for the entire period and all four seasons.
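The seasonal means and biases discussed for Figure 3 can be reproduced with a few lines of Python. The sketch below is illustrative only: it assumes standard meteorological seasons by calendar month (the text does not define the season boundaries), takes bias as the difference between the GPD mean and the observed mean in mm, and uses a hypothetical function name.

```python
import numpy as np

# Assumption: meteorological seasons defined by calendar month.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_mean_and_bias(months, obs, gpd):
    """Return {period: (gpd_mean, gpd_mean - obs_mean)} in mm/day.

    months: calendar month (1-12) of each daily value
    obs, gpd: observed and gridded daily precipitation (mm/day)
    """
    obs = np.asarray(obs, dtype=float)
    gpd = np.asarray(gpd, dtype=float)
    seasons = np.array([SEASON_OF_MONTH[int(m)] for m in months])

    result = {"entire": (gpd.mean(), gpd.mean() - obs.mean())}
    for season in ("spring", "summer", "autumn", "winter"):
        sel = seasons == season
        if sel.any():  # skip seasons with no data
            result[season] = (gpd[sel].mean(), gpd[sel].mean() - obs[sel].mean())
    return result
```

With this definition, the reported ERA5 spring mean of 3.3 mm against the observed 2.24 mm gives a bias of about 1.1 mm, which matches the value quoted in the text.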
Consistency of GPDs over Time and Space
[41,42,44]. However, CHIRPSv2.0 overestimates bias (bias; 1-1.5) and its variability ratio varies from 0.8 to 1.2, while CHIRPv2.0 shows poor performance compared to MSWEPv2.8 and CHIRPSv2.0. ERA5 performs better in the southern areas (KGE; 0.25-0.50), while within the basin and in the northern areas its performance is much more dispersed. All GPDs that combine satellite and gauge data have a low performance inside the basin and in the northern part (KGE; <0) but perform better in the southern areas (KGE; 0.1-0.25). IMERGHHLv06 and IMERGHHEv06 show a higher performance than their gauge corrected counterpart (IMERGHHFv06) and other satellite-based GPDs, and their performance is close to that of CHIRPSv2.0 and CHIRPv2.0. PERSIANN, another satellite-based GPD, performs better than PERSIANN-CCS and PERSIANN-CDR. Unlike other GPDs, PERSIANN shows high performance in the northern area yet performs poorly in the southern part. TMPA-3B42RTv7 performs poorly compared to other satellite-based GPDs but behaves similarly to its gauge corrected counterpart (TMPA-3B42v7).

Figure 5. Spatial distribution of KGE and its correlation, bias, and variability ratio components at a daily time step over the Karasu river basin for 13 GPDs. The title color presents the source(s) of GPDs: satellite-based (blue), gauge and satellite (red), reanalysis and satellite (sky blue), reanalysis (green), reanalysis, ground, and satellite (steel blue), and ground (yellow).

Figure 6 shows the median Kling-Gupta Efficiency (KGE) and its three components (correlation, bias, and variability ratio) of daily precipitation for the entire period and all four seasons (spring, summer, autumn, and winter). CPCv1 shows the best performance for the entire period and four seasons compared to other GPDs, with a higher spring (median KGE; 0.46) and a lower summer (median KGE; 0.16) result. Among multi-source merging GPDs, MSWEPv2.8 performs better (median KGE; 0.34), followed by CHIRPSv2.0 (0.15) and CHIRPv2.0 (0.09), for the entire period.
The best performance for ERA5 comes in the autumn season (median KGE; 0.27), while it shows lower results (median KGE; −0.09) during winter. This diverse performance of ERA5 over time and space may be attributed to the limited ability of the numerical weather prediction model to represent small-scale convective cells, in line with previous studies [68,69]. All GPDs which combine only gauge and satellite data give poor results. While IMERGHHFv06 performs slightly better during spring and autumn, TMPA-3B42v7 can only present a positive KGE in summer. This indicates that the successor (IMERGHHFv06) shows a slightly higher performance than its predecessor (TMPA-3B42v7), which is consistent with previous evaluations over Turkey [32] and may be related to product algorithm improvements [45]. PERSIANN-CDR performs poorly in all four seasons. This low performance of satellite-gauge combination GPDs could firstly be attributed to the satellite-based algorithm and subsequently to the gauge correction procedure. As mentioned before in Section 2.2, gauge correction is applied from different sources using a different number of gauges, hence it is important to know how much information from gauge data is delivered by each gauge corrected GPD within the complex topographic region. IMERGHHEv06 and IMERGHHLv06 show a better outcome than other satellite-based GPDs for the entire period and four seasons. PERSIANN performs better overall than its gauge corrected dataset (PERSIANN-CDR) and PERSIANN-CCS. Among all GPDs, PERSIANN-CCS shows the highest overall bias and PERSIANN the lowest. The highest overestimate is given by PERSIANN-CCS in winter (3.34), while the highest underestimate is given by PERSIANN in summer (0.57). Hence, GPDs exclusively based on satellite data (PMW or IR) show low performance over complex topography and snow dominant regions, consistent with earlier GPD validation studies [70][71][72].
Figure 6. GPD reliability at regional scale under the Kling-Gupta efficiency (KGE) and its components for daily precipitation over the Karasu river basin for the entire period and four seasons. Y-axis color presents: satellite-based (blue), gauge and satellite (red), reanalysis and satellite (sky blue), reanalysis (green), reanalysis, ground, and satellite (steel blue), and ground (yellow).

Figure 7 shows the precipitation frequencies of various intensities derived from gauge precipitation and 13 GPDs during the entire period and four seasons. Based on the observed data, 77% of precipitation events fall in the range of 0-1 mm/day for the entire period; this share decreases in spring and winter and increases in summer and autumn. Furthermore, as expected, the frequency of precipitation events decreases as the intensity of precipitation increases. GPDs show varying frequencies of precipitation intensities, which is especially noticeable during the spring season. PERSIANN shows a higher frequency of 0-1 mm/day events than observed and a lower frequency of light precipitation (1-5 mm/day) events during the entire period and all seasons. Among multi-source merging GPDs, CHIRPSv2.0 presents a frequency of 0-1 mm/day events close to the observed, while CHIRPv2.0 and MSWEPv2.8 overestimate light precipitation. In the same way, PERSIANN-CDR significantly overestimates light precipitation (1-5 mm/day) and underestimates 0-1 mm/day intensities compared to IMERGHHFv06 and TMPA-3B42v7 for the entire period and four seasons. PERSIANN is the only satellite-based GPD that overestimates 0-1 mm/day precipitation and underestimates all other precipitation events (light, moderate, heavy, and violent precipitation). IMERGHHEv06 and IMERGHHLv06 show frequencies close to the observed for the different intensities during the entire period and four seasons. TMPA-3B42RTv7 shows results similar to those of its gauge corrected dataset (TMPA-3B42v7).
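The frequency comparison behind Figure 7 amounts to counting the share of days that fall into each intensity class. A minimal sketch follows, assuming lower-edge-inclusive class boundaries and a hypothetical function name; it is not the code used to produce the figure.

```python
import numpy as np

# Assumed class edges in mm/day (lower edge inclusive).
EDGES = [1.0, 5.0, 20.0, 40.0]
LABELS = ["no precipitation", "light", "moderate", "heavy", "violent"]


def intensity_frequencies(daily_precip):
    """Return the relative frequency of each intensity class."""
    p = np.asarray(daily_precip, dtype=float)
    p = p[~np.isnan(p)]                       # ignore missing days
    idx = np.digitize(p, EDGES)               # class index 0..4 per day
    counts = np.bincount(idx, minlength=len(LABELS))
    return dict(zip(LABELS, counts / counts.sum()))
```

Applying the function to the gauge series and to each GPD, once for the whole period and once per season, yields the quantities that are compared in the text, for example the 77% share of 0-1 mm/day events in the observations. The function expects at least one non-missing value.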
Figure 8 shows the detectability strength of the 13 GPDs for five daily precipitation groups, expressed as the Hanssen-Kuiper (HK) score, for the entire period and four seasons. Overall, the GPDs show higher detectability in autumn and lower detectability in summer; in addition, detection strength decreases as precipitation intensity increases, which is generally the case in the literature. This can be attributed to the use of several intensity classes, which are harder to distinguish from one another than a simple rain/no-rain split. The division becomes even more problematic when a class (heavy and violent intensities) occurs much more rarely than the others, as can be seen in Figure 7. Hence, a rarely occurring, very intense precipitation event is detected less reliably than a more frequently occurring light one. CPCv1 shows the highest detectability for intensities below 1 mm/day and for light precipitation (1-5 mm/day), followed by MSWEPv2.8 and ERA5. CHIRPv2.0 shows better detectability than CHIRPSv2.0 for precipitation below 1 mm/day and for light and moderate precipitation, while CHIRPSv2.0 is slightly better for heavy and violent precipitation. The IMERGHH datasets show similar detectability across the precipitation groups, while the TMPA datasets show lower detectability than the IMERGHH products. PERSIANN-CDR performs better than PERSIANN-CCS and PERSIANN.
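For reference, the HK score for one intensity class can be written from a 2x2 contingency table as the probability of detection minus the probability of false detection. The sketch below assumes that each class is scored separately as an event/non-event problem, which the text does not spell out; `hanssen_kuiper` and its arguments are hypothetical names.

```python
import numpy as np


def hanssen_kuiper(obs, est, lower, upper):
    """HK score for the class lower <= precipitation < upper (mm/day).

    obs: reference (gauge) daily precipitation
    est: GPD daily precipitation for the same days
    """
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    valid = ~(np.isnan(obs) | np.isnan(est))
    o = (obs[valid] >= lower) & (obs[valid] < upper)  # event observed
    e = (est[valid] >= lower) & (est[valid] < upper)  # event estimated

    hits = np.sum(o & e)
    misses = np.sum(o & ~e)
    false_alarms = np.sum(~o & e)
    correct_negatives = np.sum(~o & ~e)

    # The score is undefined if the class never occurs or always occurs.
    if hits + misses == 0 or false_alarms + correct_negatives == 0:
        return float("nan")
    pod = hits / (hits + misses)
    pofd = false_alarms / (false_alarms + correct_negatives)
    return float(pod - pofd)
```

A score of 1 indicates perfect discrimination for that class and 0 indicates no skill. When a class is rare, the `hits + misses` denominator is small, so a few missed events lower the score sharply, in line with the weaker detectability reported for heavy and violent precipitation.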

Hydrologic Evaluation of GPDs
The TUW model is used to simulate streamflow at the Karasu basin outlet from 2015 to 2019 using observed precipitation and the 13 GPDs under two schemes. In Scheme-1, the model parameters are calibrated with observed precipitation, and the precipitation input is then replaced by each GPD. In Scheme-2, the model parameters are calibrated separately with observed precipitation and with each GPD. Figure 9 displays the hydrographs at the basin outlet for the different precipitation inputs under the two schemes. In all cases, the model is calibrated for two water years (October 2014-September 2016) and validated for three water years (October 2016-September 2019).
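The difference between the two schemes can be illustrated with a toy example. The sketch below is not the TUW model: `toy_model` is a single linear reservoir used purely as a stand-in, and the grid search and the mean-squared-error objective are assumptions made for illustration, since the calibration algorithm and objective function are not described in this section. All function names are hypothetical.

```python
import numpy as np


def toy_model(precip, k):
    """Single linear reservoir: a stand-in, NOT the TUW model."""
    storage = 0.0
    flow = np.empty(len(precip))
    for t, p in enumerate(precip):
        storage += p
        flow[t] = k * storage
        storage -= flow[t]
    return flow


def mse(obs, sim):
    """Mean squared error between observed and simulated flow."""
    return float(np.mean((np.asarray(obs) - np.asarray(sim)) ** 2))


def calibrate(precip, q_obs):
    """Grid search over k; the objective (MSE) is an assumption."""
    candidates = np.linspace(0.05, 0.95, 19)
    return min(candidates, key=lambda k: mse(q_obs, toy_model(precip, k)))


def run_schemes(p_gauge, p_gpd, q_obs, n_cal):
    """Simulate with GPD forcing under both schemes.

    The first n_cal days form the calibration period.
    """
    k_gauge = calibrate(p_gauge[:n_cal], q_obs[:n_cal])  # Scheme-1 parameters
    k_gpd = calibrate(p_gpd[:n_cal], q_obs[:n_cal])      # Scheme-2 parameters
    return {
        "scheme1": toy_model(p_gpd, k_gauge),  # gauge-calibrated, GPD-forced
        "scheme2": toy_model(p_gpd, k_gpd),    # GPD-calibrated, GPD-forced
    }
```

By construction, Scheme-2 can never score worse than Scheme-1 on the calibration period for the chosen objective, because the gauge-calibrated parameter is one of the candidates it searches over. Any advantage in the validation period is not guaranteed and has to be checked.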
Figure 10 depicts scatter plots of simulated daily streamflow against observed discharge at the basin outlet, obtained from in situ precipitation and the selected GPDs under the two schemes. Generally, the GPDs produce higher streamflow in Scheme-1 than in Scheme-2. However, CHIRPSv2.0 yields quite comparable discharge values whether the model parameters are calibrated with observed precipitation (Scheme-1) or with the GPD itself (Scheme-2). For high flows, PERSIANN underestimates streamflow more in Scheme-1 than in Scheme-2. ERA5 shows distinct streamflow differences between the two schemes, and its reproducibility improves when the model parameters are calibrated with ERA5 (Scheme-2). MSWEPv2.8 exhibits similar streamflow amounts in both schemes but slightly more discharge in Scheme-1. Both IMERGHHEv06 and IMERGHHLv06 show less variation in streamflow than IMERGHHFv06. The rest of the GPDs show better streamflow reproducibility in Scheme-2 than in Scheme-1.
The nonlinearity of the simulated streamflow can be related to the high bias noted in the direct comparison of the GPDs with observed precipitation (Section 3.2 and Figures 5 and 6). Moreover, in Scheme-1 the model parameters are calibrated with observed precipitation only, which does not yield an optimal parameter set for the GPDs.
This may be the reason for the strong degradation (overestimation) in GPD streamflow prediction during the calibration and validation periods (especially for ERA5, the IMERGHH datasets, the TMPA datasets, PERSIANN-CDR, and PERSIANN-CCS). Furthermore, it should be kept in mind that in Scheme-2 only the precipitation values are taken from the GPDs, while PET and temperature still come from the observed meteorological forcing.
Figure 11 shows the Kling-Gupta Efficiency (KGE) with its three components (correlation, bias, and variability ratio) and the Nash-Sutcliffe Efficiency (NSE) of the streamflow simulations for the two schemes over the calibration, validation, and entire periods. When the model is forced with observed gauge data, it attains high KGE (0.92) and NSE (0.84) values, with almost no error in bias or variability ratio, during the calibration period. The model maintains good performance in the validation period, with KGE (0.83) and NSE (0.75). In Scheme-1, CPCv1 performs close to the observed results for daily streamflow simulation, with a slight overestimation in bias and underestimation in variability ratio for all periods. Among the multi-source merging GPDs, both CHIRPv2.0 and CHIRPSv2.0 perform well in the calibration and validation periods compared to MSWEPv2.8, which, interestingly, gives better results in validation than in calibration. Moreover, IMERGHHFv06, TMPA-3B42v7, PERSIANN-CDR, and ERA5 perform poorly in simulating streamflow in Scheme-1. Among the satellite-based GPDs, both IMERGHHEv06 and IMERGHHLv06 achieve positive scores in the calibration period, but their performance decreases (with negative NSE) for the validation and entire periods. TMPA-3B42RTv7 and PERSIANN-CCS perform poorly in simulating streamflow for all periods and show negative KGE and NSE.
The PERSIANN dataset behaves differently from the other GPDs: it underestimates bias and overestimates the variability ratio. When the model parameters are calibrated with each GPD individually (Scheme-2), all GPDs simulate streamflow with high performance in the calibration period. MSWEPv2.8 reproduces streamflow close to that of CPCv1 and to the observed discharge for the calibration, validation, and entire periods. By comparison, PERSIANN-CCS shows high KGE (0.82) and NSE (0.65) during the calibration period, but its streamflow reproducibility is poor for the validation and entire periods. Among all GPDs, CHIRPv2.0 and PERSIANN behave differently in Scheme-2. Both simulate streamflow well in the calibration phase, but their performance in the validation and entire periods is lower than in Scheme-1. PERSIANN-CCS significantly overestimates bias in both schemes. Overall, the TUW model performs well over snow-dominated catchments, as noted in the literature, and the relatively short calibration period is one of the reasons for the higher KGE.
Figure 12 presents the calibrated TUW model parameter values based on the observed data and the 13 GPDs. PERSIANN-CCS yields a DDF quite different from that of the other datasets. Similarly, ERA5 shows a high Beta, and PERSIANN a high K2 value. Additionally, the TMPA and PERSIANN datasets show high FC compared to the rest of the GPDs. Further uncertainties arising from the meteorological forcing and the hydrological model may well affect the streamflow simulations, but a detailed sensitivity/uncertainty analysis is beyond the scope of this study.
Figure 11. Performance of daily streamflow simulations at the Karasu basin outlet using the gauge and 13 GPDs data considering calibration, validation, and entire period. Y-axis color presents: satellite-based (blue), gauge and satellite (red), reanalysis and satellite (sky blue), reanalysis (green), reanalysis, ground, and satellite (steel blue), and ground (yellow).
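The KGE components and NSE used throughout this evaluation can be computed along the following lines. This is a minimal sketch under one stated assumption: the variability ratio is taken as the ratio of coefficients of variation (the modified KGE form), whereas the original formulation uses the ratio of standard deviations; which form the study uses is not restated in this section. The function names are hypothetical.

```python
import numpy as np


def kge_components(obs, sim):
    """Return (KGE, r, beta, gamma) for paired series without NaNs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    beta = sim.mean() / obs.mean()    # bias ratio, ideal value 1
    # Variability ratio as a ratio of coefficients of variation (ASSUMED;
    # the original formulation uses the ratio of standard deviations).
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return float(kge), float(r), float(beta), float(gamma)


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus error variance over observed variance."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))
```

With this definition, doubling every simulated value leaves the correlation and the variability ratio at 1 but gives a bias ratio of 2 and therefore KGE = 0, while NSE becomes strongly negative. This illustrates how a large volume bias alone can produce the negative scores reported for some GPDs.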

Conclusions
In this study, the spatio-temporal consistency and hydrologic utility of 13 GPDs are evaluated over the mountainous Karasu basin for the 2015 to 2019 water years, using observed daily precipitation from 23 meteorological stations as the reference. The Kling-Gupta Efficiency (KGE) and Nash-Sutcliffe Efficiency (NSE) are used to evaluate the spatial, temporal, and hydrologic response of the basin for the different GPDs. Moreover, the Hanssen-Kuiper (HK) score is used to quantify the detectability strength of the GPDs, while the precipitation frequency for different intensities is assessed with the Probability Density Function (PDF). Finally, rainfall-runoff modeling is conducted using the TUW model under two schemes. The major conclusions are as follows:
• CPCv1 gathers information from ground station networks and performs well in representing the rainfall distribution over time and space. This dataset also shows better detectability across precipitation intensities and gives valuable results when used in streamflow simulations.
• Among the multi-source merging datasets, MSWEPv2.8 performs close to CPCv1, followed by CHIRPSv2.0 and CHIRPv2.0, in the direct gauge comparison. CHIRPSv2.0 and CHIRPv2.0 outperform MSWEPv2.8 in simulating streamflow in Scheme-1, but not in Scheme-2. GPDs that use only gauge and satellite data, such as IMERGHHFv06, TMPA-3B42v7, and PERSIANN-CDR, perform poorly in capturing precipitation intensities and show low reproducibility of streamflow in Scheme-1.

• Among the satellite-based GPDs, IMERGHHEv06 and IMERGHHLv06 perform better than the other products. TMPA-3B42RTv7 and PERSIANN-CCS show low performance at all stages, while PERSIANN generally underestimates precipitation. ERA5 performs fairly well in both spatial and temporal validation compared to the satellite-based GPDs and displays similar results for streamflow prediction in Scheme-2.
• Some satellite-based GPDs are becoming available with high spatial resolution and short latency, which is very important for real-time operation, but their existing bias limits their reliability for hydro-meteorological studies. For example, PERSIANN-CCS is available after a one-hour lag at 0.04° spatial resolution, yet its performance is not very high. IMERGHHLv06, on the other hand, is released after 14 h at a coarser spatial resolution (0.1°) than PERSIANN-CCS and is more reliable than the other selected satellite-based GPDs. Furthermore, satellite-based GPDs become more accurate when merged with other sources such as reanalysis and/or ground observation data. For example, MSWEPv2.8 and CHIRPSv2.0 are among the most reliable GPDs over the Karasu basin, but they have longer latencies ranging from one month to a few months. It can be concluded that GPDs are available for near-real-time studies, and that as products from different sources are merged, latency increases and the reliability of the resulting product seems to increase.

• Overall, most of the 13 selected GPDs perform poorly over time and space in detecting daily precipitation, but some of them can simulate streamflow quite accurately (Scheme-1). Furthermore, the GPDs reproduce streamflow better when the model parameters are calibrated individually for each dataset (Scheme-2).
This study confirms that CPCv1, MSWEPv2.8, and CHIRP(S)v2.0 outperform the other selected GPDs in the Karasu river basin, located in the mountainous eastern part of Turkey. The results also indicate that GPDs that use ground information in their retrospective algorithms do not always attain reliable precipitation estimates. The information source, the correction time window, and the number of gauges used are important factors that considerably affect the reliability of the final product. Some near-real-time products (such as IMERGHHL(E)v06) show promising performance given their short latency. However, they still seem far from being usable for reliable streamflow simulations in early warning systems. Therefore, a pre/post bias adjustment of satellite-based GPDs is recommended, using ground information with the highest possible spatial and temporal resolution. In addition, Scheme-2 hydrological modeling is recommended for ungauged basins, with the calibration procedure using GPD, ground temperature, and PET values. Since all model forcings influence the optimal model parameters, calibration is more reliable when it is carried out with the same climate data used to force the model. Finally, it is worth noting that this study is based on a recent five-year window and 23 ground stations. The consistency of the GPDs could be tested over longer periods and with more stations in high-elevation areas to reach firmer conclusions. Nonetheless, these findings make a valuable contribution to the existing literature for regions with complex topography, such as Turkey and similar regions of the world.