Evaluation of Gridded Precipitation Datasets over Arid Regions of Pakistan

The rough topography, harsh climate, and sparse monitoring stations have limited hydro-climatological studies in the arid regions of Pakistan. Gauge-based gridded precipitation datasets provide an opportunity to assess the climate where stations are sparse. However, the reliability of these datasets depends heavily on their ability to replicate the observed temporal variability and distribution of precipitation. Conventional correlation or error analyses are often insufficient to assess the variability and distribution of precipitation. In the present study, mean bias error, mean absolute error, the modified index of agreement, and the Anderson–Darling test were used to evaluate the performance of four widely used gauge-based gridded precipitation products, namely the Global Precipitation Climatology Centre (GPCC), the Climatic Research Unit (CRU), Asian Precipitation Highly Resolved Observational Data Integration towards Evaluation (APHRODITE), and the Center for Climatic Research, University of Delaware (UDel), at stations located in the semi-arid, arid, and hyper-arid regions of Balochistan province, Pakistan. The results revealed that the performance of the products varies with climate. However, GPCC performed much better than the other products in all climatic regions in terms of most of the statistical assessments conducted. As the temporal variability and distribution of precipitation are very important in many hydrological and climatic applications, the methods used in this study are expected to be useful for better assessing gauge-based data for various applications.


Introduction
Precipitation is the key element of the global water cycle and influences the socio-economic development of any region [1,2]. In recent decades, climate variability and change have drawn the attention of the scientific community [3,4]. Precipitation data are important for studying changes in the regional and global climate [5,6]. The major problem often encountered in such studies is the unavailability of long-term climatic data. Even where data are available, the irregular distribution and scarcity of stations, mainly over uninhabited regions, make the data unsuitable for hydrological applications [7]. Furthermore, the quality of the available data is often unsuitable for hydro-climatological assessments. In the present study, the precipitation datasets of (1) the Global Precipitation Climatology Centre, version 7 (GPCC) [43]; (2) the Climatic Research Unit, University of East Anglia, CRU TS version 3.21 (CRU) [21]; (3) the Asian Precipitation-Highly-Resolved Observational Data Integration Towards Evaluation (APHRODITE) project, version V1101R2/APHRO_MA/050deg [13]; and (4) the Center for Climatic Research, University of Delaware, version 3.02 (UDel) [23] were selected to assess their performance in Balochistan province of Pakistan. The performance of precipitation products has been found to vary with geographic and climatic region [44][45][46][47]. Therefore, the products were assessed separately in the semi-arid, arid, and hyper-arid climatic zones of Balochistan.

Study Area and Datasets
Balochistan province of Pakistan (Figure 1) is regarded as one of the regions most vulnerable to climate change [48,49]. The province often experiences droughts and water stress because of its arid climate and limited water resources [50]. Rising temperatures due to global warming are expected to drive more droughts and water stress, which will affect the agro-economy of the province [51]. This emphasizes the need for hydro-climatological studies in the region to help avert major water and food crises. Hydro-climatological studies often require long-term data, typically 30 to 50 years [52]. The major problems with Balochistan climatic data are missing values, inhomogeneity, the small number of stations, and short record lengths [53].
Water 2019, 11, x FOR PEER REVIEW 3 of 22
The observed precipitation data recorded at fourteen stations located in the semi-arid, arid, and hyper-arid climatic zones of Balochistan were collected from the Pakistan Meteorological Department. The locations of the rain gauges are shown on the map of Pakistan in Figure 1; most stations are located in the arid region, which covers a large part of the province. The mean and standard deviation of annual precipitation at the stations are presented in Table 1. Barkhan station (in the northeast of the province) receives the highest mean annual precipitation, 389.32 mm, and Nokkundi station (in the southeast) the lowest. The highest and lowest standard deviations of annual precipitation are also recorded at Barkhan and Nokkundi, respectively. Table 1 also gives the percentage of missing data at each station for 1961-2007. Complete records were found at Dalbandin, Khuzdar, Lasbela, Panjgur, Quetta, and Zhob. The highest proportion of missing data (8.74%) is at Kalat, followed by Pasni (7.73%).
The performance of the gridded data at each station was assessed using the data available at that station. The monthly areal average distributions of precipitation at the semi-arid, arid, and hyper-arid stations are shown in Figure 2. The figure shows clear seasonal differences among the climatic zones of the province. Monthly GPCC, CRU, APHRODITE, and UDel data at 0.5° spatial resolution for 1961-2007 were obtained from the websites of the respective organizations.
The daily APHRODITE precipitation data were aggregated to monthly totals for comparison.
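Aggregating daily values to monthly totals is a simple group-and-sum step. The sketch below uses only the Python standard library and assumes daily values in mm paired with `datetime.date` objects; the function name and the `min_valid_days` threshold for treating a month as missing are hypothetical and not taken from the paper.

```python
import math
from datetime import date, timedelta

def daily_to_monthly(dates, values, min_valid_days=1):
    """Sum daily precipitation (mm) into monthly totals keyed by (year, month).

    A month with fewer than `min_valid_days` non-missing days is returned as None.
    `min_valid_days` is a hypothetical threshold, not taken from the paper.
    """
    totals, valid = {}, {}
    for d, v in zip(dates, values):
        key = (d.year, d.month)
        totals.setdefault(key, 0.0)
        valid.setdefault(key, 0)
        if v is None or math.isnan(v):  # skip missing days
            continue
        totals[key] += v
        valid[key] += 1
    return {k: (totals[k] if valid[k] >= min_valid_days else None)
            for k in sorted(totals)}

# Example: 1 mm/day from 1 Aug 1984 for 40 days (31 days in August, 9 in September)
days = [date(1984, 8, 1) + timedelta(days=i) for i in range(40)]
monthly = daily_to_monthly(days, [1.0] * 40)
print(monthly)  # {(1984, 8): 31.0, (1984, 9): 9.0}
```

A stricter rule (for example, requiring most days of a month to be present) can be set through `min_valid_days`; how the original study handled gaps in the daily data is not stated.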

Methodology
The study was carried out in two steps. First, the observed data for 1961-2007 were arranged, and the series for each calendar month was extracted separately for homogeneity testing. Data homogeneity was assessed using the standard normal homogeneity test (SNHT) [54], the Buishand range test [55], the Pettitt test [56], and the von Neumann ratio test [57]. A sequential Student's t-test was also applied to check whether the data were drawn from the same population. In the second step, the gridded precipitation datasets were evaluated by visual inspection and with statistical metrics. The visual inspection compared observed and gauge-based gridded precipitation over the whole period and for each month, using time series, residual, and scatter plots. The statistics used to validate the gauge-based gridded data are described below.
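The four homogeneity tests named above can be written compactly. The sketch below follows common textbook definitions (Alexandersson's SNHT statistic, Buishand's rescaled adjusted range, Pettitt's rank-based change-point statistic with its usual approximate p-value, and the von Neumann ratio) using NumPy. It is an illustration under those assumptions, not the implementation used in the study, which is not stated; the critical values needed to judge significance at 0.05 for SNHT, Buishand, and von Neumann must be taken from published tables, and the sequential t-test is not shown.

```python
import numpy as np

def snht(x):
    """Alexandersson's SNHT statistic T0 and the most likely break position k."""
    x = np.asarray(x, float)
    n = x.size
    z = (x - x.mean()) / x.std(ddof=1)
    t = np.array([k * z[:k].mean() ** 2 + (n - k) * z[k:].mean() ** 2
                  for k in range(1, n)])
    return float(t.max()), int(np.argmax(t)) + 1

def buishand_range(x):
    """Buishand rescaled adjusted range R / sqrt(n)."""
    x = np.asarray(x, float)
    n = x.size
    s = np.concatenate(([0.0], np.cumsum(x - x.mean())))  # S*_0 .. S*_n
    d = x.std(ddof=0)  # D_y is based on the 1/n variance
    return float((s.max() - s.min()) / d / np.sqrt(n))

def pettitt(x):
    """Pettitt statistic K, change point t, and approximate two-sided p-value."""
    x = np.asarray(x, float)
    n = x.size
    sgn = np.sign(x[:, None] - x[None, :])  # sgn[i, j] = sign(x_i - x_j)
    # U_t = sum over i <= t and j > t of sign(x_i - x_j)
    u = np.array([sgn[:t, t:].sum() for t in range(1, n)])
    k = float(np.abs(u).max())
    p = 2.0 * np.exp(-6.0 * k ** 2 / (n ** 3 + n ** 2))
    return k, int(np.argmax(np.abs(u))) + 1, float(min(p, 1.0))

def von_neumann_ratio(x):
    """Von Neumann ratio N; values well below 2 suggest inhomogeneity."""
    x = np.asarray(x, float)
    return float(np.sum(np.diff(x) ** 2) / np.sum((x - x.mean()) ** 2))

# Example: an artificial series with a step change after 20 values
series = np.array([0.0] * 20 + [10.0] * 20)
print(snht(series), buishand_range(series), pettitt(series), von_neumann_ratio(series))
```

All four statistics flag the break after the 20th value in this artificial series; for the study's data, each calendar-month series would be passed separately.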

Mean Bias Error (MBE)
The MBE measures the mean difference between two data products and is therefore widely used to quantify over- and under-estimation. It is calculated as

$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_{gridded,i} - x_{obs,i}\right)$$

where $x_{gridded,i}$ and $x_{obs,i}$ are the $i$th gridded and observed values and $n$ is the number of data pairs. Positive and negative MBE values indicate over- and under-estimation by the gridded product, respectively. A smaller absolute MBE indicates better performance.
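The MBE formula translates directly into NumPy. In this sketch, months where either value is missing are dropped before averaging; that pairing rule is an assumption consistent with assessing each station on its available data, and the function name is illustrative.

```python
import numpy as np

def mean_bias_error(gridded, observed):
    """MBE = mean(gridded - observed); positive means the gridded product overestimates."""
    g = np.asarray(gridded, float)
    o = np.asarray(observed, float)
    ok = ~(np.isnan(g) | np.isnan(o))  # use only months with both values present
    return float(np.mean(g[ok] - o[ok]))

print(mean_bias_error([12.0, 0.0, 5.0], [10.0, 2.0, 5.0]))  # 0.0
print(mean_bias_error([3.0, float("nan")], [1.0, 4.0]))      # 2.0
```

The first example shows that opposite errors can cancel, which is why the MBE is paired with an error magnitude measure.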

Mean Absolute Error (MAE)
The MAE measures the average magnitude of the differences between two datasets. It measures accuracy for continuous variables without considering the direction of the error. The MAE, recommended by the American Society of Civil Engineers, is widely used to assess errors in simulated time series. It is calculated as

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_{gridded,i} - x_{obs,i}\right|$$

where $x_{gridded,i}$ and $x_{obs,i}$ are the $i$th gridded and observed values and $n$ is the number of data pairs. The MAE is non-negative with no upper bound, and a value of zero indicates a perfect model.
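A minimal NumPy sketch of the MAE, assuming (as an illustration, not a rule stated in the paper) that months with a missing value in either series are excluded. With the illustrative values below, the +2 mm and −2 mm errors would cancel in a mean bias but contribute fully here.

```python
import numpy as np

def mean_absolute_error(gridded, observed):
    """MAE = mean(|gridded - observed|); 0 means a perfect match."""
    g = np.asarray(gridded, float)
    o = np.asarray(observed, float)
    ok = ~(np.isnan(g) | np.isnan(o))  # pair only non-missing months
    return float(np.mean(np.abs(g[ok] - o[ok])))

print(mean_absolute_error([12.0, 0.0, 5.0], [10.0, 2.0, 5.0]))  # 1.3333333333333333
```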

Modified Index of Agreement
The modified index of agreement (md) is a standardized measure of the degree of model prediction error. It is a modified version of the index of agreement (d) proposed by Willmott (1981) and is less sensitive to extreme values than the original [58]. It is based on the ratio between the prediction error and the "potential error" [59] and can therefore detect additive and proportional differences in the data [37]. It is calculated as

$$md = 1 - \frac{\sum_{i=1}^{n}\left|x_{obs,i} - x_{gridded,i}\right|^{j}}{\sum_{i=1}^{n}\left(\left|x_{gridded,i} - \bar{x}_{obs}\right| + \left|x_{obs,i} - \bar{x}_{obs}\right|\right)^{j}}$$

where $x_{obs,i}$ and $x_{gridded,i}$ are the observed and gridded precipitation values, $\bar{x}_{obs}$ is the mean of the observed values, and $j$ is the exponent used in computing md. The value of md ranges from 0 to 1, where 1 indicates perfect agreement and 0 no agreement at all [34].
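The md formula can be sketched in NumPy as follows. The exponent defaults to j = 1, a common choice for the modified index; the value of j used in the study is not stated, so treat that default as an assumption. The function name is illustrative.

```python
import numpy as np

def modified_index_of_agreement(observed, gridded, j=1):
    """Willmott's modified index of agreement md with exponent j (default 1)."""
    o = np.asarray(observed, float)
    g = np.asarray(gridded, float)
    ok = ~(np.isnan(o) | np.isnan(g))  # pair only non-missing values
    o, g = o[ok], g[ok]
    o_mean = o.mean()
    num = np.sum(np.abs(o - g) ** j)
    den = np.sum((np.abs(g - o_mean) + np.abs(o - o_mean)) ** j)
    return float("nan") if den == 0 else float(1.0 - num / den)  # undefined if den == 0

print(modified_index_of_agreement([1, 2, 3], [1, 2, 3]))  # 1.0
print(modified_index_of_agreement([0, 2, 4], [2, 2, 2]))  # 0.0
print(modified_index_of_agreement([0, 0, 0], [1, 3, 0]))  # 0.0
```

The last example illustrates a property of the formula: when every observed value is zero, the numerator equals the denominator, so md is 0 for any non-zero gridded series (and undefined if both series are all zero).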

Distribution of Precipitation Data
The Anderson-Darling test was used to measure the difference between the probability distributions of observed and gauge-based gridded precipitation. It is a powerful test for comparing the distributions of two samples. The significance level at which the null hypothesis of equal distributions is rejected is a matter of choice; in the present study, a significance level of 0.05 was used, so the hypothesis of equal distributions was rejected when the test yielded a p-value below 0.05.
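One way to run a two-sample Anderson-Darling test in Python is SciPy's k-sample version, `scipy.stats.anderson_ksamp`, with k = 2. Whether the study used this particular variant is not stated, so this is a sketch under that assumption. SciPy's default p-value is interpolated from tabulated values and is clipped to the range 0.001 to 0.25, which is sufficient for a decision at 0.05. Missing values are dropped from each sample independently, since the test compares distributions rather than paired values.

```python
import warnings
import numpy as np
from scipy.stats import anderson_ksamp

def same_distribution(observed, gridded, alpha=0.05):
    """Two-sample Anderson-Darling test; True if equality is NOT rejected at alpha."""
    o = np.asarray(observed, float)
    g = np.asarray(gridded, float)
    o, g = o[~np.isnan(o)], g[~np.isnan(g)]
    with warnings.catch_warnings():
        # SciPy warns when the interpolated p-value is clipped to its table range
        warnings.simplefilter("ignore", UserWarning)
        stat, _critical, p = anderson_ksamp([o, g])
    return bool(p >= alpha), float(stat), float(p)

a = np.arange(30.0)
print(same_distribution(a, a)[0])          # True
print(same_distribution(a, a + 100.0)[0])  # False
```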

Homogeneity Test
The homogeneity tests (SNHT, Buishand range, Pettitt, and von Neumann ratio) applied to the monthly precipitation data at a significance level of 0.05 showed that the observed precipitation data at all stations were homogeneous. The sequential Student's t-test showed that the differences between the sub-series were not significant at the 0.05 level for any station. The observed precipitation data were therefore considered homogeneous.
There are two general ways to compare gridded data with station observations: (i) the areal average precipitation for each grid box is computed from the available station data, and a grid-to-grid comparison is then conducted [60][61][62]; or (ii) the gridded data are interpolated to the station location and compared with the observed data. Grid-to-grid comparison is often preferred over point-to-point comparison [63]. The arithmetic mean, isohyetal analysis, Thiessen polygons, and distance weighting or gridding are commonly used to estimate areal precipitation from point data. Distance-weighting methods are used to grid the observed data at the same spatial resolution as the gridded product when the gauges are sufficiently dense and well distributed [64,65]. When gauges are sparse and unevenly distributed, a simple average of the station data within the grid box is preferred for computing the areal average rainfall [66,67]. When only one station lies within a grid box, pairwise statistical analyses between the grid-point estimates and the gauge data are carried out, assuming that the station rainfall represents the average rainfall over the grid box. This approach has been adopted for evaluating gridded data against in situ measurements in [67][68][69][70][71]. In the present study, no grid box contained more than one station. Therefore, each gauge-based gridded dataset was compared with the data of the single station located within the corresponding grid box. However, only the results for one representative station in each climate zone (hyper-arid, arid, and semi-arid) are presented in detail. The locations of these three representative stations are shown in Figure 1. The results are discussed in the following sections.
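Matching each station to the 0.5° grid box that contains it, and checking that no box holds more than one station, can be sketched with the standard library. The sketch assumes grid-cell edges fall on multiples of 0.5°, so cell centres lie at x.25 and x.75 degrees; the grid registration of each product should be confirmed from its own metadata. The station names and coordinates below are hypothetical and for illustration only.

```python
import math
from collections import defaultdict

def cell_centre(lat, lon, res=0.5):
    """Centre of the res-degree grid cell containing (lat, lon).

    Assumes cell edges fall on multiples of `res` (0.5-degree centres at x.25 / x.75).
    """
    return (math.floor(lat / res) * res + res / 2,
            math.floor(lon / res) * res + res / 2)

def stations_per_cell(stations, res=0.5):
    """Group station names by the grid cell that contains them."""
    cells = defaultdict(list)
    for name, (lat, lon) in stations.items():
        cells[cell_centre(lat, lon, res)].append(name)
    return dict(cells)

# Hypothetical coordinates for illustration only (not the PMD station list)
stations = {"station_A": (30.18, 66.97), "station_B": (29.05, 64.35)}
cells = stations_per_cell(stations)
shared = {c: names for c, names in cells.items() if len(names) > 1}
print(cells)
print("cells with more than one station:", shared)
```

If `shared` were non-empty, the single-station pairwise comparison would no longer apply and the stations in that box would need to be averaged first.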

Performance of Gauge-Based Gridded Precipitation Data
First, the monthly time series of observed and gauge-based gridded precipitation were compared to show how well each dataset represents the observed precipitation. The residuals (observed minus gridded precipitation) at the semi-arid, arid, and hyper-arid stations are shown as examples in Figures 3-5, respectively.
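The residual convention used in this section (observed minus gridded, so positive residuals mean underestimation) can be sketched as below. The function names and the input values are illustrative, not data from the study.

```python
import numpy as np

def residuals(observed, gridded):
    """Residual = observed - gridded; positive values mean the product underestimates."""
    return np.asarray(observed, float) - np.asarray(gridded, float)

def summarize(res):
    """Range of the residuals and the share of months with underestimation."""
    r = res[~np.isnan(res)]  # ignore months with missing data
    return {"min": float(r.min()), "max": float(r.max()),
            "share_under": float(np.mean(r > 0))}

# Illustrative values only
r = residuals([294.0, 0.0, 10.0], [280.0, 5.0, 10.0])
print(summarize(r))  # {'min': -5.0, 'max': 14.0, 'share_under': 0.3333333333333333}
```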
The time series of observed and gridded precipitation and the residuals at the semi-arid station (Figure 3) show that all datasets follow a pattern similar to that of the observed data. Nonetheless, the gridded products under- or over-predict precipitation in some months. Visual comparison of the time series reveals that GPCC matches the observed precipitation better than the other products and captures most of the peaks. For example, GPCC replicated the highest precipitation of 294 mm, recorded in August 1984, whereas the other datasets estimated this extreme event at less than 100 mm. All products except GPCC failed to reproduce dry sequences; for example, only GPCC reasonably captured the dry spell of 1967, when no precipitation occurred during the whole year. Analysis of the residual time series revealed that GPCC has the lowest errors over the assessment period. The GPCC error stayed within ±10 mm, except in 1990, when it reached 29 mm. In contrast, the residuals of the other products ranged from −150 to 250 mm. The errors of these products were positive in most months, indicating that they underestimated the precipitation. This is especially true for APHRODITE, whose errors were positive in most months except a few after 2000. The residual plot of UDel shows that its errors increased abruptly after 1981. Overall, GPCC captured the observed precipitation more accurately than the other products. Figure 4 compares the observed and gauge-based gridded precipitation, and their differences, at the arid station. The precipitation at this station varies from 0 to 120 mm in different months.
The comparison showed that GPCC and UDel are much closer to the observed precipitation in this region, whereas CRU and APHRODITE under-predict it. Figure 4 shows that GPCC closely matched the observed precipitation in most months, apart from a few random under- and overestimations. UDel also replicated the observed precipitation closely in most months. The residual plots give a clearer picture of the nature and sign of the errors. The errors in gridded precipitation varied between −60 and 100 mm for the different datasets, but no consistent pattern was found among the datasets. For example, GPCC produced the highest error of about 30 mm in 1990.
The time series of observed and gridded precipitation and their differences at the hyper-arid station are shown in Figure 5. Precipitation at this station varies from 0 to 100 mm, and about 68% of the values are zero. The highest precipitation of 94 mm, recorded in 2005, was well replicated by GPCC and UDel, whereas APHRODITE and CRU underestimated this event. GPCC replicated the observed time series at this station more accurately than the other products. UDel replicated the precipitation more accurately in the first half of the record but underestimated it after 1981. The residual plots (Figure 5) show that GPCC errors range from −20 to 6 mm, while the errors of CRU, APHRODITE, and UDel range from −40 to 40 mm.
Regions with low precipitation generally tend to show small differences between observed and estimated data. We also analyzed the residuals of the mean monthly observed precipitation against the mean monthly gauge-based gridded data (Figure 6). The differences are large in months with high precipitation and small in months with low precipitation, and they vary among the gridded datasets. For example, at the semi-arid station the highest mean monthly observed precipitation, 95.18 mm, occurs in July; GPCC estimated 99.95 mm for that month, giving the largest GPCC difference of −4.8 mm (Figure 6a). Similarly, the lowest precipitation, 3.86 mm, occurs in November, which also has the smallest difference. Overall, GPCC showed the smallest differences, whereas the months with the largest and smallest mean residuals were inconsistent among the other products.
Figure 6b shows the mean monthly residuals at the arid station. GPCC yields the smallest residuals, which lie close to the zero line. The highest precipitation at this station occurs in July, and all datasets show their largest errors in that month. As at the semi-arid and arid stations, the lowest errors at the hyper-arid station (Figure 6c) are found in the GPCC dataset. The observed precipitation in September is zero at this station, and GPCC and APHRODITE show their lowest errors in that month. In contrast, the highest precipitation, 9.40 mm, occurs in February, and the largest GPCC difference is also found in that month.
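The mean monthly residuals discussed above can be sketched as the difference between two monthly climatologies (observed minus gridded). This is an illustrative NumPy version; the function names are hypothetical and the synthetic series below, in which the product is 5% too wet every month, is not study data.

```python
import numpy as np

def monthly_climatology(months, values):
    """Mean value for each calendar month 1..12, ignoring missing (NaN) values."""
    m = np.asarray(months)
    v = np.asarray(values, float)
    return np.array([np.nanmean(v[m == k]) for k in range(1, 13)])

def mean_monthly_residuals(months, observed, gridded):
    """Observed minus gridded climatology, one value per calendar month."""
    return monthly_climatology(months, observed) - monthly_climatology(months, gridded)

# Two synthetic years; the product overestimates every month by 5%
months = np.tile(np.arange(1, 13), 2)
obs = np.tile(np.arange(12.0), 2)
res = mean_monthly_residuals(months, obs, obs * 1.05)
worst = int(np.argmax(np.abs(res))) + 1  # calendar month with the largest |residual|
print(res, worst)
```

With a proportional error like this, the largest residual falls in the wettest month, matching the pattern described for Figure 6.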
Scatter plots are also useful for revealing errors in different parts of the data range and show how well each product estimates the observed data. Figure 7 shows scatter plots of observed against gauge-based gridded precipitation for the different climatic zones. The plots for each station are arranged vertically to allow comparison among datasets. The plots for the semi-arid station are given in Figure 7a. GPCC shows a narrower scatter, indicating that most GPCC values lie close to the observed values. In contrast, most CRU and APHRODITE values are dispersed away from the 45-degree line, and values above 100 mm are mostly under-predicted by these two products. Overall, GPCC performed better than the other products at this station, predicting both low and extreme values well.

Figure 7b shows the scatter plot of observed against gauge-based gridded precipitation at the arid station. All products perform better here than at the semi-arid station. Most APHRODITE values are under-predicted, whereas UDel performs better, with most points located near the 1:1 line. The scatter plot for the hyper-arid station is shown in Figure 7c. GPCC estimates the observed data with good accuracy, whereas CRU fails to maintain accuracy for low precipitation values, most of which it overestimates. At this station, APHRODITE performs better than CRU and UDel.

Mean Bias Error (MBE)
Mean bias errors (MBE) of the gauge-based gridded precipitation products are presented in Figure 8. The biases vary with climatic zone; however, GPCC yields the lowest errors in most months. At the semi-arid station (Figure 8a), biases vary from positive to negative across months and products, except for APHRODITE, whose biases are all positive. The biases of GPCC and UDel range from about −5 to 1 mm and from −6 to 15 mm, respectively, whereas CRU and APHRODITE show larger biases, ranging from about −6 to 36 mm and from 2 to 40 mm, respectively. The month with the lowest error is not consistent among products: for GPCC and CRU it is November and April, respectively, and for APHRODITE and UDel it is November and October, respectively. Among the products, GPCC gives the lowest errors in all months except April, June, July, and October; the lowest error in April is from CRU, and in June, July, and October from UDel.
At the arid station (Figure 8b), there is likewise no consistency in the lowest errors of the different datasets. As at the semi-arid and hyper-arid stations, biases vary from positive to negative across months and products, and different datasets have the lowest bias in different months. For example, GPCC shows the lowest bias in January, February, March, June, September, October, and November; CRU in April and May; and UDel in May, August, and December.
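Identifying which product has the lowest absolute bias in each calendar month, as in the comparisons above, can be sketched as follows. The function name, the product names, and the synthetic data are hypothetical; ties go to whichever product is listed first, an arbitrary choice made here for determinism.

```python
import numpy as np

def best_product_by_month(months, observed, products):
    """Name of the product with the smallest |MBE| in each calendar month.

    products: dict mapping product name -> gridded series aligned with `observed`.
    Ties go to the product listed first.
    """
    m = np.asarray(months)
    o = np.asarray(observed, float)
    best = {}
    for k in range(1, 13):
        obs_k = o[m == k]
        scores = {}
        for name, series in products.items():
            grid_k = np.asarray(series, float)[m == k]
            ok = ~(np.isnan(grid_k) | np.isnan(obs_k))  # pair non-missing values
            if ok.any():
                scores[name] = abs(float(np.mean(grid_k[ok] - obs_k[ok])))
        if scores:
            best[k] = min(scores, key=scores.get)
    return best

# Synthetic example: product "A" is 1 mm too wet, product "B" 0.5 mm too dry
months = np.tile(np.arange(1, 13), 2)
obs = np.tile(np.arange(12.0), 2)
best = best_product_by_month(months, obs, {"A": obs + 1.0, "B": obs - 0.5})
print(best)
```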
At the hyper-arid station (Figure 8c), biases vary from positive to negative across months for GPCC, CRU, and UDel, while APHRODITE has negative biases in all months. Because precipitation at this station is very small, the biases were expected to be, and were found to be, low. Unlike at the semi-arid and arid stations, the lowest errors were consistent across products: all products had their lowest error in September, the month in which the observed precipitation was zero. Overall, GPCC performed best at all stations in terms of mean bias, giving the lowest bias in most months.

Mean Absolute Error (MAE)
Mean absolute errors at the semi-arid, arid, and hyper-arid stations are given in Figure 9. GPCC errors are closest to zero at all stations, indicating that GPCC has the smallest errors. At the semi-arid station (Figure 9a), CRU shows the highest errors, followed by APHRODITE and UDel. Errors are high in months with high precipitation and low in months with low precipitation: for all products, the highest errors occur in July and August and the lowest in October, November, and December.
The results for the arid station are presented in Figure 9b, where errors vary from 0.1 to 12 mm. The largest errors are found in APHRODITE, whose error in July is high compared with the other products. GPCC has the lowest errors, ranging from 0.1 to 2 mm, with its highest error in March. CRU and UDel show similar results, with errors lower than those of APHRODITE.
At the hyper-arid station (Figure 9c), GPCC errors are below 2 mm, with the highest in March and the lowest from July to December. A comparison of MAE at this station shows that CRU has the highest errors, followed by UDel and APHRODITE.

Modified Index of Agreement
Figure 10 shows the modified index of agreement between the gauge-based gridded data and the observed data. GPCC is clearly superior to the other datasets. At the semi-arid station (Figure 10a), GPCC has the highest index of agreement with the observed data in all months, with values from 0.80 to 1.0. CRU and APHRODITE show similar agreement with the observed data. Their values are very close in most months and differ slightly only in March, April, and September. On the other hand, UDel agrees better with the observed data than APHRODITE and CRU do: almost all months of UDel have agreement values above 0.5, whereas those of APHRODITE and CRU are above 0.30.

The results for the arid station are shown in Figure 10b. GPCC again has the highest index, with agreement above 0.90 in all months. The other products also perform comparatively well at this station. CRU agrees well with the observed data (above 0.70) in most months. The exceptions are June and September, where the agreement is 0.45 and 0.0, respectively. For APHRODITE, most months have agreement below 0.70, which is low compared with the other datasets. UDel has values very close to one in some months, which indicates its superiority over APHRODITE and CRU.
Figure 10c also clearly indicates the distinct superiority of GPCC over the other datasets at the hyper-arid station. GPCC agrees well with the observed data in most months, with values from 0.80 to 1.0. Only May, September, and October fall below 0.80. CRU has zero agreement in May and September, and it agrees only minimally in most other months, with values below 0.60. APHRODITE and UDel agree better with the observed data than CRU: most months of APHRODITE have values above 0.60, while UDel gives higher values in every month.
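The sketch below computes the index for one month's series. It assumes that the modified index of agreement is Willmott's d1, d1 = 1 - Σ|P_i - O_i| / Σ(|P_i - Ō| + |O_i - Ō|), where P is the product, O the observed data and Ō the observed mean. The values are hypothetical. One property is worth noting under this assumed form. If every observed value in a month is zero, then Ō = 0 and the numerator equals the denominator, so d1 = 0 for any product that reports some rain. That may be one reason a product scores 0.0 in a month with no observed rain.

```python
def modified_index_of_agreement(obs, est):
    """Willmott's modified index of agreement d1 (assumed form):

        d1 = 1 - sum|P - O| / sum(|P - mean(O)| + |O - mean(O)|)

    Returns None when the denominator is zero (e.g. both series all zero),
    where the index is undefined.
    """
    o_bar = sum(obs) / len(obs)
    numerator = sum(abs(p - o) for o, p in zip(obs, est))
    denominator = sum(abs(p - o_bar) + abs(o - o_bar) for o, p in zip(obs, est))
    if denominator == 0:
        return None
    return 1.0 - numerator / denominator


# Hypothetical dry month: the gauge recorded no rain in any year,
# but the product reports small amounts in two of the years.
print(modified_index_of_agreement([0.0, 0.0, 0.0], [0.5, 0.0, 1.0]))  # 0.0
```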

Test of Similarity in Data Distribution
Table 2 shows the results of the Anderson-Darling similarity test. A significance level of 0.05 was used to test the null hypothesis that the observed and gauge-based gridded precipitation data share the same distribution. At the semi-arid station, the GPCC and UDel precipitation distributions were similar to the observed distributions in nine months. At the arid and hyper-arid stations, GPCC also reproduced the observed distributions in more than seven months. Therefore, GPCC can be considered the most reliable precipitation dataset in terms of data distribution.
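SciPy's `scipy.stats.anderson_ksamp` provides the k-sample Anderson-Darling test with tabulated critical values. A dependency-free sketch of the two-sample case is given below. It computes the Scholz and Stephens (1987) statistic in its right-continuous EDF form, which tolerates ties such as repeated zero-rainfall months. In a swap from the table-based approach, the p-value is estimated by random permutation of the pooled values. The permutation count, seed, and sample values are illustrative assumptions.

```python
import random
from bisect import bisect_left, bisect_right


def ad_two_sample(x, y):
    """Two-sample Anderson-Darling statistic, right-continuous EDF form of
    Scholz and Stephens (1987); tied values are allowed."""
    samples = [sorted(x), sorted(y)]
    pooled = sorted(list(x) + list(y))
    n_total = len(pooled)
    # Each distinct pooled value except the largest contributes one term.
    distinct = sorted(set(pooled))[:-1]
    stat = 0.0
    for sample in samples:
        n_i = len(sample)
        inner = 0.0
        for z in distinct:
            b_j = bisect_right(pooled, z)       # pooled values <= z
            l_j = b_j - bisect_left(pooled, z)  # pooled values tied at z
            m_ij = bisect_right(sample, z)      # this sample's values <= z
            inner += (l_j / n_total) * (n_total * m_ij - b_j * n_i) ** 2 / (
                b_j * (n_total - b_j)
            )
        stat += inner / n_i
    return stat


def ad_permutation_test(x, y, n_perm=999, seed=0):
    """Return (statistic, permutation p-value) for H0: same distribution."""
    rng = random.Random(seed)
    observed = ad_two_sample(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled values
        if ad_two_sample(pooled[:n_x], pooled[n_x:]) >= observed - 1e-12:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical values for one calendar month across years (mm)
gauge = [0.0, 0.0, 2.1, 5.4, 0.0, 11.3, 3.2, 0.0, 7.8, 1.0]
product = [0.4, 0.0, 3.0, 4.1, 0.2, 9.5, 2.7, 0.1, 6.6, 1.8]
stat, p = ad_permutation_test(gauge, product)
print("similar" if p >= 0.05 else "different", round(stat, 3), p)
```

As in the study, a month counts as "similar" when the null hypothesis is not rejected at the 0.05 level.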

Multivariate Analysis
Multivariate diagrams were prepared to give a concise statistical summary of the different gauge-based gridded data products. Figure 11 shows the diagrams for the semi-arid, arid, and hyper-arid stations. Blue represents the mean bias error, red the mean absolute error, and green the modified index of agreement. For GPCC, the mean bias error (blue) and mean absolute error (red) are close to zero and the modified index of agreement (green) is close to one at all stations. This clearly indicates the overall superiority of GPCC over the other datasets. Overall, GPCC performs best, followed by UDel and APHRODITE, whereas CRU performs poorly for most statistics and stations.
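The Figure 11 diagrams are read visually. A hypothetical numerical complement, not the method used for the figure, is to give each product a mean rank over the three statistics. In this sketch, smaller |MBE| and MAE rank better, a larger d1 ranks better, and tied products share the better rank. The product names and values are placeholders, not results from the study.

```python
def rank_products(stats):
    """Order products by mean rank over |MBE|, MAE and d1.

    stats: {product: (mbe, mae, d1)}. Returns [(product, mean_rank), ...]
    sorted from best to worst. Tied products share the better rank.
    """
    names = list(stats)
    criteria = [
        lambda s: abs(s[0]),  # |MBE|: smaller is better
        lambda s: s[1],       # MAE: smaller is better
        lambda s: -s[2],      # d1: larger is better, so negate
    ]
    rank_sum = {name: 0 for name in names}
    for key in criteria:
        scores = {name: key(stats[name]) for name in names}
        for name in names:
            # Rank 1 is best: one plus the number of strictly better products
            rank_sum[name] += 1 + sum(scores[other] < scores[name] for other in names)
    mean_rank = {name: rank_sum[name] / len(criteria) for name in names}
    return sorted(mean_rank.items(), key=lambda item: item[1])


# Placeholder statistics for three hypothetical products A, B and C
print(rank_products({"A": (0.1, 1.0, 0.9), "B": (-2.0, 5.0, 0.4), "C": (0.5, 3.0, 0.7)}))
```

Rank aggregation discards how large the gaps between products are, so it summarises the diagrams rather than replacing them.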

Discussions
The results of the various statistical tests show that the performance of the gauge-based gridded precipitation datasets differs between climatic zones. A number of factors contribute to the performance of gridded datasets, e.g., the temporal domain, the data sources, and the interpolation method. In addition, the number, distribution, and quality of stations and the topography of the area affect the capability of data products [72,73]. In the present study, GPCC data performed better than CRU, APHRODITE, and UDel in all the statistical tests conducted. Duethmann et al. [74] reported similar results when they compared a number of gauge-based gridded precipitation products using conventional statistical methods.
One major reason for the better accuracy of GPCC data is the relatively large number of observed stations used in its construction. Schneider et al. [24] reported that the GPCC database collects monthly precipitation data from more than 85,000 stations around the world. CRU was developed from more than 4000 weather stations distributed around the world [21]. The number of valid stations used to construct APHRODITE is between 5000 and 12,000 [13], and for UDel it is between 4100 and 22,000 [23]. We therefore also assessed the number of stations that GPCC, CRU, and UDel used to generate their gridded data over Balochistan. The number used by APHRODITE was not available. These data products used different numbers of gauged stations in different months and years, and it is difficult to report the number of stations for every month of every year. Thus, Table 3 shows the numbers of stations in the first, middle, and last years of the data series. GPCC uses the most stations, followed by UDel and CRU, and GPCC uses about twice as many stations as CRU. The quality of the raw data used to construct a gauge-based gridded precipitation product is also crucial. In many cases, the quality of long-term precipitation data is not good enough. The data are often inhomogeneous, particularly for stations located in developing countries. Therefore, each observed time series is generally included in a gauge-based gridded precipitation database only after its quality has been checked. However, the quality control process differs from product to product. For GPCC, the data have to pass successive automatic and visual checks.
Since outliers and extreme values are common and cannot simply be discarded, additional checks are performed to verify unusual data [24]. This rigorous quality control system has made the GPCC product more attractive and reliable. In CRU, all data pass through two stages of extensive manual and semi-automatic quality control. The first stage checks the data for consistency. The second removes stations or months that give large errors during the interpolation process [21,75]. In APHRODITE, a brief screening based on the geographic location of each station is performed, and data from outside the national boundary are rejected from further processing [13]. The quality control of UDel, on the other hand, is not well documented. The procedures used for quality control affect the quality of a gauge-based gridded precipitation product, and the reliability of GPCC may be due to its use of the most robust quality control procedures.
Another major factor in the reliability of gauge-based gridded precipitation data is the method used to interpolate data onto grid points. Spatial interpolation methods that consider orography often predict precipitation in mountainous terrain better than other methods [76], because highly variable topography changes the precipitation pattern over short distances. As a result, most gauge-based gridded precipitation datasets are unable to capture precipitation accurately over rough and mountainous areas. However, GPCC uses SPHEREMAP, which has been found to be robust because it considers elevation during interpolation and can therefore improve estimation accuracy [71]. Schneider et al. [24] claimed that GPCC data reproduce precipitation amounts and patterns better over rough terrain.
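Interpolating gauges onto a grid point can be sketched with plain inverse-distance weighting (IDW) using great-circle distances. To our understanding, SPHEREMAP builds on Shepard's inverse-distance approach with further refinements. This sketch is plain IDW and not SPHEREMAP. The weighting power (2) and search radius (500 km) are hypothetical choices. The sketch also has no elevation term, which is exactly the kind of limitation over rough terrain that the paragraph above describes.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def idw_estimate(grid_lat, grid_lon, stations, power=2.0, search_km=500.0):
    """Inverse-distance-weighted precipitation at one grid point.

    stations: list of (lat, lon, precip_mm). Stations beyond search_km are
    ignored. Returns None if no station is within range. Elevation is not
    considered (hypothetical simplification).
    """
    weighted_sum, weight_total = 0.0, 0.0
    for lat, lon, value in stations:
        d = great_circle_km(grid_lat, grid_lon, lat, lon)
        if d < 1e-6:
            return value  # grid point coincides with a gauge
        if d <= search_km:
            w = 1.0 / d ** power
            weighted_sum += w * value
            weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else None
```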
The APHRODITE dataset was constructed for Asian countries, and datasets are currently available for Russia, the Middle East, and Monsoon Asia. Monsoon rain enters Balochistan from the east, so some parts of the study area receive more precipitation during the monsoon. The observed and APHRODITE annual precipitation differed widely. The observed mean annual precipitation at the semi-arid station of Balochistan is 387 mm, whereas APHRODITE gives 221 mm, an underestimate of about 43%. The monthly comparison also showed that APHRODITE underestimates precipitation during the monsoon months. The area receives its maximum precipitation of 97 mm in July, but APHRODITE estimates only 57 mm, about 41% lower. Similar underestimation was found in other months and at other stations. These results agree with Ali et al. [77], who reported that APHRODITE underestimates precipitation in humid and sub-humid parts of Pakistan, and other authors have also reported underestimation by APHRODITE [44,74]. APHRODITE would be expected to perform better during the monsoon months and at stations influenced by monsoon winds. Its poor performance in the study area may instead be due to the raw data collected from Pakistan. Yatagai et al. [13] reported that APHRODITE represents missing data by zero, which may lead to underestimation of precipitation and other statistical errors.
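The effect of representing missing data by zero can be seen with a toy calculation. The sketch below averages a hypothetical series of July totals twice: once skipping the missing years and once replacing them with 0 mm. It shows only the direction of the bias. The values are invented and do not reproduce the APHRODITE processing.

```python
def mean_skip_missing(values):
    """Mean of the reported values, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def mean_missing_as_zero(values):
    """Mean when every missing (None) entry is replaced by 0 mm."""
    return sum(v if v is not None else 0.0 for v in values) / len(values)


# Hypothetical July totals (mm) with two unreported years
july = [120.0, None, 80.0, None, 100.0]
print(mean_skip_missing(july))     # 100.0
print(mean_missing_as_zero(july))  # 60.0
```

Each zero-filled gap pulls the mean toward zero, so the more often data are missing, the larger the underestimation.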
Several authors, such as Pour et al. [78], have reported that extreme events frequently occur at the micro scale, so gridded data may not capture them precisely at the point level. Schneider et al. [79] argued that gridded data can capture extreme events at the micro or small scale precisely only with a greater number of observed stations, i.e., a dense station network over the area. The lack of a dense station network is one of the major causes of data scarcity in most regions of the world. Therefore, despite drawbacks such as reduced peak rainfall and an increased number of wet days, gridded datasets are the only source for hydro-climatic studies in areas where observed data are not available.
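The two drawbacks named above, reduced peaks and more wet days, follow from averaging point rainfall over a grid box. The sketch uses three hypothetical gauges, each hit by a local storm on a different day, and assumes a 1 mm wet-day threshold. The grid-box mean has a lower peak than any single gauge but more days above the threshold.

```python
def grid_box_mean(station_series):
    """Average equal-length daily station series into one grid-box series."""
    n = len(station_series)
    return [sum(day) / n for day in zip(*station_series)]


def wet_days(series, threshold_mm=1.0):
    """Count days at or above the (assumed) wet-day threshold."""
    return sum(1 for v in series if v >= threshold_mm)


# Hypothetical daily rainfall (mm): a local storm hits one gauge per day
stations = [
    [30.0, 0.0, 0.0, 0.0],
    [0.0, 6.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
]
box = grid_box_mean(stations)
print(box)  # [10.0, 2.0, 1.0, 0.0]
print(max(max(s) for s in stations), max(box))  # 30.0 10.0
print([wet_days(s) for s in stations], wet_days(box))  # [1, 1, 1] 3
```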

Conclusions
This paper evaluated the performance of four gauge-based gridded precipitation products that are widely used as alternatives to observed precipitation: GPCC, CRU, APHRODITE, and UDel. Time series graphs, residuals, and scatter plots were compared to understand the magnitude, sign, and nature of the errors, as well as their temporal variability. Mean bias error, mean absolute error, and the modified index of agreement were also used to assess performance. The Anderson-Darling test was used to compare the data distributions of the products at three stations in different climates: semi-arid, arid, and hyper-arid. The results revealed a clear superiority of the GPCC monthly precipitation product over the other gauge-based gridded precipitation products in the region. The MBE and MAE values show that GPCC consistently gives the lowest errors in most months of the year at all stations. The modified index of agreement indicates a good association between observed and GPCC precipitation in all months. In addition, GPCC replicated the observed precipitation distribution at all climatic stations. The results of the present study can be further verified in other regions with different climates and topographic settings. The statistical methods used in the present study are also expected to help select better-performing data products in other parts of the world.
Author Contributions: K.A., S.S., and X.W. designed the research and wrote the manuscript. N.N. and N.K. critically reviewed the paper.