Evaluation of Gridded Precipitation Datasets in Malaysia

Abstract: This study compares five readily available gridded precipitation satellite products, namely Climate Hazards Group Infrared Precipitation with Station Data (CHIRPS) at 0.05° and 0.25° resolution, Tropical Rainfall Measuring Mission Multi-Satellite Precipitation Analysis (TMPA 3B42v7) and Princeton Global Forcings (PGFv3), both at 0.25°, and Global Satellite Mapping of Precipitation Reanalysis (GSMaP_RNL) at 0.1°, and evaluates their quality and reliability against 41 rain gauge stations in Malaysia. The evaluation was based on three numerical statistical scores (r, Root Mean Squared Error (RMSE) and Bias) and three categorical scores (Probability of Detection (POD), False Alarm Ratio (FAR) and Critical Success Index (CSI)) at daily, monthly and seasonal temporal resolutions. The results showed that TMPA 3B42v7, PGFv3, CHIRPS25 and CHIRPS05 slightly overestimated the rain gauge data, while GSMaP_RNL underestimated it, with the largest bias magnitude for monthly data. CHIRPS25 showed the best POD score, while TMPA 3B42v7 scored best for FAR and CSI. Overall, TMPA 3B42v7 was found to be the best-performing dataset, while PGFv3 registered the worst performance for both numerical (monthly) and categorical (daily) scores. All products captured the intensity of heavy rainfall (20–50 mm/day) rather well, but tended to underestimate the intensity for the categories of no or little rain (rain < 1 mm/day) and extremely heavy rain (rain > 50 mm/day). In addition, overestimation occurred for low moderate (2–5 mm/day) to low heavy rain (10–20 mm/day). In the case study of the extreme flooding event of 2006/2007 in the southern area of Peninsular Malaysia, TMPA 3B42v7 and GSMaP_RNL performed well in capturing most heavy rainfall events but tended to overestimate light rainfall, consistent with their performance for the occurrence frequency of rainfall at different intensity levels.


Introduction
In most countries, especially the least developed and developing ones, meteorological stations are few and sparsely distributed. In mountainous and inaccessible areas, there is likely no meteorological station available. This can be a problem for various applications, including validation of weather and climate simulations [1]. Most applications resort to gridded data derived from satellite products. Since the 1970s, due to their wide spatial and long temporal coverage, satellite-gridded precipitation datasets have been widely used in climate studies [2,3]. However, a study carried out in the Adige Basin, Italy, found that PGFv3 was less reliable than CHIRPS and TRMM [26].
This paper focuses on evaluating the quality and reliability of three gridded precipitation products that have yet to be validated over Malaysia, namely CHIRPS, GSMaP and PGFv3, by comparing the gridded data to rain gauge observations. For comparison, TMPA 3B42v7 is also included. The evaluation considers spatial as well as temporal variations and possible factors that may contribute to the performance of the products.

Study Area and Its Climatology
Malaysia lies approximately in the center of Southeast Asia, bounded from 1° N to 8° N and 99° E to 120° E. Malaysia consists of two parts: Peninsular Malaysia, which is part of the Asian landmass protruding southward and located between Thailand in the north and Singapore in the south, and East Malaysia, which is part of northern Borneo consisting of Sabah and Sarawak states (Figure 1). Malaysia has a hot and humid tropical climate with a pronounced rainfall seasonality modulated by the Asian-Australian monsoon [17]. The wet season occurs during the northeast monsoon (December to February, DJF) while the dry season prevails during the southwest monsoon (June to August, JJA). In between these two pronounced seasons are the inter-monsoon periods of March-April-May (MAM) and September-October-November (SON) [33]. Influenced by various factors (e.g., topography, synoptic circulations and the Inter-Tropical Convergence Zone (ITCZ) migration), the rainfall distribution among these seasons can have pronounced differences [34]. For example, the annual rainfall cycles of stations on the west coast of Peninsular Malaysia have two maxima during MAM and SON that coincide with inter-monsoon periods [34]. The minimum rainfall during the JJA period and lower rainfall during DJF are associated with the blocking effects of the Barisan mountain range over Sumatra and the Titiwangsa mountain range over Peninsular Malaysia, respectively. On the east coast of Peninsular Malaysia, rainfall peaks during DJF, coinciding with the cold surges and Borneo vortex [35]. Due to multiple interactions of various factors (cold surges, Borneo vortex and Madden-Julian Oscillations), extreme precipitation events that can lead to major floods can occur during DJF on the east coast of Peninsular Malaysia [33].
One example was the flood event that occurred in southern Peninsular Malaysia during the 2006/2007 rainy season, with reported economic losses of around USD 500 million, 16 deaths and 200,000 victims [33]. Equally, the flood that occurred in December 2014 in northeast Peninsular Malaysia was caused by strong cold surges [36]. Over northern Borneo, the annual rainfall cycle is partly influenced by the tails of typhoons passing the Philippines. Over Sarawak and Sabah, far fewer meteorological stations are available compared to those of Peninsular Malaysia; hence, the use of rainfall-gridded products to validate climate models over this region is necessary.

Rain Gauge Dataset
In this study, observed daily rainfall datasets were obtained from rain gauge stations of the Malaysia Meteorological Department (MMD). The period from 2008 to 2012 was chosen because it maximized the number of stations with complete data. According to the World Meteorological Organization (WMO)'s Guide to Climatological Practices, a period of five years is considered sufficient to carry out this kind of study [37]. This gives a total of 41 rain gauge stations, of which 28 were in Peninsular Malaysia and 13 were in Borneo (Figure 1). The daily rainfall amount represents the 24 h accumulation for a period beginning at 0800 local time [38]. Peninsular Malaysia was sub-divided into five sub-regions: northern Peninsular Malaysia (NPM), eastern Peninsular Malaysia (EPM), middle Peninsular Malaysia (MPM), western Peninsular Malaysia (WPM) and southern Peninsular Malaysia (SPM). Meanwhile, East Malaysia consists of two sub-regions: northern East Malaysia (NEM) and southern East Malaysia (SEM) (Figure 1) [12-14]. The latitude-longitude coordinates and

Gridded Rainfall Products
Five gridded products were evaluated in this study: CHIRPS at 0.25° and 0.05°, GSMaP_RNL at 0.1°, and PGF version 3 and TRMM Multi-Satellite Precipitation Analysis (TMPA) 3B42v7 at 0.25° (Table 1). While TMPA 3B42v7 has been evaluated previously and considered to be the best product [14], the other three products have yet to be analyzed. Hence, the performance of the three gridded products can be compared with TMPA 3B42v7. GSMaP is a purely satellite-based rainfall dataset derived from thermal IR and microwave radiometer (TIR-MWR) observations to produce hourly data products that cover the globe from 60° N to 60° S at 0.1° resolution [39]. The associated sensors include SSMI (Special Sensor Microwave/Imager) by the Defense Meteorological Satellite Program (DMSP), TRMM Microwave Imager (TMI) by TRMM, and Advanced Microwave Scanning Radiometer for EOS (AMSR-E) by Aqua. The IR data came from the CPC (Climate Prediction Center) of NOAA (Table 1) [40]. The development of GSMaP data was sponsored by the Core Research for Evolutional Science and Technology (CREST) of the Japan Science and Technology Agency (JST) from 2002 to 2007. Since 2007, GSMaP activities have been promoted by the JAXA Precipitation Measuring Mission (PMM) Science Team, and the GSMaP products are distributed by the Earth Observation Research Center, Japan Aerospace Exploration Agency. GSMaP comprises three variants: Rainfall Watch (GSMaP_RNL, GSMaP_MVK, GSMaP_NRT), GSMaP Realtime (GSMaP_NOW) and RIKEN Nowcast (GSMaP_RNC). GSMaP_RNL uses the Japanese 55-year Reanalysis (JRA-55) data as ancillary data to produce a continuous and homogeneous dataset. In this study, GSMaP_RNL (Reanalysis Ver.), which has the same algorithm as GSMaP_MVK (moving vector with Kalman filter to estimate the rainfall rate) and is available from March 2000 to February 2014, was used [40] (http://sharaku.eorc.jaxa.jp/GSMaP/index.htm).
PGF is a long-term dataset of meteorological forcings for land surface hydrology modelling developed by Princeton University. The product is based on the merging of National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) reanalysis and globally available observation datasets. Although the original spatial resolution of PGF was 1.0° with a 3-hourly temporal resolution, the product was further downscaled to 0.5° and 0.25° resolutions with daily and monthly temporal resolutions for the period from January 1948 to December 2016. This product provides various meteorological variables including precipitation, air temperature at 2 m above ground, downward long/shortwave radiation at the surface, surface pressure, specific humidity and wind speed. The precipitation product was constructed from NCEP reanalysis, TRMM real time data, GPCP, Climatic Research Unit (CRU) and observation-based precipitation datasets (Table 1) through a process of downscaling and bias correction in both spatial and temporal resolution [19,41] (http://hydrology.princeton.edu/data/pgf/). In this study, PGF version 3 daily precipitation at 0.25° resolution was used.
The TRMM product, introduced in 1997, came from a collaborative mission of the United States National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency to study tropical and subtropical rainfall for weather and climate research [7,42]. It uses several instruments including the Visible Infrared Radiometer (VIRS), TMI, Precipitation Radar (PR), Cloud and Earth Radiant Energy Sensor (CERES) and Lightning Imaging Sensor (LIS). The product datasets were available at 0.25° spatial resolution and two temporal resolutions, i.e., 3-hourly and 7-day accumulated rainfall. The estimated precipitation is derived from two products, namely the 3-hourly combined microwave-IR estimates (with gauge adjustment) and the monthly combined MWR-IR-gauge estimates, available about two months after the end of each month (satellite-gauge-gridded precipitation products). The associated sensors and algorithms include the TRMM Combined Instrument algorithm (2B31) (TCI), TMI, SSMI, Special Sensor Microwave Imager-Sounder (SSMIS), AMSR-E, Advanced Microwave Sounding Unit (AMSU) and Microwave Humidity Sounder (MHS) (Table 1) [42]. Different variants of TRMM products at various time resolutions can be found at https://pmm.nasa.gov/data-access/downloads/trmm. In this study, we used the daily precipitation totals derived from the 3B42 Research Version using the Version 7 TRMM Multi-Satellite Precipitation Analysis released in June 2012. TMPA 3B42v7 provides precipitation data from January 1998 until present.

Statistical Measures for Product Validation
To evaluate the performance of the gridded precipitation datasets against the observed rain gauge dataset, we used pairwise comparison numerical statistics and categorical performance measures [43]. According to Tan et al. [44], point-to-pixel and pixel-to-pixel comparisons produce essentially similar results. Hence, the point-to-pixel technique was chosen, because the station data were insufficient for interpolation into gridded data [24,44]. In this technique, the values of gridded data were compared to the values of station data. The use of pairwise comparison based on three numerical statistical indicators, summarized in Table 2, allows the estimation of the accuracy and reliability of gridded products [45]. The Pearson Correlation Coefficient (r) measures the degree of association between the gridded dataset and the rain gauge data. The Root Mean Squared Error (RMSE) measures the differences between the two datasets and provides the average magnitude of error. The bias measures the tendency of the gridded data to overestimate (Bias > 0) or underestimate (Bias < 0) the rain gauge data.

Table 2. Statistical measures (G_i, rain gauge measurement; Ḡ, mean rain gauge measurement; S_i, gridded rainfall estimate; S̄, mean gridded rainfall estimate; and n, number of data pairs).
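As a rough illustration of the point-to-pixel comparison and the three numerical scores, the following Python sketch pairs a station with the grid cell that contains it and computes r, RMSE and relative bias. The formulas are the commonly used definitions and are assumed here rather than taken from Table 2; the function names and the grid-origin convention (`lat0`, `lon0` as the south-west corner of cell (0, 0)) are hypothetical.

```python
import math


def containing_cell(lat, lon, lat0, lon0, res):
    """Return (row, col) of the grid cell that contains the station.

    Assumption: cell (0, 0) has its south-west corner at (lat0, lon0)
    and cells are res degrees wide in both directions.
    """
    row = int(math.floor((lat - lat0) / res))
    col = int(math.floor((lon - lon0) / res))
    return row, col


def pearson_r(gauge, grid):
    """Pearson correlation between gauge (G_i) and gridded (S_i) values."""
    n = len(gauge)
    g_mean = sum(gauge) / n
    s_mean = sum(grid) / n
    cov = sum((g - g_mean) * (s - s_mean) for g, s in zip(gauge, grid))
    var_g = sum((g - g_mean) ** 2 for g in gauge)
    var_s = sum((s - s_mean) ** 2 for s in grid)
    return cov / math.sqrt(var_g * var_s)


def rmse(gauge, grid):
    """Root mean squared error of the gridded values against the gauge."""
    n = len(gauge)
    return math.sqrt(sum((s - g) ** 2 for g, s in zip(gauge, grid)) / n)


def relative_bias(gauge, grid):
    """Relative bias: > 0 means overestimation, < 0 underestimation."""
    return sum(s - g for g, s in zip(gauge, grid)) / sum(gauge)
```

With these definitions, a perfect product would give r = 1, RMSE = 0 and relative bias = 0.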

[Table 2 body not recovered. Columns: Name, Formula, Perfect Score. Surviving row label: Relative Bias, Bias.]

In addition, three categorical measures were used to detect rainfall occurrence and non-occurrence between satellite data and rain gauge data (Table 3) [16]. These measures are based on a contingency table where A is the number of hits (i.e., rain occurs in both the gridded and station data), B is the number of false alarms (i.e., rain occurs in the gridded data but not in the station data), C is the number of misses (i.e., rain occurs in the station data but not in the gridded data), D is the number of correct negatives (i.e., rain occurs in neither dataset) and n is the sum of A, B, C and D. The rain occurrence threshold was set at 1 mm per day to distinguish a rainy day from a non-rainy day.

Table 3. Categorical performance measures for comparing station and gridded precipitation data.

[Table 3 body not recovered. Columns: Name, Formula, Perfect Score. Surviving row label: Probability of detection, POD.]
Remote Sens. 2020, 12, 613

The Probability of Detection (POD) measures how well the gridded datasets capture observed rain occurrence, ignoring false alarms. The False Alarm Ratio (FAR) measures the fraction of rain events in the satellite data that did not occur in the station data, without accounting for missed events. The Critical Success Index (CSI) measures the correct estimation of rain occurrence while excluding the correct negatives. The values of POD, FAR and CSI range from 0 to 1. A POD score of 1 implies perfect agreement of rainfall occurrences in both station and gridded data, whereas a score of zero corresponds to complete disagreement between the two datasets. For FAR, a score of zero indicates that no false alarm occurred. The CSI provides a measure of the critical success rate that blends POD and FAR, where a perfect score of 1 implies zero occurrences in both the false alarm and misses categories.

The frequencies of defined rain intensities were also compared between rain gauge data and satellite products. Thresholds for the different categories of daily rainfall intensity follow World Meteorological Organization (WMO) [46] definitions: rain <1 mm (no/very light rain), 1-2 mm (light rain), 2-5 mm (low moderate rain), 5-10 mm (high moderate rain), 10-20 mm (low heavy rain), 20-50 mm (high heavy rain) and rain >50 mm (violent rain) [14,45,47].
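The categorical scores can be sketched in Python as follows. This is a minimal illustration that assumes the standard contingency-table definitions POD = A/(A+C), FAR = B/(A+B) and CSI = A/(A+B+C), since the formulas of Table 3 are not reproduced here. Treating a day with exactly 1 mm as rainy (`>=`) is also an assumption, and the function names are hypothetical.

```python
def contingency(gauge, grid, threshold=1.0):
    """Count hits (A), false alarms (B), misses (C), correct negatives (D).

    Assumption: a day counts as rainy when rainfall >= threshold (mm/day).
    """
    a = b = c = d = 0
    for g, s in zip(gauge, grid):
        observed = g >= threshold
        estimated = s >= threshold
        if observed and estimated:
            a += 1
        elif estimated:
            b += 1
        elif observed:
            c += 1
        else:
            d += 1
    return a, b, c, d


def categorical_scores(a, b, c):
    """Return (POD, FAR, CSI); denominators are assumed to be non-zero."""
    pod = a / (a + c)
    far = b / (a + b)
    csi = a / (a + b + c)
    return pod, far, csi
```

Note that the correct negatives D enter none of the three scores, which is why they remain informative in a climate where dry days are frequent or rare.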

Extreme Precipitation Events
To evaluate the appropriateness of the gridded data precipitation during extreme precipitation events, three events were

The statistical performance indicators for monthly data are summarized in Table 4. TMPA 3B42v7 slightly overestimated the station data with the smallest positive bias of 0.02, followed by PGFv3 (0.03), CHIRPS05 (0.06) and CHIRPS25 (0.08). GSMaP_RNL was the only gridded precipitation dataset that underestimated the rain gauge data, with the largest bias magnitude of −0.11. TMPA 3B42v7 also recorded the lowest RMSE, whereas PGFv3 presented the highest RMSE value. Hence, TMPA 3B42v7 performed best for monthly total precipitation, followed by the CHIRPS products. CHIRPS05 performed slightly better than CHIRPS25, in agreement with a study in China [23], but in contrast to a study in the Adige Basin in Italy [26].

Figure 3 presents the monthly statistical performance indicators for all products in the seven sub-regions of Malaysia. In NPM, all products show monthly variation with the highest values in March, June and July. In EPM, TMPA 3B42v7 and both CHIRPS products perform best during the wet season, while performance decreases during the dry season. Contrary to other products, PGFv3 shows a negative correlation and high RMSE values during October, November and December, while GSMaP_RNL shows relatively high correlations but underestimated the station rainfall by more than 50% in February. The MPM sub-region, which contains two highland stations, shows the highest statistical scores during February, March, April and October, but low values during September and November. In WPM, the gridded precipitation datasets perform best during May-July and in October. Overall, GSMaP_RNL performs worst compared to the other products. In SPM, the gridded products perform best during the wet season.
Meanwhile, in NEM, all gridded precipitation products except GSMaP_RNL display relatively good statistical performance throughout the year. Over the southern part of East Malaysia, the gridded precipitation datasets recorded better performance in January and May-September. Overall, in East Malaysia, PGFv3 performs comparably with TMPA 3B42v7, while GSMaP_RNL is considered the worst.

Table 5 compares the performance of the five gridded products on a seasonal basis for the whole of Malaysia. Overall, the performance of gridded precipitation appears higher during DJF (wet season), followed by JJA (dry season) and the inter-monsoon periods, consistent with Tan and Santo [15]. TMPA 3B42v7 performs best among the products for all seasons.

The performance of all gridded products in each sub-region is shown in Figure 4. Over NPM and SPM, the r and RMSE values are comparable across products, although the biases can differ. Generally, the performance of all products tends to be lower in the MPM region. In EPM, most products perform well, except PGFv3, which has a high RMSE and underestimated precipitation. However, PGFv3 performs comparably with other products in East Malaysia. GSMaP_RNL has the lowest performance in WPM and in both sub-regions of East Malaysia. Overall, TMPA 3B42v7 performs best in most sub-regions, followed by CHIRPS at both resolutions.

Table 6 depicts the categorical performance of the daily precipitation estimates of all gridded data products.
Overall, TMPA 3B42v7 recorded the best scores in FAR and CSI, followed by GSMaP_RNL. On the other hand, CHIRPS25 recorded the highest score in POD, but had the weakest FAR score. Despite its higher resolution, CHIRPS05 has a lower POD than CHIRPS25. However, CHIRPS05 recorded a lower FAR than CHIRPS25, which is consistent with the statistical scores in Table 4. Meanwhile, PGFv3 recorded the worst score in almost all categorical performance measures.

Figure 5 shows the categorical scores for each month in all sub-regions. Consistent with Varikoden et al. [12] and Tan et al. [14], all sub-regions in Peninsular Malaysia and East Malaysia generally tended to record the best (worst) values of POD, FAR and CSI coinciding with maximum (minimum) rainfall. Hence, all scores tend to have strong seasonality. The PGFv3 product consistently shows the lowest POD and highest FAR, indicating its inferiority.
Its CSI values also tended to be lower. However, despite being identified as the most reliable product in the previous analysis, TMPA 3B42v7 does not show the highest POD. It is CHIRPS25 that tended to produce the highest POD values. In both FAR and CSI, however, TMPA 3B42v7 shows its superiority. GSMaP_RNL's POD, FAR and CSI scores place its performance somewhere between TMPA 3B42v7 and CHIRPS.

Categorical Measurement Ability
The overall categorical performance scores for all products are shown in Figure 6. Consistent with the monthly values, TMPA 3B42v7 and GSMaP_RNL show better performance in FAR and CSI, while CHIRPS25 recorded slightly better performance in POD. However, both CHIRPS05 and CHIRPS25 tended to have much higher FAR values than TMPA 3B42v7 and GSMaP_RNL. Hence, these scores suggest that TMPA 3B42v7 is the overall best performing product, followed by GSMaP_RNL and CHIRPS. PGFv3, on the other hand, has the lowest scores in almost all categories in most sub-regions, and hence can be considered the worst performing product. Furthermore, the POD and CSI (FAR) scores of all products tended to be the highest (lowest) in SEM compared with other sub-regions. Hence, these scores indicate that the products approximated the station data much better in SEM.

Table 6 (surviving rows only; the column order POD, FAR, CSI is inferred from the surrounding text).

Product        POD    FAR    CSI
PGFv3          0.72   0.45   0.45
TMPA 3B42v7    0.80   0.31   0.59
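The Table 6 values for PGFv3 (0.72, 0.45, 0.45) and TMPA 3B42v7 (0.80, 0.31, 0.59) can be cross-checked with a short calculation. Assuming the three numbers are POD, FAR and CSI in that order, and assuming the standard definitions POD = A/(A+C), FAR = B/(A+B) and CSI = A/(A+B+C), the identity 1/CSI = 1/POD + 1/(1 − FAR) − 1 follows directly, so CSI can be recomputed from the other two scores. The sketch below (hypothetical function name) does this.

```python
def csi_from_pod_far(pod, far):
    """Recompute CSI from POD and FAR.

    From POD = A/(A+C) and 1-FAR = A/(A+B):
      1/POD = 1 + C/A,  1/(1-FAR) = 1 + B/A,
      1/CSI = 1 + B/A + C/A = 1/POD + 1/(1-FAR) - 1.
    """
    return 1.0 / (1.0 / pod + 1.0 / (1.0 - far) - 1.0)


# Reported values, read as (POD, FAR, CSI); the column order is an assumption.
reported = {
    "PGFv3": (0.72, 0.45, 0.45),
    "TMPA 3B42v7": (0.80, 0.31, 0.59),
}

for name, (pod, far, csi) in reported.items():
    implied = csi_from_pod_far(pod, far)
    print(f"{name}: reported CSI {csi:.2f}, implied CSI {implied:.3f}")
```

The implied CSI agrees with the reported CSI to within rounding for both products, which supports reading the columns as POD, FAR, CSI.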

Evaluation of Frequency on Different Rain Intensity
The frequency (or occurrence percentage) according to rainfall intensity in Malaysia is shown in Figure 7. TMPA 3B42v7 and GSMaP_RNL show small differences from the station data throughout all rain intensity categories. However, compared with station data, all gridded products tended to underestimate the frequency of the no rain category (<1 mm/day), especially CHIRPS25, which underestimated it by as much as 22%. For light rain (1-2 mm/day), the differences are small except for CHIRPS05 and PGFv3, which underestimated the frequency by more than 5%, while for low moderate rain (2-5 mm/day), PGFv3 underestimated the rainfall frequency by more than 5%. From high moderate rain (5-10 mm/day) to low heavy rain (10-20 mm/day), all products tended to overestimate the frequency, especially CHIRPS05, CHIRPS25 and PGFv3. Noticeably, PGFv3 behaves differently across rainfall categories: it tended to underestimate the frequency of low moderate rain (5.24%), but overestimated high moderate rain (9.56%) and low heavy rain (20.76%). Interestingly, for high heavy rain (20-50 mm/day), all products estimated the frequency reasonably well. Meanwhile, for rainfall above 50 mm per day, all products tended to underestimate the frequency, although the differences can be minimal.
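The frequency comparison can be illustrated with a small Python sketch that assigns each daily total to one of the seven intensity classes and reports the difference in occurrence percentage between a product and the gauge. The class edges follow the thresholds quoted in the text; the handling of values that fall exactly on an edge (assigned to the upper class here) and all function names are assumptions.

```python
from bisect import bisect_right

# Class edges in mm/day and labels, following the thresholds quoted in the text.
EDGES = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
LABELS = [
    "no/very light rain",
    "light rain",
    "low moderate rain",
    "high moderate rain",
    "low heavy rain",
    "high heavy rain",
    "violent rain",
]


def intensity_class(rain_mm):
    """Return the class label; a value on an edge goes to the upper class."""
    return LABELS[bisect_right(EDGES, rain_mm)]


def frequency_percent(daily):
    """Occurrence percentage of each intensity class in a daily series."""
    counts = {label: 0 for label in LABELS}
    for value in daily:
        counts[intensity_class(value)] += 1
    n = len(daily)
    return {label: 100.0 * count / n for label, count in counts.items()}


def frequency_difference(gauge, grid):
    """Product minus gauge percentage per class (negative = underestimate)."""
    f_gauge = frequency_percent(gauge)
    f_grid = frequency_percent(grid)
    return {label: f_grid[label] - f_gauge[label] for label in LABELS}
```

A negative difference in the first class, for example, corresponds to a product that reports too few dry days.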

Evaluation During the 2006/2007 Flood Event
The 2006/2007 flood event in the southern part of Peninsular Malaysia produced three episodes of extreme precipitation [33]. The stations affected by the heavy rainfall events included Senai, Kluang, Batu Pahat, Mersing, Muadzam Shah and Kuantan, mostly situated in SPM and also in MPM and EPM (indicated with blue triangles in Figure 1). The time series of daily precipitation in the affected areas for all gridded products are presented together with station data in Figure 8. Both TMPA 3B42v7 and GSMaP_RNL showed good agreement with rain gauge data at Senai, Kluang and Mersing, but showed weaknesses at the other stations, overestimating light rain and failing to capture all episodes of extreme rainfall. Generally, CHIRPS and PGFv3 performed poorly in detecting extreme events, since their maximum heavy rainfall intensities were lower than observed. The performance measures of the gridded products during the three extreme events for the six rain gauge stations are given in Table 7. The ability of the gridded products to detect extreme events varies between stations. In general, the correlation values are high, but both RMSE and bias are high as well. Generally, both TMPA 3B42v7 and GSMaP_RNL have good scores at some stations, while the other products tended to underestimate, consistent with the findings of Soo et al. [16].

Discussion
This analysis explored the reliability and quality of five readily available gridded precipitation datasets compared to observed rain gauge data. Consistent with Tan et al. [14,15], the results indicated that TMPA 3B42v7 is the overall best performing gridded precipitation dataset in Malaysia, followed by CHIRPS, GSMaP_RNL and PGFv3. These relative performances could be due to various factors related to input data, onboard sensors, algorithms and interpolation techniques (Table 1). However, it is difficult to attribute and quantify the effect of each factor. The blending of rain gauge data does not guarantee the highest product quality. For example, both TMPA 3B42v7 and CHIRPS incorporate rain gauges (although the number of stations used may differ), but the former has consistently proven to be the best gridded precipitation dataset in this study and in previous investigations [12-14,16]. Hence, other factors such as onboard sensors and algorithms could be more important. However, the performance could vary spatially due to the characteristics of the sub-regions, especially in terms of elevation and topography [1,48]. In East Malaysia, PGFv3 appears to be a better product than GSMaP_RNL.
The results of the analyses are unlikely to be strongly influenced by the number of rain gauges in each sub-region. With a roughly equal number of stations in each sub-region (NPM, 6 stations; EPM, 6 stations; MPM, 5 stations; WPM, 6 stations; SPM, 5 stations; NEM, 5 stations; and SEM, 8 stations), the total number of stations may not be the dominant factor in determining the performance of gridded products. However, MPM is the only sub-region with two rain gauges located in mountainous locations: K. Tanah Rata and Cameron Highlands (indicated with green triangles in Figure 1). The results showed that the gridded products performed worst in MPM, possibly due to these two mountainous rain gauges. Figure 9 depicts the scatter plots of the five locations in MPM for TMPA 3B42v7, in which the two stations in the mountainous region showed the lowest correlation. This relatively poor performance could be due to shortcomings in the satellite algorithm when applied in the highly complex terrain of mountainous areas [49]. In addition, in mountainous regions, radar signals that hit the surface could return false echoes [50].
Gridded precipitation data with higher spatial and temporal resolution have been suggested to be of better quality than lower-resolution products [4]. However, in this study, the higher resolution of the CHIRPS05 product does not yield an improvement over TMPA 3B42v7. This could be a disadvantage of the cold cloud duration (CCD) technique, which may classify some clouds in the satellite signal as cold clouds when in fact most of them are high clouds with no rainfall activity [51]. In addition, the remote sensing algorithms used in gridded precipitation datasets may play an important role in their performance.
The accuracy of gridded precipitation datasets is also influenced by the seasonality of rainfall in each sub-region (Table 5). Gridded precipitation datasets approximate the observed rain gauge data better during the wet season than the dry season, probably due to the capability of the satellite sensors to detect convective precipitation systems [15]. However, PGFv3 does not show the same characteristic in EPM for the period from October through December, when rainfall is high (Figure 3). The scatter plot in Figure 10 shows that PGFv3 did not correlate with station data in November, unlike the TMPA 3B42v7 product in Figure 11. This could be a consequence of the downscaling applied to the global reanalysis datasets [52].
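A seasonal breakdown of this kind can be sketched by mapping each month to its monsoon season and computing a score per season. The Python example below uses the season definitions given in the study area description (DJF, MAM, JJA, SON) and, for brevity, relative bias as the score. The record layout `(month, gauge_mm, gridded_mm)` and the function name are hypothetical, and the relative bias formula is the commonly used definition rather than one confirmed by the tables.

```python
# Month-to-season mapping following the season definitions in the text.
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def seasonal_relative_bias(records):
    """Relative bias per season from (month, gauge_mm, gridded_mm) records.

    Seasons with no records, or with zero total gauge rainfall, are omitted.
    """
    sums = {}
    for month, gauge, grid in records:
        season = SEASON_OF_MONTH[month]
        gauge_sum, diff_sum = sums.get(season, (0.0, 0.0))
        sums[season] = (gauge_sum + gauge, diff_sum + (grid - gauge))
    return {
        season: diff_sum / gauge_sum
        for season, (gauge_sum, diff_sum) in sums.items()
        if gauge_sum > 0
    }
```

The same grouping could be reused with correlation or RMSE in place of relative bias.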
Interpolation techniques within each product may also influence the quality. For example, some products may represent the precipitation gradient at the edge of the mountains better than others. Figure 12 shows the spatial maps of annual rainfall for all products. The spatial rainfall distribution of TMPA 3B42v7 appears to correlate well with topography and indicates a precipitation gradient, both in Peninsular Malaysia and in East Malaysia. While similar features are also seen in CHIRPS and GSMaP_RNL in Peninsular Malaysia, a much weaker gradient is seen in PGFv3. Nevertheless, the annual rainfall of GSMaP_RNL in WPM and MPM is lower than that of TMPA 3B42v7. In SEM of East Malaysia, the precipitation gradient of GSMaP_RNL appears to be reversed relative to that of TMPA 3B42v7. In fact, the area of maximum rainfall of TMPA 3B42v7 in SEM was not replicated in GSMaP_RNL. In addition, as in WPM and MPM, GSMaP_RNL shows lower rainfall than TMPA 3B42v7 in northeastern SEM and southern NEM.
These differences may be caused by the input data used in the products. Using TMPA 3B42v7 as a reference dataset, a Taylor diagram can be used to provide measures of similarity (in terms of both spatial RMSE and correlation values) among the other products (Figure 13). In terms of correlation, both CHIRPS products have the highest values (0.9), followed by PGFv3 (0.84) and GSMaP_RNL (0.73). However, based on RMSE, GSMaP_RNL shows the smallest value, although GSMaP_RNL tended to underestimate both the rain gauge data (Table 4) and TMPA 3B42v7.
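As an illustration of the quantities summarized in a Taylor diagram, the following minimal sketch computes the correlation, the standard deviations and the centered RMS difference of a product field against a reference field. The function name and the flattened-grid inputs are hypothetical. Taylor diagrams conventionally use the centered RMS difference; whether the RMSE reported in Figure 13 is centered is not stated here, so this should be read as an assumption.

```python
import math

def taylor_statistics(reference, product):
    """Statistics for one product against a reference field.

    reference, product: equal-length sequences of co-located grid-cell
    values (for example, annual rainfall flattened to 1-D).
    """
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(product) / n
    dev_r = [x - mean_r for x in reference]
    dev_p = [x - mean_p for x in product]

    # Population standard deviations (divide by n), as is usual for Taylor diagrams.
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)

    # Pattern correlation between the two fields.
    corr = sum(a * b for a, b in zip(dev_r, dev_p)) / (n * std_r * std_p)

    # Centered RMS difference: RMSE after removing each field's mean.
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_r, dev_p)) / n)

    return {"std_ref": std_r, "std_prod": std_p, "corr": corr, "crmsd": crmsd}

# Synthetic values for illustration only (not data from this study).
example_stats = taylor_statistics([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

The three quantities are linked by crmsd² = std_prod² + std_ref² − 2 · std_prod · std_ref · corr, which is why they can be shown on a single diagram. Because the centered RMS difference removes the mean, a product with a systematic underestimation can still show a small value.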
Figure 13. Taylor diagram of all gridded rainfall products using TMPA 3B42v7 as a reference dataset.

Conclusions
In this study, the performance and reliability of five gridded datasets, namely CHIRPS05, CHIRPS25, PGFv3, GSMaP_RNL and TMPA 3B42v7, in approximating rain gauge station observations were evaluated. The comparison used categorical scores on daily data and statistical measures at monthly and seasonal timescales from 2008 to 2012. The findings of this study can be summarized as follows:

1. Overall, TMPA 3B42v7 was considered the best product for daily, monthly and seasonal precipitation comparison with rain gauge data based on both the categorical and statistical scores. CHIRPS05 performed slightly better than CHIRPS25 for monthly data, while CHIRPS25 was better for daily precipitation. GSMaP_RNL had slightly better daily performance than TMPA 3B42v7, despite its tendency to underestimate monthly rainfall. Meanwhile, PGFv3 had the lowest performance in most temporal assessments.

2. Overall, the wet season (DJF) had higher correlations than the dry season (JJA), although the dry season recorded lower RMSE and better bias than the wet season. Gridded precipitation products performed better over sub-regions that receive higher annual precipitation, such as NPM and EPM, as well as East Malaysia. Meanwhile, gridded products performed worst in MPM because of its highly complex terrain. TMPA 3B42v7 performed consistently well in all regions. PGFv3 performed rather poorly over EPM but showed high correlation in the East Malaysia regions. Meanwhile, GSMaP_RNL performed worst, with high underestimation in most regions.

3. For rain detection ability, CHIRPS25 recorded the highest Probability of Detection (POD), followed by TMPA 3B42v7, which also recorded a low False Alarm Ratio (FAR) and a high Critical Success Index (CSI). Meanwhile, GSMaP_RNL showed reasonably good FAR and CSI scores, behind TMPA 3B42v7, as well as a fair POD score. PGFv3 showed the poorest scores for rain detection ability.

4. Most of the gridded precipitation datasets tended to underestimate no rain (< 1 mm/day). For light rain (1-2 mm/day), different gridded datasets showed different frequencies, with CHIRPS05 and PGFv3 showing more than 5% underestimation. In general, all products tended to overestimate from low moderate rain to low heavy rain (2-5 mm/day to 10-20 mm/day). However, all products performed reasonably well for high heavy rain (20-50 mm/day) and showed slight underestimation for rain above 50 mm/day. Interestingly, PGFv3 recorded underestimation of more than 5% for 1-2 mm/day and 2-5 mm/day, while showing overestimation of 9.5% for the 5-10 mm/day and 21% for the 10-20 mm/day rainfall categories. Overall, TMPA 3B42v7 and GSMaP_RNL were able to detect extreme events, while the other products tended to underestimate them.

5. Based on overall performance, TMPA 3B42v7 can be considered the most appropriate gridded product for climate and meteorological studies in Malaysia.
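The categorical scores and intensity-class frequencies referred to in findings 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure of the study: the function names are hypothetical, a day is treated as rainy when rainfall is at least 1 mm/day, and values falling exactly on a class boundary are assigned to the upper class. POD, FAR and CSI follow their standard contingency-table definitions.

```python
from bisect import bisect_right

def detection_scores(gauge, product, threshold=1.0):
    """POD, FAR and CSI from paired daily rainfall (mm/day).

    Assumption: a day counts as rainy when rainfall >= threshold.
    """
    hits = misses = false_alarms = 0
    for g, p in zip(gauge, product):
        gauge_rain = g >= threshold
        product_rain = p >= threshold
        if gauge_rain and product_rain:
            hits += 1
        elif gauge_rain:
            misses += 1
        elif product_rain:
            false_alarms += 1
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return {"pod": pod, "far": far, "csi": csi}

# Class edges (mm/day) taken from the intensity categories used in the text.
CLASS_EDGES = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
CLASS_LABELS = ["<1", "1-2", "2-5", "5-10", "10-20", "20-50", ">=50"]

def class_frequencies(values):
    """Percentage of days falling in each rainfall intensity class."""
    counts = [0] * len(CLASS_LABELS)
    for v in values:
        # bisect_right places a boundary value in the upper class (assumption).
        counts[bisect_right(CLASS_EDGES, v)] += 1
    n = len(values)
    return {label: 100.0 * c / n for label, c in zip(CLASS_LABELS, counts)}

# Synthetic values for illustration only (not data from this study).
example_gauge = [0.0, 5.0, 12.0, 0.5, 30.0, 0.0]
example_product = [0.2, 4.0, 0.0, 3.0, 60.0, 0.0]
example_detection = detection_scores(example_gauge, example_product)
example_gauge_freq = class_frequencies(example_gauge)
example_product_freq = class_frequencies(example_product)
```

Subtracting the gauge frequency from the product frequency for each class gives the over- or underestimation of occurrence in that class, in percentage points.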