Assessment of three long-term gridded climate products for hydro-climatic simulations in tropical river basins

Abstract: Gridded climate products (GCPs) provide a potential source for representing weather in regions with remote, poor-quality or short-term observations. The accuracy of three long-term GCPs (Asian Precipitation - Highly-Resolved Observational Data Integration towards Evaluation of Water Resources: APHRODITE; Precipitation Estimation from Remotely Sensed Information using Artificial Neural Network-Climate Data Record: PERSIANN-CDR; and National Centers for Environmental Prediction Climate Forecast System Reanalysis: NCEP-CFSR) was analyzed for the Kelantan River Basin (KRB) and Johor River Basin (JRB) in Malaysia from 1983 to 2007. These GCPs were then used as inputs into calibrated Soil and Water Assessment Tool (SWAT) models to assess their capability in simulating streamflow. The results show that the APHRODITE data performed the best in precipitation estimation, followed by the PERSIANN-CDR and NCEP-CFSR datasets. The NCEP-CFSR daily maximum temperature data exhibited a better correlation than the minimum temperature data. For streamflow simulations, the APHRODITE data produced strong results for both basins, while the NCEP-CFSR data showed unsatisfactory performance. In contrast, the PERSIANN-CDR data showed acceptable representation of observed streamflow in the KRB, but failed to track the JRB observed streamflow. The combination of the APHRODITE precipitation and NCEP-CFSR temperature data resulted in accurate streamflow simulations. The APHRODITE and PERSIANN-CDR data often underestimated the extreme precipitation and streamflow, while the NCEP-CFSR data produced dramatic overestimations. Therefore, a direct application of NCEP-CFSR data should be avoided in this region. We recommend the use of APHRODITE precipitation and NCEP-CFSR temperature data in modeling of Malaysian water resources.


Introduction
Precipitation is a major component of the water cycle and is also a key input to hydrological and ecohydrological models. Meanwhile, the water cycle is largely influenced by changes in regional temperature [1]. Therefore, long-term precipitation and temperature information is vital to study climate change, forecast local precipitation variability and analyze trends in extreme events. Despite this, acquisition of reliable precipitation and temperature data is still a challenging task, especially [...] Tool (SWAT) ecohydrological model [32-36]; and (3) to analyze the suitability of the three GCPs for capturing extreme hydro-climatic events.

Study Area
Two tropical basins, the Kelantan River Basin (KRB) and Johor River Basin (JRB), were selected as study areas due to their differences in size, land use, topography and data availability (Figure 1). The KRB (4° N~6° N, 101° E~103° E) drains an area of 12,134 km² in northeastern Peninsular Malaysia. The main channel of the Kelantan River extends a total distance of about 248 km and flows northward into the South China Sea. In 1990, the primary land use/land cover in the KRB was tropical forest (84.9%), followed by rubber (9.9%), oil palm (4.5%), urban (0.5%) and paddy (0.2%). The basin elevation ranges from 8 m a.s.l. in the western region to 2174 m a.s.l. in the southwestern regions. The KRB is characterized by a tropical monsoon climate, with an average annual precipitation ≥2500 mm, most of which falls from November to January [37]. The average annual temperature of the basin is about 27.5 °C. The KRB is frequently affected by monsoon flood events during the northeast monsoon season.
The JRB (1° N~3° N, 103° E~104° E) drains an area of 1652 km² in southern Peninsular Malaysia (Figure 1b). The main stem of the Johor River flows approximately 123 km southeast to the Strait of Johor. Elevations within the JRB range between 3 m a.s.l. and 977 m a.s.l., with the highest elevations located in the northern and western regions of the basin. The JRB is an agricultural production region, which in 1990 was dominated by forest (44.1%), oil palm (38.4%) and rubber (15.3%). The average annual precipitation and average annual temperature of the basin are 2500 mm and 26 °C, respectively. The Johor River is an important freshwater resource for the populations of Johor and Singapore, so any changes in water resources could have major impacts on agricultural, industrial and living conditions in both regions. For example, continuous hot weather in April 2016 caused water levels in the Linggiu Reservoir, located in the northern JRB, to fall to a new historic low.

Gridded Climate Products
Long-term GCPs are viable datasets that can be used to support the development of climate change and mitigation strategies for both the KRB and JRB. The evaluation of GCPs for this study focused on products characterized by long-term climate records that contain data from at least a 30-year period. Based on this criterion, the APHRODITE, PERSIANN-CDR and NCEP-CFSR GCPs (Table 1) were assessed for the two study basins. Tan et al. [12] also reported that two Tropical Rainfall Measuring Mission (TRMM) 3B42 products performed well in replicating precipitation data for different sub-regions of Malaysia. However, the TRMM data were excluded from this study because their temporal coverage only extends back to 1998.
APHRODITE is a long-term daily precipitation product that spans the 57-year period of 1951 to 2007, and was generated from thousands of gauge observations collected from various countries' government agencies [7]. It was developed by the Research Institute for Humanity and the Meteorological Research Institute of the Japan Meteorological Agency. APHRODITE is divided into Middle East, Russia, Monsoon Asia and Japan regions. In this study, APHRODITE V1101 (Monsoon Asia) with a 0.25° resolution was used.
PERSIANN-CDR provides daily precipitation information from 1983 to the present for latitudes 60° S to 60° N at a spatial resolution of 0.25°. PERSIANN-CDR was established from the PERSIANN algorithm using Gridded Satellite Infrared Data (GridSat-B1), a calibrated and mapped geostationary satellite dataset [38]. The artificial neural network is trained using the NCEP stage IV hourly precipitation data. The product is then adjusted by the Global Precipitation Climatology Project (GPCP) monthly version 2.2 product [8].
NCEP-CFSR was constructed for a period of 36 years (1979 to 2014) at ~0.31° (38 km) resolution [6]. NCEP-CFSR is produced using cutting-edge data assimilation techniques and a forecast model that extrapolates non-observed parameters from observed data, collected from various sources such as rain gauges, ships, weather balloons and satellites. NCEP-CFSR data were obtained for the whole of Peninsular Malaysia (latitude 0.      E), and then the stations distributed over each basin were used. The product provides five climate parameters: temperature, precipitation, wind speed, relative humidity and solar radiation. However, the analysis conducted here was limited to the NCEP-CFSR precipitation and temperature data, in order to maintain consistency with the evaluation of the other two GCPs.

Ground-Based Gauge Data
Daily precipitation, maximum temperature and minimum temperature data from 1983 to 2007 were collected from the Malaysia Meteorological Department (MMD; http://www.met.gov.my/) and the Irrigation and Drainage Department Malaysia (DID; http://www.water.gov.my/). There are 29 climate stations distributed across the KRB, but only three of them contain long-term maximum and minimum temperature data. For the JRB, daily precipitation data are available at nine climate stations. However, only two of the stations contain temperature data. In addition, monthly streamflow data measured at the Jambatan Guillermard and Rantau Panjang stations, located in the KRB and JRB (Figure 1), respectively, were collected from the DID for calibration and validation of the SWAT model. More detailed information on streamflow measurements for the KRB, JRB and other basins in Malaysia is available in a report prepared by the DID [39].

Geospatial Data
The main input geospatial data for the SWAT model are a digital elevation model (DEM), a land use map and a soil map. Tan et al. [40] evaluated the effect of four different DEM datasets on SWAT simulations in the JRB, and found that the 90 m Shuttle Radar Topography Mission (SRTM) DEM [41] performed the best. Therefore, the SRTM DEM was selected in this study. The land use map and soil map, produced in 1990 and 2002, respectively, were obtained from the Ministry of Agriculture and Agro-based Industry of Malaysia (MOA; http://www.moa.gov.my/). In addition, the river network for each basin was digitized from the topography map produced by the Department of Survey and Mapping Malaysia (JUPEM; https://www.jupem.gov.my/). The digitized river networks were used to improve basin delineation and river extraction of both basins, especially in lowland regions, similar to the approach used by Zheng et al. [42].

Statistical Analysis
A set of continuous and categorical statistical analyses was used to evaluate the performance of the GCPs against observations at annual, seasonal, monthly and daily scales (Figure 1). As recommended by Tangang and Juneng [43], the climate data were divided into December to February (DJF), March to May (MAM), June to August (JJA) and September to November (SON) for the seasonal scale assessment. The comparison was performed from 1983 to 2007 to provide a consistent time period, bracketed by the starting year of 1983 for the PERSIANN-CDR dataset and the final year of 2007 for the APHRODITE data. A point-to-pixel assessment was applied to prevent additional uncertainties during interpolation of the gauge data [44]. For the overall assessment, all precipitation values were pooled together from 1983 to 2007 [45]. In contrast, the NCEP-CFSR maximum and minimum temperature could not be validated at the overall assessment scale, as there were only two or three climate stations with temperature data in the KRB and JRB (Figure 1). Moreover, most of these stations are located outside the basins, and thus cannot be used to represent the entire basins. Therefore, the temperature data validation was conducted only for specific climate stations. In addition, the paired Student's t-test was used to assess the significance of differences between rain gauges and GCPs at the 0.05 significance level.
Continuous statistical measures, namely the Root Mean Square Error (RMSE), Pearson Correlation Coefficient (CC), Mean Error (ME) and Relative Bias (RB), were used [12]. These measures are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(G_i - O_i\right)^2}$$
$$\mathrm{CC} = \frac{\sum_{i=1}^{n}\left(G_i - \bar{G}\right)\left(O_i - \bar{O}\right)}{\sqrt{\sum_{i=1}^{n}\left(G_i - \bar{G}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}}$$
$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(G_i - O_i\right)$$
$$\mathrm{RB} = \frac{\sum_{i=1}^{n}\left(G_i - O_i\right)}{\sum_{i=1}^{n} O_i} \times 100$$
where G_i and O_i are gridded and observed precipitation/temperature, respectively; \bar{G} and \bar{O} are their respective means; i is used to label the individual measurements; and n is the number of measurements. CC measures similarity in temporal or spatial pattern between the GCP and the observed data, RMSE evaluates the absolute average error between two datasets, ME makes it possible to evaluate the bias in estimations, while RB estimates the systematic overestimation and underestimation of the GCP as a percentage (%). A well-performing GCP should have a high CC, and RMSE, ME and RB values close to zero.
Categorical statistical analysis was used to evaluate the ability of GCPs to discriminate between precipitation and no-precipitation days, based on the following criteria [46]: (1) Accuracy (ACC), which represents the level of agreement between the GCP and rain gauge estimates; (2) Probability of Detection (POD), which is a measure of how well the GCPs correctly detected precipitation recorded at the rain gauges; (3) False Alarm Ratio (FAR), which is used to evaluate how often the GCPs detected precipitation when no precipitation was recorded at the rain gauges; and (4) Critical Success Index (CSI), which is an indicator of the fraction of precipitation events correctly detected by the GCPs. These categorical measures are defined as follows:
$$\mathrm{ACC} = \frac{A + D}{A + B + C + D}$$
$$\mathrm{POD} = \frac{A}{A + C}$$
$$\mathrm{FAR} = \frac{B}{A + B}$$
$$\mathrm{CSI} = \frac{A}{A + B + C}$$
where A = correct detection (the GCP estimated precipitation, and precipitation was observed at the rain gauge); B = false alarm (the GCP estimated precipitation, but precipitation was not observed at the rain gauge); C = miss (the GCP did not estimate precipitation, but the rain gauge recorded precipitation); and D = correct negative (the GCP did not estimate precipitation, and precipitation was not observed at the rain gauge). These values range between 0 and 1, where 1 is a perfect score for the ACC, POD and CSI, while 0 is a perfect score for the FAR. For example, a FAR value of 0.2 means that 20% of the precipitation events detected by the GCP were not recorded by the rain gauges. Further description of this approach is provided by Ebert et al. [46]. According to Shen et al. [47], the quality of the GCP accuracy assessment is largely influenced by the density and distribution of local station networks. Hence, the assessment should be conducted over valid grid points only, where at least one station is available within each evaluated grid point.
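The continuous and categorical measures described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the wet-day threshold is an assumed parameter (the text does not state the value used for the contingency table), and the sign convention assumes errors are computed as gridded minus observed, so that negative ME and RB indicate underestimation.

```python
import math

def continuous_metrics(gridded, observed):
    """RMSE, CC, ME and RB (%) for paired gridded/observed series.

    Assumes both sequences have equal length, non-zero variance and a
    non-zero observed total (otherwise a ZeroDivisionError is raised).
    """
    n = len(observed)
    diffs = [g - o for g, o in zip(gridded, observed)]
    g_mean = sum(gridded) / n
    o_mean = sum(observed) / n
    cov = sum((g - g_mean) * (o - o_mean) for g, o in zip(gridded, observed))
    g_ss = sum((g - g_mean) ** 2 for g in gridded)
    o_ss = sum((o - o_mean) ** 2 for o in observed)
    return {
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "CC": cov / math.sqrt(g_ss * o_ss),
        "ME": sum(diffs) / n,
        "RB": 100.0 * sum(diffs) / sum(observed),
    }

def categorical_metrics(gridded, observed, wet_threshold=0.0):
    """ACC, POD, FAR and CSI from a wet/dry contingency table.

    wet_threshold (mm/day) is an assumed parameter: a day counts as
    'wet' when its value exceeds it.
    """
    a = b = c = d = 0
    for g, o in zip(gridded, observed):
        g_wet, o_wet = g > wet_threshold, o > wet_threshold
        if g_wet and o_wet:
            a += 1  # A: correct detection
        elif g_wet:
            b += 1  # B: false alarm
        elif o_wet:
            c += 1  # C: miss
        else:
            d += 1  # D: correct negative
    return {
        "ACC": (a + d) / (a + b + c + d),
        "POD": a / (a + c),
        "FAR": b / (a + b),
        "CSI": a / (a + b + c),
    }
```

In a point-to-pixel setting, `gridded` would hold the values of the grid cell containing a gauge and `observed` the gauge record for the same days.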

SWAT Model
Current versions of the SWAT model represent more than three decades of model development at the co-located U.S. Department of Agriculture and Texas A&M University laboratories in Temple, Texas [34,35]. SWAT is usually executed at a daily time step for continuous simulations [36], typically with a minimum climatic dataset consisting of daily precipitation, maximum temperature and minimum temperature. The model has been applied to an extensive range of ecohydrological problems and scenarios worldwide for watershed scales ranging from <1 km² to entire continents (e.g., see reviews by Gassman et al. [48,49]; Bressiani et al. [25]; Gassman and Wang [50]; and Krysanova and White [51]). The model has also been used successfully for several hydrology and pollutant transport studies conducted in Malaysia [40,52,53,54,55]. SWAT version 2012 (Revision 635) was used in conjunction with the ArcSWAT interface version 2012.10_2.16 for this study.
In SWAT, a basin is usually first sub-divided into multiple sub-basins that are then further delineated into hydrologic response units (HRUs), which are smaller spatial units consisting of homogeneous soil, landscape, land use and management characteristics. HRUs represent a specific percentage of the corresponding sub-watershed area and are not currently spatially identified in SWAT. For this study, digitized stream networks were merged into the SRTM DEM using the "burn in" method, resulting in the delineation of 22 and 11 sub-basins for the KRB and JRB, respectively (Figure 1). Threshold values were then used in the ArcSWAT interface to create the HRUs, by setting minimum percentages that specific soils, slopes or land uses had to occupy within a given sub-basin in order to be included in the KRB or JRB SWAT models. The HRU threshold values were defined as 20% for land use and slope, and 10% for soil, resulting in the KRB and JRB being further subdivided into 200 and 37 HRUs, respectively. Initial simulation of climate inputs, hydrological balance, crop growth and pollutant cycling occurs at the HRU level in SWAT. Excess discharge and pollutant exports are then aggregated across HRUs within a given sub-basin, input into the stream network at the sub-basin outlet and then ultimately routed to the watershed outlet. Further details regarding the theory, input requirements and output options are provided in the online documentation [33,36].

SWAT Model Baseline Testing
Baseline hydrological testing of SWAT was performed for both the KRB and JRB prior to the analysis of the GCPs. The respective baseline testing periods of 1983 to 1999 for the KRB and 1983 to 1992 for the JRB were based on streamflow data measured at the stream gauge sites shown for each basin in Figure 1. The first two years (1983-1984) were used as initialization years for both watersheds, and the remainder of each period was subdivided into calibration (KRB = 1985-1994 and JRB = 1985-1988) and validation (KRB = 1995-1999 and JRB = 1989-1992) periods.
SWAT calibration was conducted using the Sequential Uncertainty Fitting algorithm (SUFI-2) within the SWAT Calibration and Uncertainty (SWAT-CUP) software package [56]; SUFI-2 is a flexible algorithm that can process large numbers of input parameters. The Nash-Sutcliffe Coefficient (NSE) and Coefficient of Determination (R²) statistics [57] were used to evaluate the performance of simulated streamflow. The NSE was selected as the objective function in the SWAT calibration; NSE values can range from −∞ to 1, where values ≤ 0 indicate that the mean of the measured data is a better predictor than the simulated values, indicating unacceptable performance. In addition, the R² values range from 0 to 1, and were used to assess the collinearity of the observed and simulated streamflow, where 1 is the ideal value. Based on Moriasi et al. [58,59], the performance of the SWAT model can be considered satisfactory if NSE ≥ 0.5 and R² ≥ 0.6, and good if NSE ≥ 0.7 and R² ≥ 0.75.
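As an illustration of the two goodness-of-fit statistics, the following minimal sketch computes NSE and R² for paired observed and simulated streamflow and applies the satisfactory/good thresholds quoted above. The function names are hypothetical, and requiring both statistics to pass a given threshold is an assumption of this sketch; it is not taken from SWAT-CUP.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    o_mean = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - o_mean) ** 2 for o in observed)
    return 1.0 - sse / sst

def r_squared(observed, simulated):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(observed)
    o_mean = sum(observed) / n
    s_mean = sum(simulated) / n
    cov = sum((o - o_mean) * (s - s_mean) for o, s in zip(observed, simulated))
    o_ss = sum((o - o_mean) ** 2 for o in observed)
    s_ss = sum((s - s_mean) ** 2 for s in simulated)
    return cov ** 2 / (o_ss * s_ss)

def performance_rating(nse_value, r2_value):
    """Classify a simulation using the thresholds quoted in the text.

    Assumption: both statistics must meet the threshold for a rating.
    """
    if nse_value >= 0.7 and r2_value >= 0.75:
        return "good"
    if nse_value >= 0.5 and r2_value >= 0.6:
        return "satisfactory"
    return "unsatisfactory"
```

A simulation that always returns the observed mean gives NSE = 0, which is why values at or below zero are treated as unacceptable.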
Following the SWAT calibration and validation phase, two different GCP scenarios were used as inputs into the calibrated SWAT model. The first scenario consisted of incorporating only the precipitation data from the three GCPs into the SWAT model simulations. This allows comparison with several previous studies, which only evaluated the GCP precipitation products. The second scenario evaluated the effect on the SWAT outputs of combining each GCP with the NCEP-CFSR temperature data (i.e., APHRODITE, PERSIANN-CDR or NCEP-CFSR precipitation data + NCEP-CFSR temperature). The second scenario is useful for assessing the applicability of the NCEP-CFSR temperature data in SWAT modeling, due to the sensitivity of the water cycle to temperature data.

Extreme Events Analysis
Extreme climatic events can result in severe impacts on human society and the environment [60]. The majority of existing hydrological and climatological studies, including analyses of the impacts of extreme climatic events, have been conducted using ground-based gauge data [7,61]. Therefore, evaluation of other types of precipitation products for extreme events would provide important insight for determining their efficacy and accuracy for unusual climatic conditions [62]. Four indices were used in this study to assess the performance of the three GCPs in capturing the pattern of precipitation extremes over the KRB and JRB: (1) the number of precipitation days ≥10 mm·day⁻¹ in a year (R10mm); (2) the number of precipitation days ≥50 mm·day⁻¹ in a year (R50mm); (3) the annual maximum daily precipitation/streamflow amount (Rx1d); and (4) the annual maximum consecutive five-day precipitation/streamflow amount (Rx5d). The latter two indices were also adopted to evaluate the accuracy of GCP-based SWAT simulated streamflows for maximum one-day and five-day amounts. These extreme indices were recommended by the Expert Team on Climate Change Detection and Indices [63]. Annual maximum one-day and five-day consecutive streamflow indices were chosen because these indices can be used to study flood volume, which is important for flood risk management [29].
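The four indices can be computed directly from one year of daily values. The sketch below is a plain-Python illustration under the definitions given above; the function names are hypothetical and the input is assumed to be a list of daily amounts for a single year with at least five entries.

```python
def count_days_at_or_above(daily, threshold):
    """Number of days with a value >= threshold (R10mm: 10, R50mm: 50)."""
    return sum(1 for value in daily if value >= threshold)

def rx1d(daily):
    """Annual maximum one-day amount."""
    return max(daily)

def rx5d(daily):
    """Annual maximum total over any five consecutive days."""
    return max(sum(daily[i:i + 5]) for i in range(len(daily) - 4))

def extreme_indices(daily):
    """Collect all four indices for one year of daily precipitation (mm)."""
    return {
        "R10mm": count_days_at_or_above(daily, 10.0),
        "R50mm": count_days_at_or_above(daily, 50.0),
        "Rx1d": rx1d(daily),
        "Rx5d": rx5d(daily),
    }
```

For streamflow, only `rx1d` and `rx5d` apply, with daily discharge in place of precipitation.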

Precipitation Validation
The results of the statistical assessment of the 25-year (1983 to 2007) comparisons between the APHRODITE, PERSIANN-CDR and NCEP-CFSR annual, seasonal, monthly and daily precipitation data versus the rain gauge observations for the KRB and JRB are listed in Table 2. The PERSIANN-CDR monthly-scale precipitation was the only GCP data that did not show significant differences relative to the KRB rain gauge observations at a significance level of 0.05 (Table 2). The PERSIANN-CDR data also showed no significant differences versus observations for the JJA season in both basins.
In the KRB, the APHRODITE precipitation data produced the best linear correlation for all time-scales, with CC values varying from 0.38 to 0.74, followed by the PERSIANN-CDR and NCEP-CFSR data. It is also clear that the APHRODITE and PERSIANN-CDR precipitation data underestimated the annual, DJF, SON, monthly and daily precipitation amounts, as indicated by negative ME and RB values, while the NCEP-CFSR data resulted in highly overestimated precipitation across the basin, as indicated by positive ME and RB values. In addition, the NCEP-CFSR data showed the largest average errors, as evidenced by the highest RMSE values, which ranged from 19.49 mm to 1695.34 mm for most of the time-scales, except for the DJF.
All other GCP data showed significant differences for annual, daily and monthly time steps as compared to the rain gauge precipitation estimates for the JRB (Table 2). The APHRODITE data produced the best results at the DJF, JJA, SON, monthly and daily time-scales, with CC values that ranged from 0.44 to 0.73. In contrast, the NCEP-CFSR data resulted in the worst performance at all time-scales, with CC values between 0.13 and 0.46. The APHRODITE data slightly underestimated the MAM, SON, monthly and daily precipitation levels, whereas the PERSIANN-CDR and NCEP-CFSR data produced large overestimations.
Generally, the GCPs show better linear correlation performance for the DJF and monthly time-scale estimations as compared to other time-scales in both basins. The results found here showed that the APHRODITE data produced the best precipitation estimation performance over both basins, which is in agreement with Tan et al. [12], who conducted a national assessment over Malaysia. This is mainly because the developers of APHRODITE incorporated MMD rain gauge data in the development of the product [7]. In contrast, NCEP-CFSR displays more serious errors and dramatically overestimated the total precipitation compared to the other GCPs. Similarly, Roth and Lemann [64] found that the total annual NCEP-CFSR precipitation was three times greater than observed precipitation in Ethiopia. The distinct weaknesses that have been quantified for the NCEP-CFSR data may be attributed to scale differences, where the size of a grid point is large (up to 0.3125°) compared to the station data, which are point-based measurements. The errors are expected to be higher in a grid point with high spatial and temporal variability of precipitation, as well as for regions characterized by complex topography [65].

Precipitation Spatial Variability
The monthly CC and RB values for the GCPs over both basins are presented in Figures 2 and 3, respectively, to provide insights regarding spatial variability. Generally, high CC values for all GCPs were found for the northern and eastern KRB sub-regions, which are near-coastal and low-elevation areas (Figure 2a-c). All of the GCPs showed high CC values for the northwest JRB sub-region, while lower CC values dominated in the middle of the basin (Figure 2d-f). The APHRODITE data underestimated monthly ground-based precipitation at most of the stations (Figure 3). In contrast, the NCEP-CFSR data dramatically overestimated monthly precipitation at all of the stations, resulting in especially high RB values (more than 100%) for the stations mainly distributed in the southwestern KRB sub-region, which is characterized by high mountains (Figure 3c). The NCEP-CFSR was the only GCP that resulted in significant overestimates for all stations distributed across the JRB.
These findings agree with other studies, which state that GCPs generally are more reliable in lowland regions compared to higher elevations [66,67]. This might be because infrared (IR) sensors misrepresent the effects of warm clouds, which commonly appear over mountaintops [68]. The overall less accurate performance of GCPs in mountainous regions may also be due to the smaller number of rain gauges that can be used for product development. The installation and maintenance of climate stations in high mountainous regions is often problematic because of difficulties related to physical access and the fact that each climate station is representative of a relatively small area due to high topographic variability. In general, the APHRODITE dataset performed better for mountainous regions compared to the other two GCPs, because the product has better skill in resolving orographic precipitation variability [69].

Precipitation: Rain Detection and Intensity Assessment
The NCEP-CFSR data showed the best rain detection ability, with POD values of 0.94 and 0.96 for the KRB and JRB, respectively. However, the APHRODITE data exhibited better ACC skill for the JRB, indicating that it has a stronger capability to correctly estimate overall precipitation and non-precipitation events in southern Peninsular Malaysia. In contrast, the PERSIANN-CDR and NCEP-CFSR GCPs performed better for the KRB. The analysis further revealed that the NCEP-CFSR data were most prone to predicting false rain events, which were not recorded by the rain gauges, resulting in the highest FAR values of 0.52 (KRB) and 0.57 (JRB). Moderate CSI values were also obtained for all three GCPs, ranging from 0.45 to 0.48 (KRB) and 0.42 to 0.51 (JRB), demonstrating that roughly half of the precipitation events were correctly detected.
Figure 4 presents the probability distribution functions (PDFs) of precipitation intensity for the KRB and JRB. The non-precipitation values ≤0.254 mm·day⁻¹ (a common rain gauge detection limit) were removed from the analysis. The three GCPs showed moderate underestimation for the ≥50 mm·day⁻¹ precipitation class over both basins. The NCEP-CFSR data resulted in significant overestimation for the 5-10 and 10-20 mm·day⁻¹ precipitation classes in both basins. This is similar to the results reported by Blacutt et al. [70], who also found that the NCEP-CFSR overestimated precipitation in the 3-20 mm·day⁻¹ class in Bolivia. They further reported that the NCEP-CFSR tended to overestimate precipitation during the rainy season. This problem could potentially be amplified in both the KRB and JRB, which are typical tropical basins that receive precipitation throughout the year, especially during the northeast monsoon and southwest monsoon periods. The NCEP-CFSR overestimation rate was higher for the JRB (up to 270% at 5-10 mm·day⁻¹) compared to the KRB, because the Sumatra and Titiwangsa mountain ranges help to reduce precipitation days in the KRB during the southwest monsoon season.
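The intensity analysis amounts to binning wet-day values into classes and comparing relative frequencies between a GCP and the gauges. The sketch below illustrates this with plain Python. The 0.254 mm·day⁻¹ cut-off and the 5-10, 10-20 and ≥50 classes come from the text; the remaining class edges and all names are assumptions made for illustration only.

```python
WET_THRESHOLD = 0.254  # mm/day; values at or below this are discarded
# Lower edges of the classes above the first one. The 5, 10, 20 and 50
# edges appear in the text; the 1 mm edge is a hypothetical choice.
CLASS_EDGES = [1.0, 5.0, 10.0, 20.0, 50.0]

def intensity_distribution(daily, edges=CLASS_EDGES):
    """Relative frequency of wet days per intensity class.

    Returns len(edges) + 1 fractions for the classes
    (<e0, e0-e1, ..., >=e_last), computed over wet days only.
    """
    wet = [v for v in daily if v > WET_THRESHOLD]
    counts = [0] * (len(edges) + 1)
    for value in wet:
        # Class index = number of edges the value has reached.
        counts[sum(1 for e in edges if value >= e)] += 1
    if not wet:
        return [0.0] * len(counts)
    return [c / len(wet) for c in counts]

def relative_difference(gcp_fraction, gauge_fraction):
    """Percentage over- (+) or underestimation (-) of one class."""
    return 100.0 * (gcp_fraction - gauge_fraction) / gauge_fraction
```

Under this sketch, a GCP class fraction 3.7 times the gauge fraction would correspond to an overestimation of 270% for that class.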

Temperature Validation
The statistical analysis of the NCEP-CFSR maximum and minimum temperature versus the climate station temperature gauges (Figure 1) of the KRB and JRB is listed for various time scales in Table 3. The temperature values from each temperature gauge were compared to the nearest NCEP-CFSR grid point. Generally, the NCEP-CFSR temperature data have better correlation with observations at the DJF and monthly time scales, with CC values ranging from 0.6 to 0.91 and 0.57 to 0.93, respectively. In addition, the daily maximum temperature data were better correlated with the observed data than the minimum temperature data. However, the average error of the daily maximum temperature data (RMSE = 2.58 to 3.32 °C) is larger than that of the minimum temperature data (RMSE = 0.98 to 2.68 °C) at all stations.
Box plots comparing the NCEP-CFSR data with the climate station maximum and minimum temperature data, for the four climate stations distributed across the KRB and JRB, are shown in Figure 5. The inter-quartile range shows that the minimum temperature at the 48679 station provides the best performance, as the range of the NCEP-CFSR data matched the gauge data quite well. The range of the NCEP-CFSR temperature data is larger than that of the observations at all stations. As can be seen from Table 3 and Figure 5, the NCEP-CFSR temperature data tend to underestimate the actual maximum and minimum temperature values. The main reason for the underestimation could be the land use types [65]. For example, the 48679 station is located in an industrial area where the surface temperature is expected to be higher. However, the NCEP-CFSR relies on National Aeronautics and Space Administration (NASA) land use information [71], so reliable local land use information might be missing for the 48679 station location. Another possible reason for the underestimation is a mismatch in the timing of the temperature measurements. For instance, the climate stations' daily maximum and minimum temperature data were taken at 0800 and 1400 local time, respectively, while the NCEP-CFSR daily maximum and minimum temperatures were obtained from hourly values [72].
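For reference, the two statistics used in this comparison can be sketched as below, assuming CC denotes the Pearson correlation coefficient and RMSE the root mean square error in their usual forms. The function names and sample temperatures are hypothetical.

```python
import math


def pearson_cc(obs, est):
    """Pearson correlation coefficient between observed and estimated series."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return cov / math.sqrt(var_o * var_e)


def rmse(obs, est):
    """Root mean square error, in the units of the input series."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


# Hypothetical daily maximum temperatures (degrees C): the estimate is
# uniformly 2 degrees too cold, so it is perfectly correlated but biased.
station_tmax = [30.0, 31.0, 32.0, 33.0]
gridded_tmax = [28.0, 29.0, 30.0, 31.0]
cc = pearson_cc(station_tmax, gridded_tmax)
err = rmse(station_tmax, gridded_tmax)
```

The example shows why a series can be well correlated and still carry a large RMSE: a constant cold bias leaves the correlation untouched, which is consistent with the maximum temperature data having both the higher CC and the higher RMSE.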

Streamflow: GCPs Precipitation Data
Table 4 lists the best-fit calibration parameters for the KRB and JRB. The calibration and validation of the SWAT model were conducted based on local knowledge and a literature review of SWAT applications in tropical regions (e.g., [54,55,73,74]). As can be seen in Table 4, the CN2 values were increased by 1% and 13% for the KRB and JRB, respectively. A similar increase in CN2 values was also observed in the calibration of other tropical SWAT models [75][76][77]. The CN2 value was higher in the JRB because it is dominated by oil palm plantations, where surface runoff is generally higher than in a forested basin (KRB). Generally, the SWAT simulations based on rain gauge data agreed well with the observed streamflow during the calibration and validation periods for both the KRB and JRB (Figure 6). The NSE values computed for the KRB (JRB) were 0.75 (0.78) and 0.65 (0.6) for the calibration and validation periods, respectively (Table 5), and the corresponding KRB (JRB) R² statistics were 0.87 (0.78) and 0.84 (0.61), indicating that the SWAT model performed well for both basins based on the previously discussed criteria [58,59]. Among the three GCPs, the most accurate KRB SWAT simulations occurred in response to the APHRODITE precipitation input, followed by the simulations driven by the PERSIANN-CDR and NCEP-CFSR precipitation data. The SWAT streamflow simulations based on the APHRODITE and PERSIANN-CDR data overestimated low streamflows and underestimated high streamflows. The streamflow results obtained with the NCEP-CFSR data were unacceptable, as reflected by the negative NSE values (Table 6). In addition, the NCEP-CFSR precipitation data resulted in a large overestimation of observed streamflows throughout the simulation period, as indicated by the high RB values of 167.77% and 143.72% during the calibration and validation periods, respectively.
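To make the meaning of a negative NSE concrete, the following sketch computes NSE, RB and R² under their commonly used definitions, which are assumed here to match those applied in the study. The function names and the streamflow values are hypothetical.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; values below 0 mean
    the simulation is a worse predictor than the observed mean."""
    mean_o = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - residual / variance


def relative_bias(obs, sim):
    """Relative bias in percent; positive values indicate overestimation."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


# Hypothetical monthly streamflow (m^3/s): the simulation doubles every value.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [20.0, 40.0, 60.0, 80.0]
print(nse(observed, simulated))            # negative NSE
print(relative_bias(observed, simulated))  # 100 percent overestimation
print(r_squared(observed, simulated))      # timing is still perfectly correlated
```

In this constructed case the simulated series follows the observed pattern exactly (R² of 1) yet yields a negative NSE and an RB of 100%, showing how a strong volume overestimation alone is enough to make a simulation unacceptable.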
Similar results were obtained in the JRB, where the SWAT simulations driven by the APHRODITE precipitation data yielded the best calibration and validation results (Figure 6 and Table 6), followed again by the PERSIANN-CDR and NCEP-CFSR precipitation data. However, both the PERSIANN-CDR and NCEP-CFSR data resulted in unacceptable performance, as shown by the mostly negative NSE values (Table 6). Overestimation of the observed streamflows is also clearly shown in the PERSIANN-CDR- and NCEP-CFSR-based JRB SWAT streamflow predictions (Figure 6).
[...] identified using more reliable observations, while the other two require local expert knowledge with modeling skill. Hence, multiple GCP datasets should be evaluated through an initial assessment prior to applying them in any hydrological model.

Extreme Event Assessment
The final aspect of the overall analysis was to evaluate the capability of the GCPs to predict extreme precipitation events (Table 7). All of the GCPs showed significant differences at the 0.05 significance level when compared with the observed precipitation, except for the NCEP-CFSR data when assessed for the Rx1d index in the KRB. The APHRODITE data exhibited better correlation with observed precipitation than the other GCPs for three of the indices (Rx1d, Rx5d and R10mm) in both basins, while the PERSIANN-CDR data resulted in the best performance in the R50mm index estimation. In addition, the majority of the RB values calculated for the Rx1d, Rx5d and R50mm indices estimated by the three GCPs were negative. This is similar to the findings reported by Miao et al. [78], who found that the PERSIANN-CDR data tend to underestimate the Rx1d and Rx5d indices in eastern China. This can be explained by the fact that most of the GCPs underestimated precipitation greater than 50 mm in the two basins (Figure 4). The RB statistic was used to quantify the difference in accuracy in simulating extreme streamflow events, based on the Rx1d and Rx5d indices, between the rain gauge-based simulations and those driven by the three GCPs for the KRB (Figure 7) and JRB (Figure 8), because it provides a reliable basis for comparison between different case studies [79]. The majority of the RB values calculated for the APHRODITE and PERSIANN-CDR Rx1d and Rx5d indices are negative, indicating that most of the high streamflows were underestimated. However, the reverse pattern can be observed for the RB values determined for the respective NCEP-CFSR indices, indicating that streamflow was significantly overestimated in both basins in the NCEP-CFSR-based SWAT simulations.
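The four extreme indices can be sketched for a single year of daily values as follows. The definitions assumed here are the usual ones (annual maximum one-day total, annual maximum consecutive five-day total, and annual counts of days with at least 10 mm or 50 mm); the function names and data are hypothetical.

```python
def rx1d(daily):
    """Annual maximum one-day value."""
    return max(daily)


def rx5d(daily):
    """Annual maximum total over any five consecutive days."""
    if len(daily) < 5:
        raise ValueError("need at least five daily values")
    return max(sum(daily[i:i + 5]) for i in range(len(daily) - 4))


def rnnmm(daily, nn):
    """Number of days with a value of at least `nn` (R10mm: nn=10, R50mm: nn=50)."""
    return sum(1 for p in daily if p >= nn)


def index_relative_bias(observed_index, estimated_index):
    """Relative bias of an index in percent; negative means underestimation."""
    return 100.0 * (estimated_index - observed_index) / observed_index


# Hypothetical ten-day excerpts (mm/day) from a gauge and a gridded product
gauge = [0, 12, 55, 3, 0, 20, 8, 0, 60, 1]
product = [2, 10, 30, 5, 1, 15, 9, 1, 33, 2]

print(rx1d(gauge), rx5d(gauge), rnnmm(gauge, 10), rnnmm(gauge, 50))
print(index_relative_bias(rx1d(gauge), rx1d(product)))
```

The same functions apply to daily streamflow when computing the Rx1d and Rx5d streamflow indices; in the example the product damps the largest daily totals, so its Rx1d relative bias is negative.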

Discussion
In this study, six different sets of GCP precipitation and temperature inputs were used to drive the SWAT model. The overall results clearly revealed that the APHRODITE precipitation data performed best among the three GCP data sources, based on the SWAT simulation graphical and statistical results. These results agree with the findings reported in several other studies, which showed that SWAT simulations executed with APHRODITE precipitation data performed very well in central Vietnam [23,24,80]; glacier-influenced basins in mountainous regions of northwest China [81,82] and central Asia [83,84]; and a major tributary of the Yangtze River in central China [85]. Lauri et al. [31] also found that executing the VMod hydrological model [86] with combined APHRODITE precipitation and NCEP-CFSR temperature inputs accurately replicated hydrological simulations based on surface climate inputs for the 795,000 km² Mekong River Basin in Southeast Asia. These composite results underscore the strength of the APHRODITE precipitation data for a variety of Asian conditions and indicate that it can reliably be used for hydrological applications in ungauged, data-limited or restricted basins in Southeast Asia.
The results found here clearly show that the original NCEP-CFSR precipitation is not suitable for streamflow simulations in Malaysia, which is in agreement with the findings of Monteiro et al. [27], Roth and Lemann [64] and Bressiani et al. [87] for other tropical or sub-tropical conditions. However, the results conflict with the findings of Jajarmizadeh et al. [88], who reported successful SWAT streamflow simulations using the NCEP-CFSR data for the Roodan watershed in southern Iran. Differences in climate and geographical conditions are the most likely explanation for the discrepancy between the Jajarmizadeh et al. [88] study and the results reported in this research and the other previously cited studies. In addition, the streamflow overestimation that resulted from the use of the NCEP-CFSR data in this study could be related to possible problems that occur over tropical regions [70], including the effects of the satellite algorithms on precipitation estimation and the CFSR model parameterizations.
In general, the performance of the APHRODITE data was better for the KRB than for the JRB. This is due in part to a more complete distribution of rain gauges in the KRB than in the JRB (Figure 1); the JRB lacks long-term climate data representation in the northern part of the basin. In addition, the PERSIANN-CDR precipitation-based SWAT simulation also performed better for the KRB, which is consistent with Zhu et al. [29], who found that the PERSIANN-CDR data resulted in a smaller relative error in a data-rich region. These results are consistent with previously reported findings that improved SWAT hydrologic simulations usually occur in response to higher-resolution rather than lower-resolution precipitation inputs [89][90][91].
As shown in Table 6, we also found that basin size was of minor importance compared to the performance of the three GCPs. For instance, the NCEP-CFSR data performed poorly in both basins, regardless of size and flow characteristics, while the APHRODITE precipitation resulted in the best performance for both basins. We also note that differences in sub-basin and/or HRU delineations, while not investigated in this study, typically do not impact SWAT streamflow and other hydrologic outputs, as discussed in a previous review of the SWAT literature [48] and reported in several subsequent SWAT applications [92][93][94][95].
Finally, it is important to emphasize that there were distinct periods within the overall simulation timeframe in which the prevailing bias was reversed for a specific GCP; e.g., streamflow extremes were overestimated during periods when precipitation extremes were underestimated. For example, the PERSIANN-CDR underestimated the Rx1d precipitation index by about 45% during 1989, but the corresponding Rx1d streamflow index was overestimated by 31.3%. This is consistent with the findings of a similar study conducted by Zhu et al. [29] for the Xiang River and Qu River watersheds in China. This finding indicates that there are certain periods when the precipitation generated by GCPs is unlikely to accurately capture the amount and duration of extreme events. This is further exacerbated by the difference between the temporal scales of precipitation and streamflow extremes. For example, peak streamflow usually occurred a few hours or days after the corresponding peak precipitation, and the peak streamflow normally represents an accumulation of precipitation events that occurred over several hours or days.

Conclusions
The performance of the APHRODITE, PERSIANN-CDR and NCEP-CFSR long-term gridded climate products (GCPs) was evaluated against observed climate data for the Kelantan River Basin (KRB) and Johor River Basin (JRB), which are both tropical basins located in Peninsular Malaysia. The analysis included an assessment of the capability to replicate streamflow for both basins using climate data from these GCPs as inputs to the calibrated SWAT model. The main conclusions are as follows:
(1) The APHRODITE data typically replicated the observed monthly and daily precipitation more accurately over both the KRB and JRB, followed by the PERSIANN-CDR data and lastly the NCEP-CFSR data. The APHRODITE data tended to underestimate the observed daily and monthly precipitation in both basins, while the NCEP-CFSR data dramatically overestimated the observed precipitation. The PERSIANN-CDR data resulted in a slight underestimation of the observed KRB precipitation and an overestimation of the JRB precipitation.
(2) The overall performance of the GCPs was better in lowland and near-coastal regions, such as the northern and eastern KRB. In contrast, the performance of the GCPs was poor for the high mountainous regions located in the southwestern part of the KRB. Generally, the APHRODITE data resulted in stronger replication of precipitation in mountainous regions compared to the other two GCPs.
(3) The GCPs were found to have moderate accuracy (ACC), false alarm ratio (FAR) and critical success index (CSI), and a high probability of detection (POD) over the two basins studied; the APHRODITE data resulted in the best performance. All three GCPs underestimated the extreme precipitation ranges (≥50 mm·day⁻¹) and dramatically overestimated the observed moderate precipitation ranges (2-20 mm·day⁻¹).
(4) The APHRODITE data resulted in strong replication of observed streamflows when input to the calibrated SWAT model, while the NCEP-CFSR data were unable to replicate the observed streamflows for either basin. The PERSIANN-CDR data generated an in-between performance in the calibrated SWAT model, resulting in acceptable representation of KRB observed streamflows but an inability to track the JRB observed streamflows.
(5) We recommend the integration of the APHRODITE precipitation and the NCEP-CFSR temperature data for SWAT modeling in Malaysia as well as in the wider Southeast Asia region. However, a bias correction should be conducted if gauge data are available, in order to improve the accuracy of the SWAT modeling.
(6) The APHRODITE and PERSIANN-CDR data underestimated the annual maximum one-day streamflow (Rx1d) and five-day consecutive streamflow (Rx5d) indices. In contrast, the NCEP-CFSR data dramatically overestimated the Rx1d and Rx5d streamflow indices in both basins. Overall, all three GCPs performed poorly in capturing extreme events, with high bias found in certain periods.
Finally, these findings demonstrate how large uncertainties in GCP inputs can propagate within streamflow modeling, which can greatly affect the accuracy of streamflow simulations. This could lead to erroneous results that in turn could lead to wrong conclusions, which could impact the development of management systems and local policies. Therefore, development of an improved quantification framework for more accurate comparisons between different study areas should be a focus for future research. Similar studies should be conducted in other watershed systems with varying climatic and geographical conditions, to expand the testing of the GCPs and provide feedback to the GCP producers that can be used to develop better products.

Figure 2. The correlation coefficient of monthly precipitation of APHRODITE, PERSIANN-CDR and NCEP-CFSR against rain gauges, respectively, over: (a-c) Kelantan River Basin; and (d-f) Johor River Basin.

Figure 3. The relative bias of monthly precipitation of APHRODITE, PERSIANN-CDR and NCEP-CFSR against rain gauges, respectively, over: (a-c) Kelantan River Basin; and (d-f) Johor River Basin.

Figure 7. Relative bias values of annual maximum: (a) one-day precipitation/streamflow; and (b) five-day consecutive precipitation/streamflow from 1985 to 2007 in the Kelantan River Basin.

Figure 8. Relative bias values of annual maximum: (a) one-day precipitation/streamflow; and (b) five-day consecutive precipitation/streamflow from 1985 to 2007 in the Johor River Basin.

Table 1. Details on gridded climate products used in this study.

Table 2. Statistical analysis for daily, monthly, seasonal and annual precipitation in the Kelantan River Basin (KRB) and Johor River Basin (JRB). Bold indicates significance at the 0.05 level.

Table 3. Statistical analysis for the NCEP-CFSR maximum (Tmax) and minimum (Tmin) temperature in the Kelantan River Basin and Johor River Basin.
Note: R indicates the default parameter value is multiplied by (1+ a given value) and V indicates the default parameter value is replaced with the given value.

Table 5. SWAT calibration and validation statistical results for the Kelantan River Basin (KRB) and Johor River Basin (JRB).

Table 7. Statistical analysis for extreme precipitation indices in the Kelantan River Basin and Johor River Basin. Bold indicates significance at the 0.05 level.