Investigating Alternative Climate Data Sources for Hydrological Simulations in the Upstream of the Amu Darya River

The main objective of this study is to investigate alternative climate data sources for long-term hydrological modeling. To accomplish this goal, one weather station data set (WSD) and three grid-based data sets, which together provide three types of precipitation data and two types of temperature data, were selected according to their spatial and temporal resolution. The accuracy of the grid-based data sets was assessed against the WSD. Then, the performances of a corrected data combination and of non-corrected combinations of grid-based precipitation and temperature data from multiple sources in simulating river flow in the upstream portion of the Amu Darya River Basin (ADRB) were analyzed using a Soil and Water Assessment Tool (SWAT) model. The accuracy assessment indicated that all the grid-based data sets underestimated precipitation. The Asian Precipitation Highly Resolved Observational Data Integration Towards the Evaluation of Water Resources (APHRODITE) precipitation data provided the highest accuracy (correlation coefficient (CF) > 0.89, root mean square error (RMSE) < 41.6 mm), followed by the CRUNCEP reanalysis data (a combination of the CRU TS.3.2 data and the National Centers for Environmental Prediction (NCEP) reanalysis data) (CF > 0.5, RMSE < 58.1 mm) and Princeton's Global Meteorological Forcing Dataset (PGMFD) precipitation data (CF > 0.46, RMSE < 62.8 mm). The PGMFD temperature data exhibited a higher accuracy (CF > 0.98, RMSE < 7.1 °C) than the CRUNCEP temperature data (CF > 0.97, RMSE < 4.9 °C). In terms of simulation performance, the combination of corrected APHRODITE precipitation and PGMFD temperature data performed best: the CF and Nash-Sutcliffe coefficient (NSE) were 0.96 and 0.92 in the calibration period and 0.93 and 0.83 in the validation period. In addition, the combinations of PGMFD temperature data with APHRODITE, PGMFD and CRUNCEP precipitation data produced good results, with NSE ≥ 0.70 and CF ≥ 0.89. 
The combination of CRUNCEP temperature data and APHRODITE precipitation produced a satisfactory result, with NSE = 0.58 and CF = 0.82. The combinations of CRUNCEP temperature data and PGMFD and CRUNCEP precipitation data produced poor results.


Introduction
One of the challenges in modeling watershed hydrology is obtaining accurate weather input data [1,2], which are generally among the most important drivers of watershed models [3,4]. Spatial and temporal variability are key characteristics of hydrological processes [5]. In many instances, distributed hydrological models require daily, spatially distributed meteorological data to simulate the hydrological cycle, although some modeling scenarios require hourly or monthly data. Lack of data and data inaccuracies have the largest impact on model simulations [6,7]. Distributed hydrological models require spatially distributed, long-term, continuous data to simulate the impacts of climate change and management practices on hydrological processes. However, conventional weather stations are often sparsely distributed and cannot fully represent the climate conditions across a watershed, particularly where large hydroclimatic gradients exist [8][9][10]. In addition, weather station records often do not cover the proposed simulation period or contain gaps.
To address this problem, some researchers have used grid-based data (e.g., atmospheric model analysis or reanalysis outputs, radar data and gridded station observations, i.e., observations that have been interpolated to a regular grid). Common ways of judging the quality of such data are to assess the accuracy of the data source and test its performance in a hydrologic model, or to assess the uncertainty that weather inputs introduce into model predictions, for example, using latent variables [11], simultaneous data assimilation and parameter estimation [12], or probabilistic techniques such as Bayesian Model Averaging (BMA) and the Integrated Bayesian Uncertainty Estimator (IBUNE) [13,14]. Most studies have focused on evaluating the performance of grid-based precipitation data in simulating hydrologic processes [15][16][17][18][19][20][21][22][23][24][25], while others have evaluated the performance of different variables within a single data set [26][27][28][29]. Some studies have evaluated the respective performances of different variables from multisource grid-based data in hydrologic modeling [30,31]. However, nearly 80% of the water resources in the current region of interest are generated from snow and glacier melt, so the impact of temperature data accuracy on runoff modeling in this region cannot be neglected. In addition, different data sets have varying accuracy levels across regions with different weather station distributions. Therefore, alternative climate data sources must be identified in data-scarce regions [32].
The hydrologic regime of the Amu Darya River Basin (ADRB) is complex and vulnerable to climate change [8]. Water diversion for agricultural, industrial and domestic users has significantly reduced flows in downstream regions [33], resulting in severe ecological damage [34]. The scarcity of meteorological data remains a major hindrance to using hydrologic models in this region, and several studies have attempted to overcome these data limitations. Monthly gridded data from the Climatic Research Unit (CRU TS.3.2) have been used in numerous studies [9,33,35,36]. Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) products [8], the Willmott archived data set, the GSMaP satellite-driven data set [37], the Global Precipitation Climatology Project (GPCP) product, the Global Precipitation Climatology Center (GPCC) data [38] and ERA-15 data [39] have been used to simulate the influence of climate change on water resources in this region.
However, most of these studies have used data sets at monthly time scales [9,33,35,36], used daily time steps without bias correction for short-term simulations [8,16,20,22,37,40,41], or evaluated the performances of different variables from multisource gridded data in hydrologic modeling [23,24]. In addition, previous studies focused on evaluating precipitation data and neglected the evaluation of temperature data. Thus, it is essential to identify alternative climate data sources at a daily time step and evaluate their effectiveness for long-term hydrological modeling. To this end, one weather station data set (WSD) and three grid-based data sets with daily time steps over long periods were tested in this study by simulating river flow with the Soil and Water Assessment Tool (SWAT). The major goal of this study is to investigate alternative climate data sources for improving the performance of distributed hydrologic models and to provide a practical basis for further analysis of hydrological processes and related topics. Given the length of the paper, we focused on data accuracy assessment and performance evaluation and did not consider the simulation uncertainty caused by weather inputs. The Central Asia Temperature and Precipitation Data (CATPD), the Asian Precipitation Highly Resolved Observational Data Integration Towards the Evaluation of Water Resources (APHRODITE) data, Princeton's Global Meteorological Forcing Dataset (PGMFD) and the CRUNCEP reanalysis data (a combination of the CRU TS.3.2 monthly data and the National Centers for Environmental Prediction (NCEP) reanalysis data) were selected because of their spatial and temporal representativeness of the processes being measured. Our first objective was to evaluate the accuracy of the grid-based data. The second objective was to investigate how the corrected data combination and the non-corrected combinations of grid-based precipitation and temperature data from multiple sources perform in simulating river flow in the upstream portion of the Amu Darya River Basin (ADRB), and to select suitable combinations for simulating river flow in this region.

Study Area
The watershed is located between 38.66° N and 39.86° N and between 70.28° E and 73.71° E and covers an area of 19,638 km². The area is mountainous, with elevations ranging from 1294 m to 7198 m (Figure 1). Land cover in the drainage area includes forest (3.64%), pasture (3.04%), agricultural land (0.17%), snow and ice (15.53%), bare land (13.39%) and sparsely vegetated area (64.23%) (Figure 1). The main soil types in this region are sandy soil (29.32%), mollic leptosols (21.27%), cumulic anthrosols (19.23%), haplic kastanozems (17.03%) and calcic chernozems (13.14%). In general, the climate in this region exhibits continental and subtropical features. The average annual temperature ranges from −7.7 °C to 8.3 °C, and the average annual precipitation is 739 mm (1965-2007). Within the mountain ranges, the climate differs across elevation bands. The elevations of the Lyairun, Daraut-Kurgan, Sarytash and Fedchenko Glacier stations used in this study are 2008 m, 2470 m, 3153 m and 4169 m, respectively. Temperature decreases with increasing elevation, whereas precipitation shows different trends in different elevation bands and aspects. More than 80% of the precipitation occurs from October to May of the following year (Figure 2). The maximum and minimum monthly precipitation totals occur in May and August, respectively, whereas peak and low flows occur in August and March, respectively (Figure 2). The main water sources in this region are precipitation, snowmelt and glacier melt.

Climate Data Sources
Climate data sources are primarily divided into point-based (weather stations) and grid-based sources, such as atmospheric model analysis or reanalysis outputs, radar data and gridded station observations, i.e., observations that have been interpolated to a regular grid. To identify alternative data sources for use in long-term hydrological modeling, available climate data were collected from different sources (Table 1).
Three WSD sources are available for the ADRB: the CATPD at the monthly time scale, the Global Summary of the Day (GSOD) and the Global Historical Climatology Network-Daily (GHCND). The latter two data sets, which have global coverage, can be obtained from the National Climatic Data Center. Although all of these data sets provide precipitation and maximum and minimum temperature data, the GSOD and GHCND contain no data for our study area before 1973 or between 1994 and 2005, and the available periods contain many gaps. The CATPD for 1965-1990 was therefore selected for this study because of its completeness; it provides data from four weather stations in the study area (Figure 1). Because the SWAT model requires daily weather data to model river flow, the monthly CATPD could only be used to evaluate the accuracy of the grid-based data.
A meteorological reanalysis is a data assimilation project that aims to assimilate historical observational data spanning an extended period using a single consistent assimilation (or "analysis") scheme throughout. The reanalysis data sets listed in Table 1 can be used in hydrological modeling. However, these data sets have different spatial and temporal resolutions and varying accuracy levels across regions because of the variety of data sources and assimilation methods used. In this paper, based on their spatial and temporal resolutions, the PGMFD and CRUNCEP reanalysis data sets were chosen for hydrological modeling and were evaluated and tested further in Sections 4.1 and 4.3. Both provide daily precipitation, maximum and minimum temperature, daily total solar radiation, daily average relative humidity and daily average wind speed for SWAT model construction. However, only daily precipitation and maximum and minimum temperature were tested; the other variables were simulated using the SWAT weather generator because of the lack of measured relative humidity, wind speed and solar radiation data. Table 1 lists seven gridded data sets, of which only the APHRODITE data have the high spatial and temporal resolution needed in this analysis. Therefore, the APHRODITE data set was selected as an alternative data source for hydrological modeling.

Other Data for Model Construction
A SWAT model requires spatial data such as a digital elevation model (DEM), a land use/cover map and a soil map. The following were used to construct the SWAT model: a DEM with a 90 m resolution [42]; land use/cover maps from the 1970s and 2005 with a 1000 m resolution; and the Harmonized World Soil Database (HWSD) soil map at a scale of 1:5,000,000 [43]. The land use/cover maps were obtained from the Central Asia land cover change data set of the "973 Program", which describes the response of large-scale land use/cover change to global climate change.
In addition to the spatial data and daily weather data described in Section 2.2.1, a SWAT model requires physical and chemical soil properties such as moist bulk density, depth from the soil surface to the bottom of each soil layer, and clay, silt and sand content. Observed river flow data at an appropriate time scale are also required for model calibration and validation.
The HWSD provides soil properties such as the depth and the clay, silt and sand content of each soil layer. Other properties, such as the available water capacity and saturated hydraulic conductivity, were calculated using the Soil-Plant-Air-Water (SPAW) software developed by the U.S. Department of Agriculture. Monthly average river flow data from the Global Runoff Data Center (GRDC) for 1965-1978 and 1979-1985 were used for calibration and validation, respectively.

Accuracy Assessments of the Grid-Based Data Sets
Precipitation and temperature data from 1965 to 2007 were extracted from the grid-based data sets at the locations of the four weather stations using the nearest neighbor method. Other interpolation methods, such as bilinear interpolation and inverse distance weighting, are also available [44,45]. The nearest neighbor method is the simplest way to extract point values from a raster, and it was used in this study to reduce computational cost; the impact of different interpolation methods on the accuracy of the extracted data will be examined in future work. The accuracy assessment compared the annual cycles and statistical box plots of the grid-based data sets with the WSD based on the indicator criteria. The annual cycle is useful for evaluating seasonal behavior throughout the year, and it is normally estimated from observational data or model output by averaging each calendar month over a given number of years [46]. A box plot displays the distribution of the data through the median, upper quartile (75th percentile), lower quartile (25th percentile), minimum and maximum values.
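The two operations described above, nearest neighbor extraction and the annual cycle, can be sketched in Python. This is a minimal illustration rather than the authors' implementation: it assumes a regular latitude/longitude grid stored as a nested list whose first cell is centred on (lat0, lon0) with a spacing of res degrees, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def nearest_cell(lat, lon, lat0, lon0, res):
    """Row/column of the grid cell whose centre is closest to (lat, lon).

    Hypothetical layout: cell (0, 0) is centred on (lat0, lon0), rows increase
    northwards and columns eastwards, both with spacing `res` degrees.
    """
    row = math.floor((lat - lat0) / res + 0.5)
    col = math.floor((lon - lon0) / res + 0.5)
    return row, col


def extract_nearest(grid, lat, lon, lat0, lon0, res):
    """Value of the nearest grid cell (no blending between neighbouring cells)."""
    row, col = nearest_cell(lat, lon, lat0, lon0, res)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("station lies outside the grid")
    return grid[row][col]


def annual_cycle(records):
    """Mean value of each calendar month over all years.

    `records` is an iterable of (year, month, value) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _year, month, value in records:
        sums[month] += value
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}
```

Nearest neighbor extraction simply returns the value of the closest cell; bilinear interpolation or inverse distance weighting would instead blend several neighbouring cells, at a higher computational cost.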
The following indicator criteria were applied to evaluate the grid-based data sets against the WSD: the linear correlation coefficient (CF), root mean square error (RMSE), mean absolute error (MAE), multiplicative bias (MBias) and Nash-Sutcliffe coefficient (NSE) [32,45,47]. They are defined as follows:

$$CF = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-y_i)^2}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\lvert x_i-y_i\rvert$$

$$MBias = \frac{\sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}y_i}$$

$$NSE = 1-\frac{\sum_{i=1}^{n}(x_i-y_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$

where x and y are the gridded and station (WSD) data, respectively, n is the number of paired values, and $\bar{x}$ and $\bar{y}$ are their means. The CF assesses the agreement between the grid-based data set and the WSD. It ranges from −1 to +1; a value of exactly +1 indicates a perfect positive linear relationship, while a value of exactly −1 indicates a perfect negative one. The MAE represents the average magnitude of the error, and the RMSE, which assigns larger errors more weight than the MAE does, also measures the average error magnitude; the optimal value of both is 0. The MBias is the ratio of grid-based data to WSD: a perfect estimate gives an MBias of 1, underestimation gives values less than 1 and overestimation gives values greater than 1 [48]. The NSE describes the goodness of fit between the gridded and observed data sets; it ranges from −∞ to 1, with 1 being the best value.
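The five criteria can be computed directly from paired monthly series. The following is a minimal Python sketch, assuming x holds the gridded values and y the station values for the same months with no missing entries; the function name is hypothetical.

```python
import math


def evaluation_metrics(x, y):
    """CF, RMSE, MAE, MBias and NSE of gridded values x against station values y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired series with at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sq_err = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return {
        "CF": sxy / math.sqrt(sxx * syy),          # Pearson correlation
        "RMSE": math.sqrt(sq_err / n),             # penalises large errors more
        "MAE": sum(abs(xi - yi) for xi, yi in zip(x, y)) / n,
        "MBias": sum(x) / sum(y),                  # 1 = no overall bias
        "NSE": 1.0 - sq_err / syy,                 # station data as reference
    }
```

For example, a gridded series that is uniformly 1 mm wetter than the station series ([2, 4, 6] against [1, 3, 5]) still has CF = 1, while RMSE = MAE = 1 mm, MBias ≈ 1.33 and NSE = 0.625 reveal the bias. This is why the CF is reported together with the error-based criteria.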

Data Correction and Combinations
Because daily WSD were lacking, the APHRODITE precipitation data and the PGMFD maximum and minimum temperatures were selected and corrected for model construction. The simple and widely used linear bias correction method [49] was used to correct the precipitation and temperature data. The APHRODITE daily precipitation amounts P are transformed into P* = aP, where the scaling parameter a = O/P̄, and O and P̄ are the monthly mean precipitation values based on the WSD and APHRODITE data, respectively. The monthly scaling factor is applied to each uncorrected daily time series. The PGMFD maximum and minimum temperatures were also corrected using the linear bias correction method, but with an additive correction: the scaling parameter is b = O_t − T̄, where O_t and T̄ are the monthly mean WSD and PGMFD maximum or minimum temperatures, and b is added to each uncorrected daily value (T* = T + b). For the daily time series from 1991 to 2007, for which no observed data were available for correction, the long-term average monthly correction factors were applied to the uncorrected daily values of each month.
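A minimal Python sketch of the linear scaling described above follows. It assumes that the monthly observed and gridded values have already been paired by calendar month over the 1965-1990 overlap period and that one long-term factor per calendar month is used, as described for 1991-2007. The function names are hypothetical.

```python
from collections import defaultdict


def monthly_means(series):
    """Mean value per calendar month; `series` is a list of (month, value)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for month, value in series:
        sums[month] += value
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sums}


def precip_factors(obs_monthly, grid_monthly):
    """Multiplicative factor a = O / P_mean for each calendar month."""
    o, p = monthly_means(obs_monthly), monthly_means(grid_monthly)
    # Months with zero gridded precipitation cannot be scaled and are skipped.
    return {m: o[m] / p[m] for m in o if p.get(m, 0.0) > 0.0}


def temp_offsets(obs_monthly, grid_monthly):
    """Additive offset b = O_t - T_mean for each calendar month."""
    o, t = monthly_means(obs_monthly), monthly_means(grid_monthly)
    return {m: o[m] - t[m] for m in o if m in t}


def correct_precip(daily, factors):
    """Apply P* = a * P to daily (month, P) values."""
    return [factors[m] * p for m, p in daily]


def correct_temp(daily, offsets):
    """Apply T* = T + b to daily (month, T) values."""
    return [t + offsets[m] for m, t in daily]
```

Multiplicative scaling leaves dry days dry, so it keeps the wet-day frequency of the gridded data unchanged. The additive temperature offset, by contrast, shifts the whole daily distribution without changing its variance.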
To investigate how the corrected data combination and the non-corrected combinations of precipitation and temperature data from multiple sources perform in simulating river flow in the study area, the following combinations were used. CAP is the combination of corrected APHRODITE precipitation and PGMFD temperature data. The six non-corrected combinations are APHRODITE precipitation with PGMFD temperature (AP), precipitation and temperature both from the PGMFD data set (PP), CRUNCEP precipitation with PGMFD temperature (NP), APHRODITE precipitation with CRUNCEP temperature (AN), PGMFD precipitation with CRUNCEP temperature (PN), and precipitation and temperature both from the CRUNCEP data set (NN). Because the AP, PP and NP models share the same PGMFD temperature data, they were used to evaluate the suitability of the precipitation data for the model and the model's sensitivity to the accuracy of the precipitation data. The AN, PN and NN models, compared with their PGMFD-temperature counterparts, were used to analyze the suitability of the temperature data and the model's sensitivity to the accuracy of the temperature data.
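The six non-corrected runs form a 3 × 2 grid of precipitation and temperature sources. A small Python sketch that enumerates it follows, using the letter codes implied by the text (A = APHRODITE, P = PGMFD, N = CRUNCEP, with the precipitation letter first). The dictionary and function names are hypothetical.

```python
from itertools import product

# Letter codes as used in the model names: precipitation letter, then temperature letter.
PRECIP_SOURCES = {"A": "APHRODITE", "P": "PGMFD", "N": "CRUNCEP"}
TEMP_SOURCES = {"P": "PGMFD", "N": "CRUNCEP"}


def forcing_combinations():
    """Map each model code to its (precipitation source, temperature source).

    Iterating over temperature first groups the runs as in the text:
    AP, PP, NP share PGMFD temperature; AN, PN, NN share CRUNCEP temperature.
    """
    combos = {}
    for t_code, p_code in product(TEMP_SOURCES, PRECIP_SOURCES):
        combos[p_code + t_code] = (PRECIP_SOURCES[p_code], TEMP_SOURCES[t_code])
    return combos
```

Holding the temperature source fixed within each group of three isolates the effect of the precipitation data. Comparing runs that share a precipitation source (for example, AP against AN) isolates the effect of the temperature data.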

The SWAT Hydrological Model
The SWAT model is a physically based, temporally continuous, semi-distributed hydrological model that operates at a daily time step. It can simulate complex hydrological processes and predict the impacts of climate change and land management practices on water, sediment and agricultural chemical yields in large, complex watersheds with varying soils, land uses and management conditions over long periods [50,51]. It requires specific information on weather, soil properties, topography, vegetation and land management practices [52]. As a semi-distributed model, SWAT has a simpler structure and requires less data than fully distributed models such as MIKE SHE. However, the structural uncertainty inherent in its conceptual, lumped components can significantly affect prediction results [53]. In addition, such a conceptual lumped structure cannot explicitly analyze processes such as the spatial and temporal variability of snow or the impacts of soil moisture on irrigation. Thus, the main objective of this paper is to investigate climate data sources for the SWAT model and provide a theoretical basis for further analysis of hydrological processes and other topics.
The SWAT model includes the Soil Conservation Service (SCS) curve number procedure [54] and the Green and Ampt infiltration method. Although the Green and Ampt model is more physically based than the SCS method, it requires sub-daily precipitation records, which are less readily available, and detailed soil information. This is a major obstacle to its use in data-scarce regions [55]; thus, the SCS method was applied in this research. SWAT offers two alternative ways of updating the SCS curve number: as a function of antecedent soil moisture or of plant evapotranspiration. The antecedent soil moisture method was used in this study because of its suitability for semi-humid and humid regions; the study area is semi-humid, with annual precipitation of 739 mm [56][57][58][59][60]. The model offers three options for estimating potential evapotranspiration (PET): the Hargreaves [61], Priestley-Taylor [62] and Penman-Monteith [63] methods, which differ in the number of required inputs. The Hargreaves method requires only maximum, minimum and average air temperature (extraterrestrial radiation is computed from latitude and day of year), whereas the Priestley-Taylor method requires solar radiation, air temperature and relative humidity. The Penman-Monteith method requires the same inputs as the Priestley-Taylor method plus wind speed. The Hargreaves method was applied in this study because of meteorological data limitations.
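The two methods selected above can be sketched as follows. This follows the standard SCS curve number equation in SI units (retention parameter S = 25.4(1000/CN − 10) mm, initial abstraction 0.2S) and the Hargreaves equation in the form commonly documented for SWAT. The sketch makes three simplifications: it holds CN fixed rather than updating it daily from antecedent soil moisture, it uses a constant latent heat of vaporization of 2.45 MJ kg⁻¹, and it takes extraterrestrial radiation as an input rather than computing it from latitude and day of year. The function names are hypothetical, and this is not SWAT's source code.

```python
import math


def scs_runoff(rain_mm, cn):
    """Daily surface runoff (mm) from the SCS curve number equation (SI units)."""
    s = 25.4 * (1000.0 / cn - 10.0)  # retention parameter S (mm)
    ia = 0.2 * s                     # initial abstraction Ia = 0.2 S (mm)
    if rain_mm <= ia:
        return 0.0                   # all rainfall is abstracted, no runoff
    return (rain_mm - ia) ** 2 / (rain_mm - ia + s)


def hargreaves_pet(h0_mj, tmax, tmin, lam=2.45):
    """Daily potential evapotranspiration (mm) from the Hargreaves equation.

    h0_mj: extraterrestrial radiation (MJ m-2 d-1), supplied by the caller.
    lam:   latent heat of vaporization (MJ kg-1), held constant in this sketch.
    The mean temperature is assumed to be (tmax + tmin) / 2.
    """
    tav = (tmax + tmin) / 2.0
    dtr = max(tmax - tmin, 0.0)      # diurnal temperature range (degC)
    pet = 0.0023 * (h0_mj / lam) * math.sqrt(dtr) * (tav + 17.8)
    # Guard added in this sketch: very cold days (tav < -17.8 degC) return 0.
    return max(pet, 0.0)
```

With CN = 100 the retention parameter is zero and all rainfall becomes runoff. Lower CN values raise both S and the initial abstraction, so small daily rainfall totals produce no runoff at all.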
In SWAT, a watershed is divided into multiple sub-watersheds, which are further divided into hydrologic response units (HRUs), i.e., homogeneous spatial units with similar geomorphologic and hydrologic properties [64]. In this study, the basin was divided into 52 sub-basins and 361 HRUs.

Model Calibration and Validation
Because observed daily runoff data were not available, the SWAT model was calibrated and validated at a monthly time step. The calibration period was 1965-1978, with the first two years used as a warm-up period, and the validation period was 1979-1985. The NSE and CF between simulated and observed flows [65] were used to evaluate model performance, and the ranges of the criteria for very good, good, satisfactory and unsatisfactory results followed those proposed by Bressiani et al. [26]. Because daily weather station data were lacking, the bias-corrected APHRODITE precipitation data and PGMFD maximum and minimum temperatures (selected according to the accuracy assessment in Section 4.1) were used to calibrate and validate the model.
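The following sketch shows how the periods described above might be scored. It assumes that monthly observed and simulated flows are already paired. The warm-up years 1965-1966 are simply excluded, so calibration scoring covers 1967-1978. The function names are hypothetical, and the CF could be computed for each window in the same way.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated against observed flows."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def evaluate_periods(records, calib=(1967, 1978), valid=(1979, 1985)):
    """NSE for the calibration and validation windows.

    `records` holds (year, month, observed, simulated) monthly flows. Years in
    neither window, such as the 1965-1966 warm-up, are excluded from scoring.
    """
    scores = {}
    for name, (start, end) in (("calibration", calib), ("validation", valid)):
        pairs = [(o, s) for year, _month, o, s in records if start <= year <= end]
        scores[name] = nse([o for o, _ in pairs], [s for _, s in pairs])
    return scores
```

Excluding the warm-up years keeps errors from the arbitrary initial storages (soil water, groundwater, snowpack) out of the calibration statistics.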
Twenty parameters identified as sensitive in previous studies [5,47,66,67] were tested in SWAT-CUP in a sensitivity analysis. Based on the results, fourteen sensitive parameters were selected, and manual calibration was combined with automatic calibration using the Sequential Uncertainty Fitting (SUFI-2) algorithm to achieve acceptable performance [68][69][70][71][72][73]. Topographic effects were also considered by dividing the watershed into 10 elevation bands and adjusting the climate data using the temperature lapse rate (TLAPS) and the precipitation lapse rate (PLAPS).
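The elevation-band adjustment can be sketched as below, following the lapse-rate form given in the SWAT theoretical documentation. In that form, TLAPS is expressed in °C per km and PLAPS in mm per km per year, and the annual precipitation change is spread over the average number of wet days per year at the gauge. Treat this as a sketch under those assumptions. The function names and the zero floor on adjusted precipitation are additions for illustration.

```python
def band_temperature(t_gage, el_band, el_gage, tlaps):
    """Temperature (degC) in an elevation band; tlaps in degC per km."""
    return t_gage + (el_band - el_gage) * tlaps / 1000.0


def band_precipitation(p_gage, el_band, el_gage, plaps, days_pcp_yr):
    """Daily precipitation (mm) in an elevation band.

    plaps: precipitation lapse rate (mm per km per year).
    days_pcp_yr: average number of wet days per year at the gauge.
    Only wet days (> 0.01 mm) are adjusted, so dry days stay dry.
    """
    if p_gage <= 0.01:
        return p_gage
    adjusted = p_gage + (el_band - el_gage) * plaps / (days_pcp_yr * 1000.0)
    return max(adjusted, 0.0)  # floor at zero added in this sketch
```

For example, with an illustrative TLAPS of −6 °C/km, a band 1000 m above the gauge is 6 °C colder. These values are not the calibrated ones from this study.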

Evaluation of Data Accuracy
The annual cycle of precipitation shows that the study area received the most precipitation in spring (Figure 3). The APHRODITE pattern was similar to that of the WSD. Both the CRUNCEP and PGMFD data sets overestimated spring and winter precipitation at Daraut-Kurgan station, and a similar pattern was observed at Sarytash station. In contrast, the CRUNCEP and PGMFD data sets underestimated spring and winter precipitation at Fedchenko Glacier station, and a similar pattern was observed at Lyairun station. The annual cycles of maximum and minimum temperature (Figure 3) indicated that the PGMFD underestimated and the CRUNCEP data set overestimated temperature at all stations except Fedchenko Glacier. The PGMFD pattern more closely matched that of the weather station data.
Box plots of the 25-year average monthly precipitation and maximum and minimum temperature at the four stations, based on the WSD and the gridded data sets, are shown in Figures 4 and 5. The inter-quartile ranges of the gridded data sets indicate that the APHRODITE data provided the best performance. The APHRODITE data span almost the same range as the WSD at the Sarytash and Daraut-Kurgan stations, and at the other stations their ranges are only slightly narrower than those of the weather station data. The CRUNCEP precipitation data performed better than the PGMFD precipitation data. In terms of annual average precipitation in the watershed, all of the gridded data sets underestimated precipitation; the APHRODITE data set had the highest value, followed by the PGMFD and CRUNCEP data sets. The maximum and minimum temperature ranges of the PGMFD data were closer to the WSD than those of the CRUNCEP data (Figure 5). Therefore, we conclude that the APHRODITE precipitation data and the PGMFD temperature data are optimal for forcing the hydrologic model. In this study, these data were used to calibrate and validate the model after correction. The performances of the corrected data combination and the non-corrected data combinations are analyzed in Sections 4.2 and 4.3.

Table 2 presents the comparative statistics at the four stations on a monthly scale from 1965 to 1990. The APHRODITE precipitation data provided the highest accuracy, followed by the CRUNCEP precipitation data set. The CF of the APHRODITE precipitation data was higher than 0.89 at all four stations, whereas the CF values of the CRUNCEP and PGMFD precipitation data were higher than 0.50 and 0.46, respectively, except at Daraut-Kurgan station. The MAE of the APHRODITE monthly precipitation data was lower than 30 mm, whereas the MAEs of the CRUNCEP and PGMFD data sets ranged from 16.92 to 41.56 mm and from 22.64 to 47.25 mm, respectively. Based on the WSD, the CF and NSE values of the PGMFD and CRUNCEP temperature data were higher than 0.9 and 0.57, respectively, and the RMSE, MAE and MBias of the PGMFD data were lower than those of the CRUNCEP data. Overall, the PGMFD temperature data exhibited a higher accuracy than the CRUNCEP temperature data.
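The agreement statistics reported in Table 2 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes paired monthly series at a station and its corresponding grid cell, and it takes MBias to be the mean error (gridded minus station), since the exact MBias definition is not restated in this section.

```python
import numpy as np


def compare_series(obs, grid):
    """Agreement statistics between a station series (obs) and a gridded series (grid).

    Assumption: MBias is the mean error (grid - obs); negative values mean the
    gridded data underestimate the station record.
    """
    obs = np.asarray(obs, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Use only time steps where both series have data.
    valid = ~(np.isnan(obs) | np.isnan(grid))
    obs, grid = obs[valid], grid[valid]

    err = grid - obs
    return {
        "CF": float(np.corrcoef(obs, grid)[0, 1]),    # Pearson correlation coefficient
        "RMSE": float(np.sqrt(np.mean(err ** 2))),    # root mean square error
        "MAE": float(np.mean(np.abs(err))),           # mean absolute error
        "MBias": float(np.mean(err)),                 # mean error (assumed definition)
        # Nash-Sutcliffe efficiency, with the station series as the reference
        "NSE": float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)),
    }
```

Because NSE uses the station variance as its reference, it penalizes systematic underestimation more strongly than CF, which is insensitive to a constant bias.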

Modeling River Flow Using Corrected Data
The SWAT model was calibrated and validated using corrected precipitation data from APHRODITE and the maximum and minimum temperatures from the PGMFD data set. After correction, the CF and NSE of the model increased from 0.95 and 0.85 to 0.96 and 0.92, respectively, in the calibration period, and from 0.92 and 0.77 to 0.93 and 0.83 in the validation period. When the topographic effects were not considered, the CF and NSE were 0.78 and 0.48, respectively, indicating that correction based on TLAPS and PLAPS using elevation bands can significantly improve the accuracy. The major parameters identified by the sensitivity analysis and their degrees of sensitivity are shown in Table 3. The t-statistic provides a measure of sensitivity (a larger absolute value indicates a more sensitive parameter). ALPHA_BF.gw is the most sensitive parameter for simulating runoff in this region. Figure 6 shows the observed and simulated flows at Garm gauging station in the calibration and validation periods. The model predicted the peak flows very well, whereas the simulated low flows were lower than the observed low flows.
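The elevation-band correction can be illustrated with a short sketch. It follows the band adjustment described in the SWAT theoretical documentation as we understand it: temperature is shifted by TLAPS (°C/km) times the elevation difference, and on wet days precipitation is shifted by PLAPS (mm/km per year) spread over the average number of precipitation days per year. The function and argument names are hypothetical, the zero floor on precipitation is a safeguard added here, and the area-weighted averages are for illustration only.

```python
def adjust_to_bands(t_gage, r_gage, elev_gage, band_elevs, band_fracs,
                    tlaps, plaps, days_pcp_yr, wet_threshold=0.01):
    """Hypothetical helper: lapse-rate adjustment of one day's gauge data to elevation bands.

    t_gage: gauge temperature (deg C); r_gage: gauge precipitation (mm)
    elev_gage, band_elevs: elevations (m); band_fracs: area fraction of each band
    tlaps: temperature lapse rate (deg C/km); plaps: precipitation lapse rate (mm/km per year)
    days_pcp_yr: average number of precipitation days per year
    Returns the band values and their area-weighted averages.
    """
    t_bands, r_bands = [], []
    for elev in band_elevs:
        dz = elev - elev_gage  # elevation difference (m)
        t_bands.append(t_gage + dz * tlaps / 1000.0)
        if r_gage > wet_threshold:
            # Annual lapse rate distributed over the precipitation days of a year.
            r = r_gage + dz * plaps / (days_pcp_yr * 1000.0)
            r_bands.append(max(r, 0.0))  # safeguard added here: no negative precipitation
        else:
            r_bands.append(r_gage)  # dry days are left unchanged
    t_mean = sum(f * t for f, t in zip(band_fracs, t_bands))
    r_mean = sum(f * r for f, r in zip(band_fracs, r_bands))
    return t_bands, r_bands, t_mean, r_mean
```

For example, with a hypothetical TLAPS of -6 °C/km, a band 1000 m above the gauge is 6 °C colder, which shifts the timing of snow accumulation and melt in that band.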

Modeling River Flow Using Different Data Combinations
The six combinations of the three precipitation data sets and the two temperature data sets were entered into the calibrated model to investigate their performance. The six models are denoted AP, PP, NP, AN, PN and NN, as discussed in Section 3.2. The NSE and CF values in the validation period are shown in Table 4. A comparison of the performances of the corrected (CAP) and non-corrected (AP) data indicates that the simulation accuracy of the model increased significantly after correction. Among the six non-corrected models, the AP model produced very good results, with NSE = 0.77 and CF = 0.92. The PP and NP models also produced good results, with NSE ≥ 0.70 and CF ≥ 0.89, whereas AN produced satisfactory results, with NSE = 0.58 and CF = 0.8, and PN and NN produced poor results, with NSE < 0.50. Therefore, any of the aforementioned combinations of precipitation and temperature except PN and NN can be used to model the river flow in this region if no observed meteorological data are available for hydrologic modeling.

Figure 7 compares the results of the seven models. The results of the CAP model are significantly better than those of the AP model. The results of the PP and NP models were almost the same, which is expected because the precipitation data from the PGMFD and CRUNCEP data sets differ little. The accuracies of the AN, PN and NN models were significantly lower than those of the first four models; the AN model performed satisfactorily, whereas the PN and NN models performed poorly. The results of the first three non-corrected models (AP, PP and NP) indicate that, of the three types of precipitation data, the APHRODITE data performed best, followed by the CRUNCEP and PGMFD data. This ranking is consistent with the accuracy assessment, in which the CRUNCEP precipitation data were more accurate than the PGMFD precipitation data. A comparison of the first three models (AP, PP and NP) with the second three (AN, PN and NN) indicates that the simulation accuracy decreased sharply when the CRUNCEP temperature data were used. The river flow in this study area was more sensitive to temperature than to precipitation, which can be attributed to two factors: nearly 80% of the water resources in the area above the Nurek Reservoir are generated by snow and glacial melt [8], and the CRUNCEP temperature data were less accurate than the PGMFD temperature data.
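The naming of the six non-corrected models and the rating of their validation NSE values can be summarized in code. The letter convention (first letter for the precipitation source, second for the temperature source, with N standing for CRUNCEP) is inferred from the text rather than quoted from Section 3.2, and the NSE thresholds are assumed values chosen to reproduce the ratings reported above; they are not stated in this section.

```python
from itertools import product

# Inferred convention: first letter = precipitation source, second = temperature source.
PRECIP = {"A": "APHRODITE", "P": "PGMFD", "N": "CRUNCEP"}
TEMP = {"P": "PGMFD", "N": "CRUNCEP"}


def model_codes():
    """Six non-corrected combinations, in the order used in the text."""
    return [p + t for t, p in product(TEMP, PRECIP)]


def describe(code):
    """Expand a model code into its data sources."""
    return f"{PRECIP[code[0]]} precipitation + {TEMP[code[1]]} temperature"


def rate_nse(nse):
    """Assumed monthly-streamflow thresholds, consistent with the ratings in the text."""
    if nse > 0.75:
        return "very good"
    if nse > 0.65:
        return "good"
    if nse > 0.50:
        return "satisfactory"
    return "poor"
```

Under these assumed thresholds, AP (NSE = 0.77) rates as very good and AN (NSE = 0.58) as satisfactory, matching the text.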

Discussion
The results of the accuracy assessments indicated that APHRODITE exhibited the best performance. The accuracy of the CRUNCEP precipitation data was lower than that of APHRODITE and higher than that of the PGMFD precipitation data. This may be because of the relatively coarse spatial resolution of the CRUNCEP and PGMFD data and the different assimilation methods used. The correlation coefficients of the grid-based precipitation data with the WSD were higher than 0.5. The gridded maximum and minimum temperatures at all stations were satisfactory according to the evaluation criteria.
According to the runoff simulation performance of the combinations, the corrected data combination (CAP) performed significantly better than the non-corrected combinations. This indicates that linear bias correction can efficiently improve the simulation accuracy. In addition, weather data correction using TLAPS and PLAPS can also improve the simulation accuracy, because TLAPS and PLAPS correct the errors caused by topography. The CAP model predicted the peak flows very well, whereas the simulated low flows were lower than the observed low flows. These results can be explained by the fact that the PGMFD temperature data used to calibrate and validate the model are lower than the observed records in high-elevation areas, as described in Section 4.1; the lower temperatures in high-elevation regions reduce the conductivity of the soil [74].
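Linear bias correction is often implemented as monthly linear scaling: a multiplicative factor for precipitation and an additive offset for temperature, both estimated over a period in which station and gridded data overlap. The sketch below assumes that variant; the exact correction procedure used for the CAP model is described in the methods of the paper, not in this section, and the function names are hypothetical.

```python
import numpy as np


def monthly_factors(obs, grid, months, kind):
    """Hypothetical helper: monthly linear-scaling factors from an overlap period.

    kind="precip": factor = mean(obs) / mean(grid) (multiplicative)
    kind="temp":   offset = mean(obs) - mean(grid) (additive)
    months: calendar month (1-12) of each time step.
    """
    obs, grid, months = np.asarray(obs, float), np.asarray(grid, float), np.asarray(months)
    factors = {}
    for m in range(1, 13):
        sel = months == m
        if not sel.any():
            continue  # no data for this month in the overlap period
        mean_obs, mean_grid = obs[sel].mean(), grid[sel].mean()
        if kind == "precip":
            factors[m] = mean_obs / mean_grid if mean_grid > 0 else 1.0
        else:
            factors[m] = mean_obs - mean_grid
    return factors


def apply_factors(grid, months, factors, kind):
    """Apply monthly factors to a (possibly longer) gridded series."""
    grid, months = np.asarray(grid, float), np.asarray(months)
    out = grid.copy()
    for m, f in factors.items():
        sel = months == m
        out[sel] = grid[sel] * f if kind == "precip" else grid[sel] + f
    return out
```

A multiplicative factor keeps dry months dry and precipitation non-negative, while an additive offset suits temperature, which can be negative in this high-elevation basin.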
Among the non-corrected combinations, AP produced a very good result, PP and NP produced good results, AN produced a satisfactory result, and PN and NN produced poor results. Although APHRODITE exhibited the best performance, these data are available only from 1951 to 2007 and cover only monsoon Asia; therefore, they are appropriate only for studies in monsoon Asia. According to the above results, the PGMFD and CRUNCEP precipitation data can be used for long-term simulations of stream flow if APHRODITE data are not available. Among the temperature data, the PGMFD data performed better than the CRUNCEP data. In summary, combining the PGMFD temperature data with any of the three precipitation data sets yields good results, and the combination of APHRODITE precipitation and CRUNCEP temperature data achieves satisfactory results.
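The data-selection guidance above can be written as a small decision helper. This is a sketch only: it encodes the APHRODITE availability window and coverage stated in the text and the rankings found in this study area (CRUNCEP precipitation ranked ahead of PGMFD, PGMFD temperature preferred). It does not check the temporal coverage of the PGMFD or CRUNCEP data sets, and the function name is hypothetical.

```python
APHRODITE_YEARS = (1951, 2007)  # availability period stated in the text


def preferred_inputs(start_year, end_year, in_monsoon_asia):
    """Hypothetical helper: precipitation sources in order of preference, plus temperature source."""
    # CRUNCEP ranked ahead of PGMFD here, although NP and PP were almost the same.
    precip = ["CRUNCEP", "PGMFD"]
    first, last = APHRODITE_YEARS
    if in_monsoon_asia and first <= start_year and end_year <= last:
        precip.insert(0, "APHRODITE")  # AP gave the best non-corrected results
    return precip, "PGMFD"  # PGMFD temperature outperformed CRUNCEP temperature
```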
Vu and Liong [47] evaluated the daily rainfall products of APHRODITE, TRMM, GPCP, PERSIANN, GHCN2 and NCEP/NCAR from 2001 to 2005. The evaluation criteria for the APHRODITE data compared with WSD were lower in their study than in this study. This may be because the accuracy of APHRODITE in 2001-2005 was lower than in earlier years (in this study, we used observed data only from 1965 to 1990 because of data limitations), or because the data have different accuracies in different regions. These hypotheses will be tested in further studies incorporating the Global Summary of the Day (GSOD), the Global Historical Climatology Network (GHCN) and meteorological data from METAR (Météorologique Aviation Régulière). Yang et al. [65] also evaluated the performance of APHRODITE in simulating runoff in the Three Gorges Reservoir, China, from 2002 to 2006; their model performed better in river basins with flat topography than in river basins with large variations in elevation. In this research, the effect of topography was corrected using TLAPS and PLAPS, which improved the accuracy of the simulation. In this study, we used meteorological data from only four stations. In future studies, the impact of station density on runoff simulation will be examined by manually defining virtual stations in each pixel of the gridded data, and regional differences in the grid-based data sets will be tested by selecting watersheds with different topographic features. Finally, the results were given only at monthly time steps because of the limitations of the river flow data; if observed daily river runoff data become available, the river runoff could be modelled more precisely.
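The planned station-density experiment, with one virtual station per pixel of a gridded data set, could start from a helper like the one below. It is a hypothetical sketch assuming a regular latitude-longitude grid whose cell edges start at (lon_min, lat_min); the grid origin, resolution and cell counts are parameters, not values from this study.

```python
import numpy as np


def virtual_stations(lon_min, lat_min, n_lon, n_lat, res):
    """Hypothetical helper: (lon, lat) of one virtual station at each grid-cell centre.

    Rows are ordered by latitude, then by longitude within each latitude.
    """
    lons = lon_min + res * (np.arange(n_lon) + 0.5)
    lats = lat_min + res * (np.arange(n_lat) + 0.5)
    lon2d, lat2d = np.meshgrid(lons, lats)  # both of shape (n_lat, n_lon)
    return np.column_stack([lon2d.ravel(), lat2d.ravel()])


def cell_index(lon, lat, lon_min, lat_min, res):
    """(row, col) of the grid cell containing a real station, for pairing it with gridded data."""
    return int(np.floor((lat - lat_min) / res)), int(np.floor((lon - lon_min) / res))
```

Placing stations at cell centres avoids the ambiguity of a point falling exactly on a cell boundary, and `cell_index` gives the matching cell for each real gauge.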

Conclusions
The accuracies of three sets of precipitation data and two sets of temperature data from three gridded data sets were analyzed via a comparison with WSD from four stations. The performances of the corrected data combination and six combinations of non-corrected precipitation and temperature data from multiple sources in simulating stream flow were analyzed, and four optimal data combinations were selected for the study area.
The APHRODITE precipitation data provided the highest accuracy, followed by the CRUNCEP precipitation data; the PGMFD precipitation data provided the lowest accuracy. All of these data sets underestimated precipitation, and their annual average values followed the order: weather station data > APHRODITE > PGMFD > CRUNCEP. The accuracy of the PGMFD temperature data was higher than that of the CRUNCEP temperature data; the CRUNCEP temperature data overestimated the temperature, whereas the PGMFD data underestimated it. The combination of the corrected APHRODITE precipitation and PGMFD temperature data performed best in simulating river flow, and topographic correction also clearly increased the simulation accuracy. Among the non-corrected inputs, AP provided the best performance, NP and PP provided good performance, AN provided satisfactory performance, and PN and NN provided the poorest performance in simulating river flow. Therefore, AP is the best choice when observed meteorological data are not available, and the NP and PP combinations can also be used to simulate river flow in the upstream portion of the ADRB.
7 °C to 8.3 °C, and the annual average precipitation is 739 mm (1965-2007). Within the mountain ranges, the climate differs across elevation bands. In this study, the elevations of the Lyairun, Daraut-Kurgan, Sarytash and Fedchenko Glacier stations are 2008 m, 2470 m, 3153 m and 4169 m, respectively. Temperature decreases with increasing elevation, whereas precipitation shows different trends in different elevation bands and on different slope aspects.
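Because temperature decreases with elevation across the four stations, a lapse rate such as TLAPS can be estimated by regressing station values on elevation. The sketch below is an illustration under that assumption, not the procedure documented in the paper; with only four stations and the aspect-dependent precipitation pattern described above, a single linear fit is a simplification, particularly for precipitation.

```python
import numpy as np

# Station elevations (m) given in the text.
STATION_ELEV_M = {
    "Lyairun": 2008,
    "Daraut-Kurgan": 2470,
    "Sarytash": 3153,
    "Fedchenko Glacier": 4169,
}


def lapse_rate_per_km(elevations_m, values):
    """Least-squares slope of values against elevation, in units of value per km."""
    x = np.asarray(elevations_m, dtype=float) / 1000.0  # m -> km
    y = np.asarray(values, dtype=float)
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)
```

To use it, pass long-term mean station temperatures in the same order as the elevations; a negative result means temperature decreases with height.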

Figure 1. Location of the study area in the Amu Darya River Basin and the land cover map.

Figure 2. Long-term monthly averages of precipitation and temperature at four stations. (a) Long-term monthly averages of precipitation and temperature at four stations; (b) long-term monthly averages of stream flow and precipitation (Ly, Dk, Sa and FG are the Lyairun, Daraut-Kurgan, Sarytash and Fedchenko Glacier stations, respectively).

Figure 3. Annual cycles of temperature and precipitation based on the WSD and gridded data sets. (a) Annual cycle of precipitation at Daraut-Kurgan station; (b) annual cycle of precipitation at Fedchenko Glacier station; (c) annual cycles of maximum (Tmax) and minimum (Tmin) temperature at Sarytash station; (d) annual cycles of Tmax and Tmin at Daraut-Kurgan station; (e) annual cycles of Tmax and Tmin at Lyairun station; (f) annual cycles of Tmax and Tmin at Fedchenko Glacier station.

Figure 4. Box plots of precipitation based on WSD and gridded data sets at four stations. (a) Box plot of precipitation at Lyairun station; (b) box plot of precipitation at Fedchenko Glacier station; (c) box plot of precipitation at Sarytash station; (d) box plot of precipitation at Daraut-Kurgan station.

Figure 5. Box plots of temperature based on WSD and gridded data sets at four stations. (a) Box plot of temperature at Sarytash station; (b) box plot of temperature at Daraut-Kurgan station; (c) box plot of temperature at Lyairun station; (d) box plot of temperature at Fedchenko Glacier station.

Figure 7. Comparison of simulated flows from seven models. (a) Scatter plots and fitting curves of measured and simulated flows for models CAP and AP; (b) scatter plot and fitting curve of measured and simulated flows for model PP; (c) scatter plots and fitting curves of measured and simulated flows for models NP and AN; (d) scatter plots and fitting curves of measured and simulated flows for models PN and NN.

Table 1. Sources of climate data.

Table 2. Comparative statistics of precipitation and temperature data from different sources at four stations.

Table 3. Descriptions of the sensitive parameters and their degrees of sensitivity.

Table 4. NSE and CF in the validation period for the seven models.
