A Merging Framework for Rainfall Estimation at High Spatiotemporal Resolution for Distributed Hydrological Modeling in a Data-Scarce Area

Merging satellite and rain gauge data by combining accurate quantitative rainfall from stations with spatial continuous information from remote sensing observations provides a practical method of estimating rainfall. However, generating high spatiotemporal rainfall fields for catchment-distributed hydrological modeling is a problem when only a sparse rain gauge network and coarse spatial resolution of satellite data are available. The objective of the study is to present a satellite and rain gauge data-merging framework adapting for coarse resolution and data-sparse designs. In the framework, a statistical spatial downscaling method based on the relationships among precipitation, topographical features, and weather conditions was used to downscale the 0.25 ̋ daily rainfall field derived from the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) precipitation product version 7. The nonparametric merging technique of double kernel smoothing, adapting for data-sparse design, was combined with the global optimization method of shuffled complex evolution, to merge the downscaled TRMM and gauged rainfall with minimum cross-validation error. An indicator field representing the presence and absence of rainfall was generated using the indicator kriging technique and applied to the previously merged result to consider the spatial intermittency of daily rainfall. The framework was applied to estimate daily precipitation at a 1 km resolution in the Qinghai Lake Basin, a data-scarce area in the northeast of the Qinghai-Tibet Plateau. The final estimates not only captured the spatial pattern of daily and annual precipitation with a relatively small estimation error, but also performed very well in stream flow simulation when applied to force the geomorphology-based hydrological model (GBHM). The proposed framework thus appears feasible for rainfall estimation at high spatiotemporal resolution in data-scarce areas.


Introduction
Accurate estimation of the amount, and the spatial and temporal distribution, of precipitation is crucial for hydrological analysis and flood forecasting, particularly because distributed hydrological modeling has emerged as an effective method of analyzing hydrological processes, predicting the evolution of hydrological variables, and forecasting hydrological hazards [1,2].A handful of studies [3][4][5][6] have demonstrated that precipitation levels are the main area of uncertainty in hydrological model predictions.Achieving reliable modeling outputs thus requires forcing hydrological models with accurate rainfall at relative fine spatial and temporal resolutions.
Mapping rainfall is always challenging.Interpolation of point rainfall measured by rain gauges is the conventional method (for example, with Thiessen polygons, inverse distance weighting, or kriging techniques), but this may be subject to great uncertainty when the rain gauge network is sparse, as the gauges can only represent rainfall information within a limited distance [7].In contrast, remote sensing techniques provide an evolutionary method of spatial continuous rainfall observation with a high temporal sampling frequency.However, remote sensing products may also generate major quantitative errors, due to cloud effects and limitations in remote sensor performance and retrieval algorithms [8].Combining both data sources, known as data merging may, thus, be effective in maintaining both high-quality rainfall data from stations and spatially-continuous information from remote sensing observations [9].Great efforts have been made to develop and evaluate algorithms for merging rain gauge and remote sensing observations, e.g., co-kriging [10,11], linearized weighting procedures [8,12], conditional merging [13], Barnes objective analysis [14], multi-quadric surface fitting [9], and double kernel smoothing [15].The outcomes of these studies are encouraging, and provide new methods of estimating spatiotemporal precipitation, particularly in areas with limited climate data, such as the Qinghai-Tibetan Plateau.
However, it is quite a challenging job to generate high spatial and temporal rainfall maps for distributed hydrological modeling at a local basin scale when facing the following problems all at once: (1) only a sparse rain gauge network is available; (2) the spatial resolution of satellite data is much coarser than the modeled one [7]; and (3) daily rainfall is spatially intermittent [11].This study thus aims to present a methodology that overcomes these issues.
The kriging-based merging scheme is a common and mature spatial prediction method, but requires the assumption of a second-order stationary and a theoretical semi-variogram model.In poorly-gauged areas kriging-derived methods may, thus, overestimate the spatial correlation, as distances between rain gauges are often too large and, hence, tend to deliver unsatisfactory results [16].Instead, nonparametric merging methods without strong spatial assumptions may be more suitable for sparse designs.Li and Shao [15] used the nonparametric double kernel smoothing technique to combine TRMM precipitation data with observations from the Australian rain gauge network, focusing on discontinuity correction and spatial interpolation adapting for sparse design, and compared this to the geostatistical methods of ordinary kriging and co-kriging.Nerini et al. [16] compared the nonparametric rainfall methods of double kernel smoothing and mean bias correction with two geostatistical methods-kriging with external drift and Bayesian combination-for merging daily TRMM precipitation with rain gauge data over a mesoscale tropical Andean catchment in Peru.Both studies concluded that the nonparametric double kernel smoothing merging method performed better than the more complex geostatistical methods under data-scarce conditions.
Nonetheless, the precipitation with finer spatial resolution necessary for hydrological models is still not present, as the merged results often retain the same scale as the satellite data.Spatial downscaling of the satellite observations is thus necessary before the merging, which is a technique for disaggregating coarse-resolution data and capturing the sub-grid heterogeneity.It is usually based on the concept of scale invariance, or relates the properties of the physical process at one scale to those at a finer scale [17].A common and simple downscaling method is to develop a statistical model at the original coarse scale, based on the relationships between rainfall and the main factors that govern the rainfall spatial variability, and then transfer the model to the target scale, such as the works of Jia et al. [18], Fang et al. [19], and Shi et al. [20].
Spatial intermittency means daily rainfall is delivered in discrete patches in space and time, which causes a discontinuous surface with areas of zero rainfall between areas of non-zero rainfall [11].However, most studies have neglected this feature, except the works of Barancourt et al. [21], Grimes and Pardo-Igúzquiza [22], and Chappell et al. [11], who tackled the discontinuity by thresholding the rainfall distribution with the indicator kriging to map the presence and absence of rainfall.
Hence, this study presents a framework for estimating precipitation with high spatial and temporal resolution for distributed hydrological modeling in the Qinghai Lake Basin, a data-scarce area in the northeast of the Qinghai-Tibet Plateau.Using the 0.25 ˝TRMM and a sparse rain gauge network, statistical spatial downscaling, double kernel smoothing merging, and indicator kriging techniques are combined, following Fang et al. [19], Li and Shao [15], and Chappell et al. [11], respectively, to solve the issues proposed previously.

Study Area and Data
2.1.1.The Qinghai Lake Basin Qinghai Lake, the largest inland saline lake in China, lies in a closed intermountain basin between 36 ˝32 1 -37 ˝15 1 N and 99 ˝36 1 -100 ˝47 1 E (Figure 1).It has a drainage area of 29,691 km 2 , and the altitude of the basin ranges from 3167 (the elevation at the bottom of the lake) to 5279 m above sea level (asl).Land cover is dominated by alpine meadow, bare soil, everglade, tundra, and alpine desert.The primary soil types include felty, thin dark felty, castanozems, peaty bog, and dark frigid calcic soils.Over 40 rivers drain into Qinghai Lake, but most are intermittent.Only the two largest rivers, the Buha and the Ike'ulan, have longstanding and continuous hydrological records.The Buha River is the largest, contributing almost half of the lake's total runoff.The basin is dominated by a cold and semi-arid climate and influenced by three different monsoon systems: the East Asian monsoon, the Indian monsoon, and westerly jet streams, which make it one of the most sensitive regions to global climate change [23].The water level of the lake decreased overall by about 3.7 m from 1959 to 2004, but increased by almost 1.5 m from 2004 to 2013 according to the observational records of the Hydrology and Water Resources Survey Bureau of Qinghai Province, China.Causes of this trend are uncertain.Therefore, a study of the hydrological processes of the basin is necessary, with the accurate estimation of precipitation levels being one prerequisite.
Remote Sens. 2016, 8, 599 3 of 18 area in the northeast of the Qinghai-Tibet Plateau.Using the 0.25° TRMM and a sparse rain gauge network, statistical spatial downscaling, double kernel smoothing merging, and indicator kriging techniques are combined, following Fang et al. [19], Li and Shao [15], and Chappell et al. [11], respectively, to solve the issues proposed previously.

Study Area and Data
2.1.1.The Qinghai Lake Basin Qinghai Lake, the largest inland saline lake in China, lies in a closed intermountain basin between 36°32′-37°15′N and 99°36′-100°47′E (Figure 1).It has a drainage area of 29,691 km 2 , and the altitude of the basin ranges from 3167 (the elevation at the bottom of the lake) to 5279 m above sea level (asl).Land cover is dominated by alpine meadow, bare soil, everglade, tundra, and alpine desert.The primary soil types include felty, thin dark felty, castanozems, peaty bog, and dark frigid calcic soils.Over 40 rivers drain into Qinghai Lake, but most are intermittent.Only the two largest rivers, the Buha and the Ike'ulan, have longstanding and continuous hydrological records.The Buha River is the largest, contributing almost half of the lake's total runoff.The basin is dominated by a cold and semi-arid climate and influenced by three different monsoon systems: the East Asian monsoon, the Indian monsoon, and westerly jet streams, which make it one of the most sensitive regions to global climate change [23].The water level of the lake decreased overall by about 3.7 m from 1959 to 2004, but increased by almost 1.5 m from 2004 to 2013 according to the observational records of the Hydrology and Water Resources Survey Bureau of Qinghai Province, China.Causes of this trend are uncertain.Therefore, a study of the hydrological processes of the basin is necessary, with the accurate estimation of precipitation levels being one prerequisite.Map of the Qinghai Lake Basin, showing the regional topography, main river network, rain gauges, and discharge station at Buhahekou.

Datasets
The rain gauge data was obtained from the China Meteorological Administration.The daily observation is interpreted as the 24-h accumulated rainfall to 8 p.m. Beijing time (UTC + 8) for a given day.Due to the cold alpine environment, most parts of the basin are off the beaten track and stations are hard to maintain; as such, there are only two national weather stations within it.Thus, data from the two stations within the basin, and eleven more stations surrounding the basin, as shown in Figure 1, were used in the merging process.Climate factors, including maximum temperature, relative humidity, and wind direction, which are required by the downscaling process, were also take from the thirteen stations.
The satellite-based product was obtained from the Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA, also TRMM 3B42) precipitation product version 7, with a spatial resolution of 0.25 ˝covering the latitude band 50 ˝N-S and with a temporal resolution of 3-h covering 1998 to the present.TRMM products have been used extensively over the Qinghai-Tibetan Plateau (e.g., [24,25]) but, to our knowledge, no merging products combine TRMM and gauge observations from the plateau, let alone products for hydrological modeling.The TMPA products have been scaled by monthly rain gauge analyses from the Global Precipitation Climatology Project (GPCP) and the Climate Assessment and Monitoring System (CAMS) [26], but gauges with restricted data accessibility cannot be used by these procedures, as is often the case in developing countries [16].Gao and Liu [27] evaluated the performance of the TMPA version 6 products at daily scale over the plateau, and found that light rainfall (0-10 mm) was overestimated and moderate and heavy rainfall (>10 mm) were underestimated.Therefore, the TMPA products still require local adjustments.These TMPA precipitation estimates were geographically abstracted for covering the spatial extent of the rain gauges.To match the time-scale of the rain gauges, the satellite-based daily precipitation was constructed by accumulating the rain falling in a 24 h period up until 8 pm Beijing time.
The Digital Elevation Model (DEM) used to derive topographical features is the Shuttle Radar Topography Mission (SRTM) version 4, with data voids filled, available at a spatial resolution of 3 arc seconds [28].For subsequent hydrological modeling and analysis, an equal area projection of the datasets is required.In this study, the coordinate systems of the weather stations, satellite data, and DEM was converted to the Albers equal area projection with GCS_Krasovsky_1940 Datum.

Merging Framework
The whole procedure can be divided into five steps (see Figure 2).First, the TRMM-derived daily precipitation was disaggregated to a 1-by-1 km field, using a regression model developed from the relationships among precipitation, topographical features, and weather conditions.Second, a function optimization method known as shuffled complex evolution (SCE), developed by Duan et al. [29], was used to automatically select two optimal bandwidths of the double kernel smoothing model.Third, the downscaled precipitation and rain gauge observations were merged, using the double kernel smoothing model with optimal bandwidths.Fourth, the final rainfall field was estimated by multiplying the merged field by an indicator field, which represents the presence and absence of rainfall, to take into account the spatial intermittency of the rainfall field.Finally, the results were examined according to a set of performance indicators and hydrological evaluation.

Spatial Downscaling of TRMM
Similar to the method of Fang et al. [19], the downscaling method used in this study is based on the assumption that the spatial variability of daily precipitation is well captured by the TRMM products and can be explained by local topography and related meteorological conditions.Three topographical factors, including elevation (ELE), the topographical ruggedness index (TRI), and the angle between the slope aspect and the prevailing wind direction (ASA), and two meteorological variables, including daily maximum air temperature (TEM) and average humidity (HUM), were considered in the construction of the regression model.These five factors were prepared at 25 km and 1 km resolutions.
The projected DEM at 100 m resolution was aggregated by the mean statistics to resolutions of 25 km and 1 km.The resampled DEMs were then used to derive the TRI and slope aspect.The TRI, developed by Riley et al. [30] is a rapid, objective measure of topographic heterogeneity, calculated as the square root of the averaged squared differences in elevation values from a center grid cell and eight neighboring cells.Areas with high TRI values indicate mountainous regions, where rainfall is expected to be higher than elsewhere [31].The prevailing wind direction for each month was determined as that most frequently observed at the thirteen weather stations.We could then derive the monthly aspect angle, i.e., the angle between the slope aspect and the prevailing wind direction.The spatiotemporal variability of temperature and humidity is not as great as that for rainfall, so the spatial distribution of the daily maximum air temperature and average humidity can be obtained through the interpolation of ground-based observations.The inverse distance weighting (IDW) method was used, as this has been proved simple and suitable for this study area [32].The method is straightforward and not computationally intensive, which estimates the unknown value at one point using a linearly weighted combination of its neighbor sample points, with the weights inversely related to the distance between the estimated and the sample point [33].
The downscaling process is divided into four main steps.First, the relationships between precipitation and the five explanatory variables mentioned above, i.e., ELE, TRI, ASA, TEM, and HUM, were examined daily at the original scale using an univariate regression analysis, and the

Spatial Downscaling of TRMM
Similar to the method of Fang et al. [19], the downscaling method used in this study is based on the assumption that the spatial variability of daily precipitation is well captured by the TRMM products and can be explained by local topography and related meteorological conditions.Three topographical factors, including elevation (ELE), the topographical ruggedness index (TRI), and the angle between the slope aspect and the prevailing wind direction (ASA), and two meteorological variables, including daily maximum air temperature (TEM) and average humidity (HUM), were considered in the construction of the regression model.These five factors were prepared at 25 km and 1 km resolutions.
The projected DEM at 100 m resolution was aggregated by the mean statistics to resolutions of 25 km and 1 km.The resampled DEMs were then used to derive the TRI and slope aspect.The TRI, developed by Riley et al. [30] is a rapid, objective measure of topographic heterogeneity, calculated as the square root of the averaged squared differences in elevation values from a center grid cell and eight neighboring cells.Areas with high TRI values indicate mountainous regions, where rainfall is expected to be higher than elsewhere [31].The prevailing wind direction for each month was determined as that most frequently observed at the thirteen weather stations.We could then derive the monthly aspect angle, i.e., the angle between the slope aspect and the prevailing wind direction.The spatiotemporal variability of temperature and humidity is not as great as that for rainfall, so the spatial distribution of the daily maximum air temperature and average humidity can be obtained through the interpolation of ground-based observations.The inverse distance weighting (IDW) method was used, as this has been proved simple and suitable for this study area [32].The method is straightforward and not computationally intensive, which estimates the unknown value at one point using a linearly weighted combination of its neighbor sample points, with the weights inversely related to the distance between the estimated and the sample point [33].
The downscaling process is divided into four main steps.First, the relationships between precipitation and the five explanatory variables mentioned above, i.e., ELE, TRI, ASA, TEM, and HUM, were examined daily at the original scale using an univariate regression analysis, and the final multivariate regression model was constructed at the 25 km resolution using the variables that passed the significance test (at the p < 0.05 level).The residuals were then calculated as the differences between the TRMM rain and the predicted values from the regression model, interpreted as the natural random variations of precipitation not represented by the model, and interpolated to 1 km resolution using the spline interpolator.Then, assuming that the same responses existed at the target resolution, the regression model was applied to the estimation of 1 km precipitation using the significant topographical and meteorological variables at the 1 km resolution.Finally, the downscaling result was calculated by combining the interpolated residuals and the predicted precipitation.

The Double Kernel Smoothing Technique
The double kernel smoothing technique (DS) [15] is a nonparametric residual-based merging method adapting to data-sparse design, which can preserve the main pattern of the satellite field if no rain gauges are locally presented, or otherwise adjust the satellite field by using nearby observations.The main aim is to estimate the residual field by a weighted average of point residuals, using kernel functions, and then adjust the satellite field by the predicted residual field.It includes four steps [15]: Estimating the point residuals at each gauge location s i : where X B , X O , and D denote the background (satellite), observed, and residual fields, respectively.

2.
Performing a first level interpolation of point residuals to generate gridded pseudo-residuals D with a grid size of 25 km: where K is a kernel function satisfying (i) K(u) ě 0; (ii) K(u) = K(´u); and (iii) K(u)du = 1, || ‚ || is the Euclidean distance, and h is the bandwidth.

3.
Applying a second level of interpolation on both the observed point residuals D and the gridded pseudo-residuals D to generate the error field µ B on each grid of the 1 km downscaled satellite data: As Gasser et al. [34] indicated, most kernel functions perform similarly so, in this study, the K 1 and K 2 kernel functions are defined as Gaussian kernels, following Nerini et al. [16]: Estimating the merged field X M by subtracting the error field µ B from the background field X B : Remote Sens. 2016, 8, 599 7 of 18

Bandwidth Estimation
The bandwidth rescales the spread of the kernel function and determines the smoothness of the estimated field [15].An optimal bandwidth balances the bias and variance.If the bandwidth is small, the estimate is largely influenced by observations close to the estimated point, and the estimated field is, hence, rough with small bias, but large variance.In contrast, as the bandwidth tends to the infinite, the estimate approaches the mean of the observations, and a large bandwidth leads to a smooth surface with small variance, but large bias.In this study, the bandwidths h 1 and h 2 in steps 2 and 3 were automatically determined using the shuffled complex evolution (SCE) method developed by Duan et al. [29], which is a global optimization strategy combining the concepts of controlled random search, competitive evolution, and complex shuffling.The objective function of the SCE method is based on cross-validation, given by: where μB,´i is the residual estimated at gauge location s i by Equation (3) without using the point residual D(s i ).

Accounting for Spatial Intermittency
Unlike monthly or annual rainfall patterns, a daily rainfall field is usually spatially intermittent; the rainfall surface is discontinuous, with areas of zero rainfall between areas of non-zero rainfall [11].Most of the merged rainfall fields will be continuous, as the Gaussian kernel in the double kernel smoothing model is a continuous function that supports infinity, which will generate a continuous residual field.Therefore, to consider rainfall spatial intermittency, an indicator field representing the presence and absence of rainfall was generated using the indicator kriging technique, and was applied to the daily DS merged results.The rainfall spatial intermittency treatment is summarized below: 1.
Convert the 25 km gridded daily TRMM rainfall values into point features and append the rain gauges with their rainfall values to the TRMM-derived point file.This ensures TRMM and gauge rainfall values are considered in the indicator field generation.

2.
Transform the rainfall values generated from the previous step to create a binary variable indicating where the rainfall value is zero (0) or nonzero (1).

3.
Under the assumption that the binary variable is stationary and autocorrelated, generate a soft indicator field at 1 km resolution, presenting the probability of rainfall occurrence by using ordinary kriging with a Gaussion model fitting its semi-variogram.

4.
Produce a hard indicator field by assigning a probability threshold to the soft indicator field (0.5 in this study).

5.
Estimate the final rainfall field by multiplying the merged rainfall field by the hard indicator field.

Performance Indicators
To assess the performance of the merging scheme, a set of performance indicators were examined, including the mean error (ME), percent bias (PBIAS), root mean square error (RMSE), and Nash-Sutcliffe efficiency (NSE), respectively defined by the following equations: RMSE " and where n is the total number of observations, p and p are the estimated and observed values at rain gauges, respectively, and the over bar denotes the mean value.The ME and RMSE indicate error in the units used in the measurement.The PBIAS measures the average tendency of the estimated data to be larger or smaller than their observed counterparts.The NSE is the relative magnitude of the residual variance compared to the measured data variance [35].If the overall performance is evaluated, n is the total number of dates and stations, i.e., 366 d (for year 2008) multiplied by 13 stations in this study.
If the performance of a particular day is evaluated, n is the total number of stations, i.e., 13 stations in this study.

Hydrological Evaluation
Assessment of the merging results was supplemented through distributed hydrological modeling.The merged precipitation product was used to force the geomorphology-based hydrological model (GBHM) [36][37][38] for stream flow simulation, which is a physically-based distributed model that implements very few empirical parameters and does not totally rely on the observed data for parameter calibration and is, thus, quite useful in data-poor environments.See Yang [38] for details of the model.The simulated daily stream flow for 2008 was compared with the observed discharge at the Buhahekou hydrometric station (see Figure 1), which is the outlet of the largest subcatchment of the Qinghai Lake Basin, and contributes almost half of the lake's inflow.The stream flow data was obtained from the Hydrology and Water Resources Survey Bureau of Qinghai Province.The goodness-of-fit was evaluated in terms of the Nash-Sutcliffe efficiency (NSE), coefficient of determination (R 2 ) [39], percent bias (PBIAS), and ratio of the root mean square error (RSR) [35] at the daily scale.Model performance can be evaluated as "very good" if 0.75 < NSE < 1.0, PBIAS < ˘10%, and 0 ď RSR ď 0.50 [35].The data and parameter sets used to setup the model are listed in Table 1.

Merging Process
The rainfall estimation on 30 July 2008, a day of heavy rain before the river reached its peak flow, is presented in this section as an example to illustrate the merging process.The derived daily TRMM rainfall at 25 km resolution is shown in Figure 3a, and indicates that precipitation was distributed in most of the study area, and was particularly heavy in the northeast and southeast.The rainfall amount was significantly underestimated by TRMM when compared to the gauged rainfall.For the downscaling process, elevation and maximum air temperature passed the initial significance test and were used to construct the multivariate regression model (Table 2).The downscaled TRMM rainfall at 1 km resolution is shown in Figure 3b, and maintains the spatial pattern of the precipitation at the original scale.The optimal bandwidths and the minimum value of the objective function achieved by the SCE optimization method are listed in Table 2.The merged rainfall from the downscaled TRMM and the gauged rainfall using the double kernel smoothing model with optimal bandwidths is shown in Figure 3c.The amount of DS merged rainfall significantly increased, the spatial averaged rainfall of the Qinghai Lake Basin was more than six times the amount of the original TRMM, and the estimation error was reduced (see Table 3).However, the spatial intermittency was not preserved, i.e., the non-rainfall area disappeared, which means the rainfall amount of these areas was probably overestimated.The DS merged rainfall field was then conditioned by the indicator field to consider rainfall spatial intermittency, by setting the rainfall amount to zero in areas with a rainfall occurrence probability of less than 0.5.The final rainfall field is shown in Figure 3d, and maintains a rainfall spatial pattern similar to the original TRMM.The evaluation statistics listed in Table 3 also indicate that the performance of the final rainfall field is slightly better than the DS merged result.
Remote Sens. 2016, 8, 599 9 of 18 test and were used to construct the multivariate regression model (Table 2).The downscaled TRMM rainfall at 1 km resolution is shown in Figure 3b, and maintains the spatial pattern of the precipitation at the original scale.The optimal bandwidths and the minimum value of the objective function achieved by the SCE optimization method are listed in Table 2.The merged rainfall from the downscaled TRMM and the gauged rainfall using the double kernel smoothing model with optimal bandwidths is shown in Figure 3c.The amount of DS merged rainfall significantly increased, the spatial averaged rainfall of the Qinghai Lake Basin was more than six times the amount of the original TRMM, and the estimation error was reduced (see Table 3).However, the spatial intermittency was not preserved, i.e., the non-rainfall area disappeared, which means the rainfall amount of these areas was probably overestimated.The DS merged rainfall field was then conditioned by the indicator field to consider rainfall spatial intermittency, by setting the rainfall amount to zero in areas with a rainfall occurrence probability of less than 0.5.The final rainfall field is shown in Figure 3d, and maintains a rainfall spatial pattern similar to the original TRMM.The evaluation statistics listed in Table 3 also indicate that the performance of the final rainfall field is slightly better than the DS merged result.

Overall Performance
A spatial analysis of the annual precipitation aggregated from daily 2008 estimates across the original 25 km TRMM, spatially-downscaled TRMM, DS merged, and the final indicator conditioned rainfall, and the PBIAS against rain gauge time series, are presented in Figure 4.The downscaled TRMM effectively captured the spatial trends of the original TRMM, showing considerably more rainfall distributed in the northeastern and southeastern parts of this area.The final product retained a higher degree of information from the original TRMM than from the DS merged product, which was reasonable given that the final merged product was conditioned by the original TRMM-derived indicator field.Although the DS merging process significantly reduced the estimation errors, the rainfall amount was overestimated at most weather stations.The final product shows a smaller estimation error in the northeast where weather station density is relatively higher, and the absolute values of PBIAS for all stations were controlled within 30%.
The daily time series of the mean error for the intermediate and final merged products are shown in Figure 5.The original TRMM shows a poor performance between May and September, the rainy period of this area, accounting for nearly 90% of the annual precipitation.For most days, the amount of precipitation was underestimated by the original TRMM.The downscaled TRMM achieved a similar performance as the original TRMM, and did not cause a higher estimation error than the original TRMM.The mean errors was significantly reduced after the DS merging process, but the rainfall amount was slightly overestimated for most days.The final estimation also shows a smaller estimation error than the original TRMM.The summary of performance scores shown in Table 4 also indicates improvements in the DS merged and final estimation product compared with the original and downscaled TRMM.From Figure 5 and Table 4, we can also observe that the final indicator conditioned estimation shows a slight decrease in performance compared with the DS merged estimation, indicating that the indicator conditioning process can introduce uncertainties.The accuracy of the final estimates is indeed directly affected by the probability threshold determining the borders between rainfall and non-rainfall areas.A practicable way to reduce the final estimation error is selecting a proper threshold and including more site observations.

Performance at the Tianjun Station
The evaluation of the performances in the previous section may have some limitations, as all of the rain gauge observations were involved in both the merging and the validation procedure.Thus, a cross-validation was supplemented.However, the DS merging is time consuming as it combines the SCE method to automatically select the optimal bandwidths.The cross-validation was, thus, just carried out at the Tianjun station, located in the heart of the basin (see Figure 1), by performing rainfall estimation at the Tianjun station without the observed value at this station.The result (see Table 5) also shows improvements in the DS merged and final estimation products.The rainfall estimation error was significantly reduced by the DS merging process, with the rainfall amount

Performance at the Tianjun Station
The evaluation of the performances in the previous section may have some limitations, as all of the rain gauge observations were involved in both the merging and the validation procedure.Thus, a cross-validation was supplemented.However, the DS merging is time consuming as it combines the SCE method to automatically select the optimal bandwidths.The cross-validation was, thus, just carried out at the Tianjun station, located in the heart of the basin (see Figure 1), by performing rainfall estimation at the Tianjun station without the observed value at this station.The result (see Table 5) also shows improvements in the DS merged and final estimation products.The rainfall estimation error was significantly reduced by the DS merging process, with the rainfall amount overestimated, and the performance decreased after the indicator conditioning process, which are consistent with that found from the previous section.Three hydrological simulations were conducted in year 2008.Simulation I was forced by the interpolated gauged rainfall using the IDW method, a default rainfall estimation method adopted by the model.Simulation II was forced by the TRMM precipitation which was directly resampled to 1 km resolution using the nearest-neighbor assignment, and simulation III forced by the final estimation product.Other model inputs and parameters retained the same in the three simulations, and the parameters were not calibrated.The hydrological modeling results are summarized in terms of the performance scores of daily stream flow simulation at the Buhahekou hydrometric station.Table 6 shows significant NSE increases, as well as PBIAS and RSR reductions by simulation III, compared with I and II.Thus, simulation III performed extremely well, as the model reproduced the stream flow accurately when forced by the final estimation rainfall product.As shown in Figure 6, an obvious peak flow can be observed on 1 August 2008, two days after the heavy rain event mentioned in Section 3.1.Except simulation II, the other two simulations succeeded in capturing the flood conditions, but simulation III reproduced the amount more accurately.Similar results could also be observed around 28 September 2008, the last flood of the year, which further indicates that, for heavy rain events, the IDW-interpolated rainfall amounts tend to be overestimated and the TRMM rainfall to be underestimated.The overestimation of heavy rainfall by IDW also explains why simulation I obtained a high R 2 , which indicates a good correlation between the simulated and observed stream flow, but low NSE and large PBIAS, which are easily influenced by outliers.From the daily stream flow simulations, it can be drawn that the original TRMM product cannot be directly applied to stream flow simulation in this study area.Interpolation based on the sparse rain gauge network is practicable if heavy rainfall events are especially considered.The proposed merging framework can be used for rainfall estimation for distributed hydrological modeling in this study area.
overestimated, and the performance decreased after the indicator conditioning process, which are consistent with that found from the previous section.Three hydrological simulations were conducted in year 2008.Simulation I was forced by the interpolated gauged rainfall using the IDW method, a default rainfall estimation method adopted by the model.Simulation II was forced by the TRMM precipitation which was directly resampled to 1 km resolution using the nearest-neighbor assignment, and simulation III forced by the final estimation product.Other model inputs and parameters retained the same in the three simulations, and the parameters were not calibrated.The hydrological modeling results are summarized in terms of the performance scores of daily stream flow simulation at the Buhahekou hydrometric station.Table 6 shows significant NSE increases, as well as PBIAS and RSR reductions by simulation III, compared with I and II.Thus, simulation III performed extremely well, as the model reproduced the stream flow accurately when forced by the final estimation rainfall product.As shown in Figure 6, an obvious peak flow can be observed on 1 August 2008, two days after the heavy rain event mentioned in Section 3.1.Except simulation II, the other two simulations succeeded in capturing the flood conditions, but simulation III reproduced the amount more accurately.Similar results could also be observed around 28 September 2008, the last flood of the year, which further indicates that, for heavy rain events, the IDW-interpolated rainfall amounts tend to be overestimated and the TRMM rainfall to be underestimated.The overestimation of heavy rainfall by IDW also explains why simulation I obtained a high R 2 , which indicates a good correlation between the simulated and observed stream flow, but low NSE and large PBIAS, which are easily influenced by outliers.From the daily stream flow simulations, it can be drawn that the original TRMM product cannot be directly applied to stream flow simulation in this study area.Interpolation based on the sparse rain gauge network is practicable if heavy rainfall events are especially considered.The proposed merging framework can be used for rainfall estimation for distributed hydrological modeling in this study area.

Discussion
The statistical spatial downscaling scheme based on the relationships among precipitation, topographical features, and weather conditions successfully represented the spatial pattern of the precipitation fields in the original TRMM data, and did not cause higher estimation errors than the original TRMM.The downscaling approach was initially designed for extreme convective rainfall events, which can have different formation mechanisms.The topographical and meteorological factors only reflect some of the environmental effects on precipitation [19].In addition, there was very little precipitation from October to March (about 10% of the annual precipitation), the downscaling approach could not cause high estimation errors during that period.The estimation errors would be further reduced as the preliminary results were then calibrated by the subsequent merging process.Thus, although it cannot perform well every day in rainfall spatial downscaling, the statistical spatial downscaling scheme is still applicable in this study area.
When compared with the original and downscaled TRMM, the double kernel smoothing merged results reduced the estimation error in terms of ME, PBIAS, RMSE, and NSE, but tended to overestimate the spatial averaged rainfall amount, particularly for heavy rains.Li and Shao [15] assigned a specified value to bandwidth h 1 and then automatically selected bandwidth h 2 , which needs some prior knowledge and does not apply to all cases.This study employed the SCE global optimization algorithm to automatically estimate the two bandwidths for each rainy day and, thus, can achieve optimal calibration of the TRMM rainfall.For most days, however, the amount of precipitation was underestimated by the original TRMM, leading to negative point residuals.The searching space of SCE was from 25 km to the length of analysis window diagonal, and the estimated bandwidths might, therefore, be larger than the influence distance of the weather stations, which would exaggerate the underestimation area.This was why the spatial averaged precipitations were overestimated.
The final indicator conditioned estimates captured the spatial pattern of daily and annual precipitation with a relatively small rainfall estimation error, and also performed very well in the stream flow simulation when forcing the GBHM model.We are, thus, able to gain insights into the spatial distribution of precipitation at a fine resolution in the Qinghai Lake Basin for the first time.The annual precipitation in the northwestern part of the Qinghai Lake was observed to be significantly less than that of the central and southeastern areas (see Figure 4), which could not be identified by the few existing weather stations.Previous studies of water balance analysis and hydrological simulation of the lake have to either calculate precipitation from the sparse weather stations [23] or directly use gridded precipitation products at coarse resolutions [44].Our resulting high spatiotemporal rainfall dataset can, therefore, be used in subsequent hydrological analysis and distributed hydrological modeling in this area.
As there has not been any rainfall data merging research in the Qinghai Lake Basin before, we can only just compare the rainfall product estimated by our merging framework with the estimates obtained by other merging techniques.Co-kriging [10], combined with indicator kriging with the same threshold, was used, by which the performance criteria obtained at the Tianjun station were ´0.49 for ME, ´49.38% for PBIAS, 2.14 for RMSE, and 0.55 for NSE.For stream flow simulation, NSE, R 2 , PBIAS, and RSR were 0.83, 0.85, ´25.53%, and 0.41, respectively.These figures show larger cross-validation error and similar performance of stream flow simulation, compared with the results obtained by the merging framework proposed in this study.Annual precipitation shows a similar spatial pattern, but are roughly varying, with short and straight fringes of rainfall amount classes (see Figure 7b).Thus, our merging framework is more adaptive than the kriging-based merging scheme for rainfall estimation in this data-limited area.
Leaving aside uncertainties about the satellite and weather station data, the merging framework contained three main sources of uncertainties in the final results: the environmental factors used in the downscaling process, the search space of the bandwidths and, particularly, the threshold determining the borders between rainfall and non-rainfall areas.On the other hand, real-time merging of rain gauge and remote sensing data has become a new perspective in hydrological forecasting [2,9,45].By far, however, the merging framework proposed in this study can be used only to back-analyze past rainfall events, which may strongly restrict its scope of application.Nonetheless, this framework has the potential to be applied in real-time, as it adapts the key parameters of the spatial downscaling and data merging algorithms for every time-step, instead of being constant for the whole period and setting at their optimal values.Thus, our work still has room for improvement, including (1) taking into account more variables related to the geophysical mechanisms of precipitation in the multivariate regression model; (2) assessing the influence ranges of rain gauges and narrowing down the searching ranges in the automated bandwidth selection process, for more accurate rainfall estimation and efficient computing; (3) finding an efficient way of determining the threshold applied to the indicator field, e.g., by adapting its value for each day and ensuring that the proportion of rainy areas in the final estimates is the same as that in the original TRMM; and (4) making the framework flexible and computationally efficient enough to be run in real-time or near real-time, by automatically adapting the key parameters of the algorithms, recoding the algorithms in a single development environment that supports high performance computing, and applying a real-time remote sensing rainfall product, such as 3B42-RT, to the TMPA product in real-time.
Remote Sens. 2016, 8, 599 15 of 18 (see Figure 7b).Thus, our merging framework is more adaptive than the kriging-based merging scheme for rainfall estimation in this data-limited area.
Leaving aside uncertainties about the satellite and weather station data, the merging framework contained three main sources of uncertainties in the final results: the environmental factors used in the downscaling process, the search space of the bandwidths and, particularly, the threshold determining the borders between rainfall and non-rainfall areas.On the other hand, real-time merging of rain gauge and remote sensing data has become a new perspective in hydrological forecasting [2,9,45].By far, however, the merging framework proposed in this study can be used only to back-analyze past rainfall events, which may strongly restrict its scope of application.Nonetheless, this framework has the potential to be applied in real-time, as it adapts the key parameters of the spatial downscaling and data merging algorithms for every time-step, instead of being constant for the whole period and setting at their optimal values.Thus, our work still has room for improvement, including (1) taking into account more variables related to the geophysical mechanisms of precipitation in the multivariate regression model; (2) assessing the influence ranges of rain gauges and narrowing down the searching ranges in the automated bandwidth selection process, for more accurate rainfall estimation and efficient computing; (3) finding an efficient way of determining the threshold applied to the indicator field, e.g., by adapting its value for each day and ensuring that the proportion of rainy areas in the final estimates is the same as that in the original TRMM; and (4) making the framework flexible and computationally efficient enough to be run in real-time or near real-time, by automatically adapting the key parameters of the algorithms, recoding the algorithms in a single development environment that supports high performance computing, and applying a real-time remote sensing rainfall product, such as 3B42-RT, to the TMPA product in real-time.

Conclusions
This study explored a satellite and rain gauge data merging framework for relatively high spatiotemporal rainfall estimation under data scarce conditions, combining the techniques of statistical spatial downscaling, double kernel smoothing, shuffled complex evolution, and indicator kriging, so as to downscale satellite rainfall products, merge satellite and rain gauge data with

Conclusions
This study explored a satellite and rain gauge data merging framework for relatively high spatiotemporal rainfall estimation under data scarce conditions, combining the techniques of statistical spatial downscaling, double kernel smoothing, shuffled complex evolution, and indicator kriging, so as to downscale satellite rainfall products, merge satellite and rain gauge data with minimum cross-validation error, and consider the spatial intermittency of daily rainfall.The framework was applied to estimate daily precipitation at a 1 km resolution in the Qinghai Lake Basin.The results of this investigation showed that the proposed merging framework was able to estimate high spatiotemporal rainfall from the coarse Tropical Rainfall Measuring Mission (TRMM) data and sparse rain gauge observations with a small estimation error.Stream flow simulations based on the geomorphology-based hydrological model showed a better performance when forcing the model with the merging results than rainfall estimated merely from the original TRMM product or interpolated from the sparse rain gauge data.Our work sets up an example study of high spatiotemporal rainfall estimation that takes advantage of the strengths of both remote sensing and gauged rainfall to meet the challenges of sparse in situ data.The obtained results can be used in subsequent hydrological analysis and distributed hydrological modeling in the Qinghai Lake Basin.Accurate estimation of daily precipitation at fine spatial resolutions in real-time is crucial for distributed hydrological modeling and hazards forecasting.The accuracy, generality, flexibility, and computational efficiency of the framework are our future concerns and, thus, future studies should improve the framework for real-time running, and evaluate the performance of the merging framework by comparing it to other estimation schemes, and by applying it to other data-scarce areas.

Figure 1 .
Figure1.Map of the Qinghai Lake Basin, showing the regional topography, main river network, rain gauges, and discharge station at Buhahekou.

Figure 1 .
Figure1.Map of the Qinghai Lake Basin, showing the regional topography, main river network, rain gauges, and discharge station at Buhahekou.

Figure 2 .
Figure 2. Schematic overview of the rainfall estimation process.

Figure 2 .
Figure 2. Schematic overview of the rainfall estimation process.

Figure 3 .
Figure 3. Rainfall merging process for date 30 July 2008, with rainfall amount (mm) at gauges annotated: (a) The TRMM derived daily rainfall field at 25 km resolution; (b) the spatial downscaled TRMM rainfall field at 1 km resolution; (c) the merged rainfall field from the downscaled TRMM and gauged rainfall; and (d) the final indicator conditioned rainfall field.The gray color indicates non-rainfall areas, and the rainfall amount at weather stations are annotated.

Figure 3 .
Figure 3. Rainfall merging process for date 30 July 2008, with rainfall amount (mm) at gauges annotated: (a) The TRMM derived daily rainfall field at 25 km resolution; (b) the spatial downscaled TRMM rainfall field at 1 km resolution; (c) the merged rainfall field from the downscaled TRMM and gauged rainfall; and (d) the final indicator conditioned rainfall field.The gray color indicates non-rainfall areas, and the rainfall amount at weather stations are annotated.

Figure 4 .
Figure 4. Annual precipitation (mm) for 2008 and the PBIAS between merging results and rain gauge time series: (a) the original TRMM at 25 km resolution; (b) the spatial downscaled TRMM at 1 km resolution; (c) the DS Merged result; and (d) the final indicator conditioned estimates.

Figure 4 .
Figure 4. Annual precipitation (mm) for 2008 and the PBIAS between merging results and rain gauge time series: (a) the original TRMM at 25 km resolution; (b) the spatial downscaled TRMM at 1 km resolution; (c) the DS Merged result; and (d) the final indicator conditioned estimates.

Figure 5 .
Figure 5.Comparison of daily mean errors for (a) spatial downscaled TRMM; (b) DS merged rainfall; and (c) the final indicator conditioned rainfall

Figure 5 .
Figure 5.Comparison of daily mean errors for (a) spatial downscaled TRMM; (b) DS merged rainfall; and (c) the final indicator conditioned rainfall

Figure 6 .
Figure 6.Daily simulated and observed discharge at the Buhahekou hydrometric station.Figure 6. Daily simulated and observed discharge at the Buhahekou hydrometric station.

Figure 6 .
Figure 6.Daily simulated and observed discharge at the Buhahekou hydrometric station.Figure 6. Daily simulated and observed discharge at the Buhahekou hydrometric station.

Figure 7 .
Figure 7. Annual precipitation (mm) for 2008 from (a) the proposed merging framework; and (b) co-kriging combined with indicator kriging.

Figure 7 .
Figure 7. Annual precipitation (mm) for 2008 from (a) the proposed merging framework; and (b) co-kriging combined with indicator kriging.

Table 1 .
Data and parameter sets used in the hydrological simulation.

Table 2 .
Parameters of the main subprocesses for rainfall estimation on 30 July 2008.

Table 3 .
Evaluation statistics of the intermediate and final results for rainfall estimation on 30 July 2008.

Table 4 .
Evaluation statistics for rainfall estimation based on observations at the 13 stations, 2008.

Table 4 .
Evaluation statistics for rainfall estimation based on observations at the 13 stations, 2008.

Table 5 .
Evaluation statistics at the Tianjun station.

Table 5 .
Evaluation statistics at the Tianjun station.

Table 6 .
Performance scores of daily stream flow, 2008.