1. Introduction
Accurate estimation of the amount, and the spatial and temporal distribution, of precipitation is crucial for hydrological analysis and flood forecasting, particularly because distributed hydrological modeling has emerged as an effective method of analyzing hydrological processes, predicting the evolution of hydrological variables, and forecasting hydrological hazards [
1,
2]. A handful of studies [
3,
4,
5,
6] have demonstrated that precipitation is the main source of uncertainty in hydrological model predictions. Achieving reliable modeling outputs thus requires forcing hydrological models with accurate rainfall at relatively fine spatial and temporal resolutions.
Mapping rainfall is always challenging. Interpolation of point rainfall measured by rain gauges is the conventional method (for example, with Thiessen polygons, inverse distance weighting, or kriging techniques), but this may be subject to great uncertainty when the rain gauge network is sparse, as the gauges can only represent rainfall information within a limited distance [
7]. In contrast, remote sensing techniques provide a means of spatially continuous rainfall observation with a high temporal sampling frequency. However, remote sensing products may also contain major quantitative errors, due to cloud effects and limitations in sensor performance and retrieval algorithms [
8]. Combining both data sources, known as data merging, may thus be effective in retaining both the high-quality rainfall data from stations and the spatially continuous information from remote sensing observations [
9]. Great efforts have been made to develop and evaluate algorithms for merging rain gauge and remote sensing observations, e.g., co-kriging [
10,
11], linearized weighting procedures [
8,
12], conditional merging [
13], Barnes objective analysis [
14], multi-quadric surface fitting [
9], and double kernel smoothing [
15]. The outcomes of these studies are encouraging, and provide new methods of estimating spatiotemporal precipitation, particularly in areas with limited climate data, such as the Qinghai-Tibetan Plateau.
However, generating rainfall maps of high spatial and temporal resolution for distributed hydrological modeling at a local basin scale is challenging when the following problems occur together: (1) only a sparse rain gauge network is available; (2) the spatial resolution of the satellite data is much coarser than that of the model [
7]; and (3) daily rainfall is spatially intermittent [
11]. This study thus aims to present a methodology that overcomes these issues.
The kriging-based merging scheme is a common and mature spatial prediction method, but it requires the assumption of second-order stationarity and a theoretical semi-variogram model. In poorly gauged areas, the distances between rain gauges are often too large, so kriging-derived methods may overestimate the spatial correlation and tend to deliver unsatisfactory results [
16]. Instead, nonparametric merging methods without strong spatial assumptions may be more suitable for sparse designs. Li and Shao [
15] used the nonparametric double kernel smoothing technique to combine TRMM precipitation data with observations from the Australian rain gauge network, focusing on discontinuity correction and spatial interpolation adapted to sparse designs, and compared it with the geostatistical methods of ordinary kriging and co-kriging. Nerini et al. [
16] compared the nonparametric rainfall methods of double kernel smoothing and mean bias correction with two geostatistical methods—kriging with external drift and Bayesian combination—for merging daily TRMM precipitation with rain gauge data over a mesoscale tropical Andean catchment in Peru. Both studies concluded that the nonparametric double kernel smoothing merging method performed better than the more complex geostatistical methods under data-scarce conditions.
Nonetheless, merging alone does not deliver the finer spatial resolution that hydrological models need, as the merged results often retain the same scale as the satellite data. Spatial downscaling of the satellite observations, a technique for disaggregating coarse-resolution data and capturing sub-grid heterogeneity, is thus necessary before the merging. It is usually based on the concept of scale invariance, or relates the properties of the physical process at one scale to those at a finer scale [
17]. A common and simple downscaling method is to develop a statistical model at the original coarse scale, based on the relationships between rainfall and the main factors that govern the spatial variability of rainfall, and then transfer the model to the target scale, as in the works of Jia et al. [
18], Fang et al. [
19], and Shi et al. [
20].
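The fit-at-coarse-scale, apply-at-fine-scale idea described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares model and a single hypothetical predictor (elevation in km); the function names, the synthetic data, and the choice of predictor are illustrative assumptions and are not taken from the cited works.

```python
import numpy as np

def fit_coarse_model(rain_coarse, predictors_coarse):
    """Least-squares fit of rainfall against predictors at the coarse scale.

    rain_coarse: shape (n_cells,); predictors_coarse: shape (n_cells, n_predictors).
    Returns the coefficients with the intercept first.
    """
    X = np.column_stack([np.ones(len(rain_coarse)), predictors_coarse])
    coef, *_ = np.linalg.lstsq(X, rain_coarse, rcond=None)
    return coef

def apply_at_fine_scale(coef, predictors_fine):
    """Transfer the coarse-scale model to fine-scale predictor values."""
    X = np.column_stack([np.ones(len(predictors_fine)), predictors_fine])
    # Rainfall cannot be negative, so clip the regression output at zero.
    return np.clip(X @ coef, 0.0, None)

# Synthetic illustration (hypothetical numbers, not data from the study).
rng = np.random.default_rng(0)
elev_coarse = rng.uniform(3.2, 4.5, size=40)   # coarse-cell mean elevation, km
rain_coarse = 2.0 + 1.5 * elev_coarse          # exactly linear, for clarity
coef = fit_coarse_model(rain_coarse, elev_coarse[:, None])

elev_fine = np.linspace(3.2, 4.5, 5)           # fine-cell elevation, km
rain_fine = apply_at_fine_scale(coef, elev_fine[:, None])
```

The sketch relies on the relationship fitted at the coarse scale also holding at the target scale, which is the scale-transfer assumption behind this family of methods.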
Spatial intermittency means daily rainfall is delivered in discrete patches in space and time, which causes a discontinuous surface with areas of zero rainfall between areas of non-zero rainfall [
11]. However, most studies have neglected this feature, except the works of Barancourt et al. [
21], Grimes and Pardo-Igúzquiza [
22], and Chappell et al. [
11], who tackled the discontinuity by thresholding the rainfall distribution and using indicator kriging to map the presence and absence of rainfall.
Hence, this study presents a framework for estimating precipitation with high spatial and temporal resolution for distributed hydrological modeling in the Qinghai Lake Basin, a data-scarce area in the northeast of the Qinghai-Tibet Plateau. Using the 0.25° TRMM data and a sparse rain gauge network, statistical spatial downscaling, double kernel smoothing merging, and indicator kriging techniques are combined, following Fang et al. [
19], Li and Shao [
15], and Chappell et al. [
11], respectively, to address the issues raised above.
4. Discussion
The statistical spatial downscaling scheme based on the relationships among precipitation, topographical features, and weather conditions successfully represented the spatial pattern of the precipitation fields in the original TRMM data, and did not cause higher estimation errors than the original TRMM. The downscaling approach was initially designed for extreme convective rainfall events, whereas other rainfall events can have different formation mechanisms, and the topographical and meteorological factors reflect only some of the environmental effects on precipitation [
19]. In addition, as there was very little precipitation from October to March (about 10% of the annual precipitation), the downscaling approach could not cause high estimation errors during that period. The estimation errors were further reduced because the preliminary results were subsequently calibrated by the merging process. Thus, although it does not perform well on every day, the statistical spatial downscaling scheme is still applicable in this study area.
When compared with the original and downscaled TRMM, the double kernel smoothing merged results reduced the estimation error in terms of ME, PBIAS, RMSE, and NSE, but tended to overestimate the spatially averaged rainfall amount, particularly for heavy rains. Li and Shao [15] assigned a specified value to bandwidth h1 and then automatically selected bandwidth h2, which needs some prior knowledge and does not apply to all cases. This study employed the SCE global optimization algorithm to automatically estimate the two bandwidths for each rainy day and can thus achieve optimal calibration of the TRMM rainfall. For most days, however, the amount of precipitation was underestimated by the original TRMM, leading to negative point residuals. The search space of SCE ranged from 25 km to the length of the analysis window diagonal, so the estimated bandwidths might be larger than the influence distance of the weather stations, which would enlarge the area treated as underestimated. This is why the spatially averaged precipitation was overestimated.
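To make the role of the two bandwidths concrete, the following is a simplified sketch of a two-level kernel smoothing of gauge residuals. It assumes Gaussian kernels, residuals defined as satellite minus gauge, and fixed bandwidths; the function names and the synthetic numbers are illustrative assumptions. It is not the exact formulation of Li and Shao [15], and it omits the SCE bandwidth search used in this study.

```python
import numpy as np

def gaussian_kernel(dist, h):
    """Gaussian weight for distance `dist` and bandwidth `h` (same units)."""
    return np.exp(-0.5 * (dist / h) ** 2)

def pairwise_dist(a, b):
    """Euclidean distances between the rows of a (n, 2) and b (m, 2)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def double_kernel_merge(grid_xy, sat_grid, gauge_xy, gauge_rain,
                        sat_at_gauges, h1, h2):
    # Point residuals: negative where the satellite underestimates.
    resid = sat_at_gauges - gauge_rain
    # Level 1: smooth the point residuals onto the grid (bandwidth h1).
    # Very small h1 relative to gauge spacing can underflow these weights.
    w_pg = gaussian_kernel(pairwise_dist(grid_xy, gauge_xy), h1)
    pseudo = (w_pg @ resid) / w_pg.sum(axis=1)
    # Level 2: smooth point residuals and gridded pseudo-residuals together
    # (bandwidth h2 for the gridded part). The grid-to-grid distance matrix
    # is O(n_grid^2), so this form suits small analysis windows only.
    w_gg = gaussian_kernel(pairwise_dist(grid_xy, grid_xy), h2)
    smooth = (w_pg @ resid + w_gg @ pseudo) / (w_pg.sum(axis=1) + w_gg.sum(axis=1))
    # Remove the smoothed residual field; rainfall cannot be negative.
    return np.clip(sat_grid - smooth, 0.0, None)

# Synthetic illustration: the satellite reads 3 mm everywhere, both gauges 5 mm.
gx, gy = np.meshgrid(np.arange(0.0, 100.0, 25.0), np.arange(0.0, 100.0, 25.0))
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])   # coordinates in km
sat_grid = np.full(len(grid_xy), 3.0)
gauge_xy = np.array([[10.0, 10.0], [80.0, 60.0]])
gauge_rain = np.array([5.0, 5.0])
sat_at_gauges = np.array([3.0, 3.0])
merged = double_kernel_merge(grid_xy, sat_grid, gauge_xy, gauge_rain,
                             sat_at_gauges, h1=50.0, h2=25.0)
```

In this toy case the residual is −2 mm at both gauges, so the whole field is raised by 2 mm. The same mechanism operates when bandwidths are wide: negative residuals at a few gauges are spread over a large area, which is the overestimation effect discussed above.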
The final indicator-conditioned estimates captured the spatial pattern of daily and annual precipitation with a relatively small rainfall estimation error, and also performed very well in the stream flow simulation when forcing the GBHM model. We are thus able to gain insights into the spatial distribution of precipitation at a fine resolution in the Qinghai Lake Basin for the first time. The annual precipitation in the northwestern part of the Qinghai Lake was observed to be significantly less than that of the central and southeastern areas (see Figure 4), which could not be identified from the few existing weather stations. Previous studies of water balance analysis and hydrological simulation of the lake had to either calculate precipitation from the sparse weather stations [23] or directly use gridded precipitation products at coarse resolutions [44]. Our resulting rainfall dataset of high spatiotemporal resolution can, therefore, be used in subsequent hydrological analysis and distributed hydrological modeling in this area.
As there has been no previous rainfall data merging research in the Qinghai Lake Basin, we can only compare the rainfall product estimated by our merging framework with estimates obtained by other merging techniques. Co-kriging [10], combined with indicator kriging with the same threshold, was used for this purpose; the performance criteria obtained at the Tianjun station were −0.49 for ME, −49.38% for PBIAS, 2.14 for RMSE, and 0.55 for NSE. For stream flow simulation, NSE, R², PBIAS, and RSR were 0.83, 0.85, −25.53%, and 0.41, respectively. These figures show a larger cross-validation error and a similar stream flow simulation performance compared with the results obtained by the merging framework proposed in this study. The annual precipitation shows a similar spatial pattern, but varies roughly, with short and straight fringes between rainfall amount classes (see Figure 7b). Thus, our merging framework is better suited than the kriging-based merging scheme to rainfall estimation in this data-limited area.
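For readers who wish to reproduce such criteria, the sketch below computes ME, PBIAS, RMSE, NSE, and RSR under commonly used definitions. The sign convention (simulated minus observed, so that negative ME and PBIAS both indicate underestimation) is an assumption made here, since PBIAS conventions differ between authors; the function name and sample values are illustrative.

```python
import numpy as np

def evaluation_metrics(obs, sim):
    """Common goodness-of-fit criteria for simulated vs observed series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs                           # assumed sign convention
    sse = np.sum(err ** 2)                    # sum of squared errors
    sst = np.sum((obs - obs.mean()) ** 2)     # spread of the observations
    return {
        "ME": err.mean(),                     # mean error
        "PBIAS": 100.0 * err.sum() / obs.sum(),   # percent bias
        "RMSE": np.sqrt(np.mean(err ** 2)),   # root mean square error
        "NSE": 1.0 - sse / sst,               # Nash-Sutcliffe efficiency
        "RSR": np.sqrt(sse / sst),            # RMSE / std. dev. of observations
    }

# Small illustrative series (not data from the study).
metrics = evaluation_metrics(obs=[1.0, 2.0, 3.0, 4.0], sim=[1.0, 2.0, 3.0, 2.0])
```

Under these definitions RSR equals the square root of 1 − NSE, which agrees with the stream flow pair reported above (NSE of 0.83 and RSR of 0.41).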
Leaving aside uncertainties in the satellite and weather station data, the merging framework contained three main sources of uncertainty in the final results: the environmental factors used in the downscaling process, the search space of the bandwidths, and, particularly, the threshold determining the borders between rainfall and non-rainfall areas. In addition, real-time merging of rain gauge and remote sensing data has become a new perspective in hydrological forecasting [
2,
9,
45]. So far, however, the merging framework proposed in this study can be used only to back-analyze past rainfall events, which may strongly restrict its scope of application. Nonetheless, this framework has the potential to be applied in real-time, as it adapts the key parameters of the spatial downscaling and data merging algorithms at every time step, instead of holding them constant at values optimized for the whole period. Our work therefore still has room for improvement, including (1) taking into account more variables related to the geophysical mechanisms of precipitation in the multivariate regression model; (2) assessing the influence ranges of rain gauges and narrowing the search ranges in the automated bandwidth selection process, for more accurate rainfall estimation and more efficient computing; (3) finding an efficient way of determining the threshold applied to the indicator field, e.g., by adapting its value for each day and ensuring that the proportion of rainy areas in the final estimates is the same as that in the original TRMM; and (4) making the framework flexible and computationally efficient enough to be run in real-time or near real-time, by automatically adapting the key parameters of the algorithms, recoding the algorithms in a single development environment that supports high performance computing, and applying a real-time remote sensing rainfall product, such as 3B42-RT, to the TMPA product in real-time.
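Improvement (3) can be sketched as a quantile-matching rule: choose the daily threshold so that the fraction of cells classed as rainy in the fine-resolution indicator field approximately equals the rainy fraction in the original TRMM field. The code below is one possible reading of that suggestion, not a method implemented in this study; the function names and synthetic fields are hypothetical.

```python
import numpy as np

def threshold_matching_rain_fraction(indicator_prob, target_fraction):
    """Threshold t such that about `target_fraction` of cells have prob >= t."""
    return np.quantile(indicator_prob, 1.0 - target_fraction)

def apply_rain_mask(rain, indicator_prob, threshold):
    """Set rainfall to zero wherever the indicator field is below the threshold."""
    return np.where(indicator_prob >= threshold, rain, 0.0)

# Synthetic illustration (hypothetical fields, not data from the study).
rng = np.random.default_rng(0)
trmm = rng.random(50)
trmm[trmm < 0.7] = 0.0                      # coarse field with dry cells
target = float(np.mean(trmm > 0))           # rainy fraction in the coarse field

prob = rng.random(1000)                     # fine-scale indicator estimates in [0, 1]
rain = np.full(1000, 2.0)                   # fine-scale merged rainfall, mm
t = threshold_matching_rain_fraction(prob, target)
masked = apply_rain_mask(rain, prob, t)
```

Because the threshold is recomputed from each day's indicator field, it adapts to the daily rainfall extent rather than being fixed for the whole period.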
5. Conclusions
This study explored a satellite and rain gauge data merging framework for rainfall estimation at relatively high spatiotemporal resolution under data-scarce conditions. The framework combines statistical spatial downscaling, double kernel smoothing, shuffled complex evolution, and indicator kriging in order to downscale satellite rainfall products, merge satellite and rain gauge data with minimum cross-validation error, and account for the spatial intermittency of daily rainfall. The framework was applied to estimate daily precipitation at a 1 km resolution in the Qinghai Lake Basin. The results showed that the proposed merging framework was able to estimate rainfall at high spatiotemporal resolution from the coarse Tropical Rainfall Measuring Mission (TRMM) data and sparse rain gauge observations with a small estimation error. Stream flow simulations based on the geomorphology-based hydrological model performed better when the model was forced with the merging results than with rainfall estimated merely from the original TRMM product or interpolated from the sparse rain gauge data. Our work provides an example of high spatiotemporal resolution rainfall estimation that takes advantage of the strengths of both remote sensing and gauged rainfall to meet the challenges of sparse in situ data. The obtained results can be used in subsequent hydrological analysis and distributed hydrological modeling in the Qinghai Lake Basin. Accurate estimation of daily precipitation at fine spatial resolutions in real-time is crucial for distributed hydrological modeling and hazard forecasting. The accuracy, generality, flexibility, and computational efficiency of the framework remain concerns for future work; future studies should improve the framework for real-time running and evaluate its performance by comparing it with other estimation schemes and by applying it to other data-scarce areas.