Evaluation and Hydrological Application of a Data Fusing Method of Multi-Source Precipitation Products: A Case Study over Tuojiang River Basin

Abstract: Precipitation is an essential driving factor of hydrological models. Its temporal and spatial resolution and reliability directly affect the accuracy of hydrological modeling. Acquiring accurate areal precipitation requires a dense network of ground rainfall stations. In many basins, ground rainfall stations are sparse and unevenly distributed, so real-time satellite precipitation products (SPPs) have become an important supplement to ground-gauged precipitation (GGP). A multi-source precipitation fusion method suitable for the Soil and Water Assessment Tool (SWAT) model is proposed in this paper. First, the multivariate inverse distance similarity method (MIDSM) is proposed to search for the optimal representative precipitation points of GGP and SPPs in sub-basins. Subsequently, the correlation-coefficient-based weighted average method (CCBWA) is presented and applied to calculate the fused multi-source precipitation product (FMSPP), which combines GGP and multiple satellite precipitation products. The effectiveness of the FMSPP was demonstrated over the Tuojiang River Basin. In the case study, three SPPs were chosen as the satellite precipitation sources, namely the Climate Forecast System Reanalysis (CFSR), Tropical Rainfall Measuring Mission Project (TRMM), and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Network Climate Data Record (PERSIANN-CDR). The evaluation indicators illustrated that FMSPP could capture the occurrence of rainfall events very well, with a maximum Probability of Detection (POD) and Critical Success Index (CSI) of 0.92 and 0.83, respectively. Furthermore, its correlation with GGP on the monthly scale, ranging from 0.84 to 0.96, was higher in most sub-basins than that of the three SPPs. These results demonstrated that FMSPP performed best compared with the original SPPs. Finally, FMSPP was applied in the SWAT model and was found to drive it more effectively than a single precipitation source. The FMSPP yielded the highest accuracy in hydrological modeling, with a Coefficient of Determination (R²) of 0.84, Nash-Sutcliffe efficiency (NS) of 0.83, and Percent Bias (PBIAS) of only −1.9%.


Introduction
Accurate runoff prediction supports water management and planning, agricultural irrigation [1], studies of climate and human activity impacts [2][3][4], and the mitigation of major disasters and losses caused by floods and droughts. At present, runoff prediction methods [5][6][7] fall mainly into two categories: data-driven models and physical models. Data-driven models are comparatively simple to construct, but they cannot explicitly reveal the internal mechanisms of hydrological processes. Physical models, such as semi-distributed hydrological models TOPMODEL [8], variable infiltration capacity [9][10][11], (CPC MORPHing technique) satellite precipitation and in-situ precipitation performed better than the original CMORPH. Wu et al. [28] proposed a deep fusion method that combines a convolutional neural network with a long short-term memory network to merge TRMM and rain gauge data; the method reduced the errors and improved the accuracy of the original TRMM. The basic idea of these fusion algorithms is to establish the correlation between gridded precipitation and other geographic or climate variables through a linear regression method or a non-linear deep learning method to obtain the modeled precipitation, and then to use GGP to correct its deviation and validate its accuracy. Several other studies have used Bayesian methods to calculate the optimal weights for multiple SPPs and evaluated the simulation capability of the merged precipitation with different hydrological models. For example, Ma et al. [29] developed a merged precipitation dataset combining multiple SPPs based on the Dynamic Bayesian Model Averaging scheme. They evaluated the blended multi-satellite precipitation data in the headwaters of the Yangtze River using the Coupled Routing and Excess Storage model and stated that the merged precipitation showed a promising prospect for hydrological application. Ur Rahman et al. [30] compared the hydrological performance of the SWAT model forced by in-situ precipitation and by two merged precipitation products. The two merged products, built from multiple SPPs with the Dynamic Clustered Bayesian Averaging and Dynamic Bayesian Model Averaging methods, respectively, produced streamflow forecasts close to those obtained with in-situ precipitation.
However, precipitation fused by these methods undergoes multiple processing steps such as downscaling, resampling, regression, and interpolation, each of which can affect the accuracy of the result [25]. Few works design the fusion around the characteristics of the hydrological model in order to simplify the fusion process and hence improve the hydrological applicability of the fused precipitation. This paper proposes a two-step fusion scheme to obtain the fused multi-source precipitation product (FMSPP). First, the multivariate inverse distance similarity method (MIDSM) is proposed and applied to search for the optimal representative precipitation points of GGP and SPPs. Second, the correlation-coefficient-based weighted average (CCBWA) method is presented and employed to construct the FMSPP from multiple SPPs and GGP. In this study, the satellite precipitation sources are CFSR, TRMM, and PERSIANN-CDR. The effectiveness of the fusion method is assessed in two ways. First, the accuracy of FMSPP is evaluated by comparison with the original SPPs on different temporal and spatial scales. Second, FMSPP, GGP, and each SPP are used to drive the SWAT model, and the simulated streamflow is compared with the observations to evaluate the applicability of each product in the hydrological model. The rest of this paper is organized as follows. Section 2 describes the study basin and data preparation. Section 3 presents the methodology. The results and discussion are given in Sections 4 and 5, respectively. Finally, Section 6 summarizes the study.

Study Basin
The Tuojiang River is a major tributary of the upper reaches of the Yangtze River, with a length of 502 km and a total basin area of 27,860 km², spanning 103°38′ to 105°50′ E and 27°50′ to 31°41′ N. Figure 1 shows the geographic location of the Tuojiang River Basin (TJRB). Twenty-four ground meteorological stations are distributed within or near the TJRB (Table 1 and Figure 1a). The TJRB is located in the subtropical humid monsoon climate zone, characterized by a mild climate, abundant rainfall, and four distinct seasons. It traverses the mountainous areas on the western edge of Sichuan, the Chengdu Plain, and the hilly regions of the central Sichuan Basin. The climate differs between the north and the south because of the complex terrain and the difference in elevation. The temperature gradually rises from the northwest to the southeast. The annual mean temperature is 15.7 °C in mountainous areas and 17.6 °C in hilly areas. Rainfall is heavier in the mountains than in the hills. For example, the annual mean precipitation is 1200-1500 mm in the Lutou Mountain rainstorm area, 1000-1400 mm in the Chengdu Plain, and only 900 mm in the hilly area. Precipitation is concentrated in the summer, which accounts for about 60% of the total annual precipitation, while winter accounts for only 4%.

Precipitation Data and Other Data
In this study, the in-situ precipitation data measured by the 24 ground meteorological stations were acquired from the China Meteorological Data Service Center. Other meteorological data required for the SWAT model, such as temperature, humidity, and wind speed, were also collected from ground measurements. Three SPPs with a daily time step and a spatial resolution of 0.25° were selected, namely CFSR, TRMM-3B42, and PERSIANN-CDR. Brief information about the four precipitation products (GGP and the three SPPs) is given in Table 2. Figure 2 displays the spatial distribution of the different precipitation products. For a more detailed introduction to the SPPs, please refer to the literature [21,31]. Monthly streamflow from 1980 to 2008 measured at three hydrological stations, namely Sanhuangmiao, Dengyinyang, and Lijiawan, was collected. These three hydrological stations, shown in Table 3 and Figure 1a, are located in the upstream, midstream, and downstream areas of the TJRB, respectively.
The Harmonized World Soil Database provided by the Food and Agriculture Organization of the United Nations is used for the soil data and reclassified into 27 types with a spatial resolution of 1 km (Figure 1d). The land use data (Figure 1c) are obtained from the Resource and Environmental Science and Data Center (RESDC) and divided into six categories with a spatial resolution of 1 km, of which agricultural land accounts for 80.75% and forest land accounts for 12.4%. The digital elevation model (DEM) comes from the Shuttle Radar Topography Mission, with a spatial resolution of 90 m (Figure 1b and Table 4). The elevation of the basin is approximately 4800 m in the northwest and as low as 250 m in the southeast. As for the catchment shape, the upper part is narrow and the lower part is wide. The Tuojiang River flows into the Yangtze River in southern Luzhou City. The normalized difference vegetation index (NDVI) from 1998 to 2008 provided by the RESDC, with a spatial resolution of 1 km, is adopted for MIDSM.

SWAT Model
SWAT is a physically-based semi-distributed hydrological model developed by the Agricultural Research Service of the United States Department of Agriculture [32]. The smallest calculation unit is the hydrologic response unit (HRU), which groups areas sharing the same soil, land use, and slope in order to simplify a run. About 1-10 HRUs are gathered into a sub-basin, and the flow generated and converged in each sub-basin flows into the connected river. A sub-basin is used to represent the precipitation of all internal homogeneous HRUs and to distinguish the precipitation source. SWAT simulates the water cycle process according to the water balance equations:

$$SW_t = SW_0 + \sum_{i=1}^{t} \left( PREC - SURQ - ET - WSEEP - GWQ \right) \qquad (1)$$

$$WYLD = SURQ + LATQ + GWQ - TLOSS \qquad (2)$$

where $SW_0$ and $SW_t$ are the initial and final water contents in the soil, respectively; PREC represents the precipitation; SURQ denotes the surface runoff; ET is the evapotranspiration; WSEEP is the amount of water that percolates or bypasses from the bottom of the soil profile; GWQ and LATQ are the groundwater flow and lateral flow generated from each HRU, respectively; TLOSS is the amount of water lost from the reach through bed transmission; WYLD represents the net water contributed by the HRU to the reach. In the SWAT model, climate data are read as point records. The nearest station is selected as the precipitation source of a sub-basin based on the smallest distance from the geometric center of the sub-basin to the rainfall point. Each sub-basin is therefore represented by a single precipitation point, the nearest one, which characterizes the precipitation over the entire sub-basin. The sub-basins should not be too small; otherwise, the computational cost of the hydrological calculations becomes very high.
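The nearest-point rule described above can be illustrated with a minimal Python sketch. It is not SWAT's internal code: the function name `nearest_station`, the use of the haversine distance, and the coordinates are all assumptions made for illustration.

```python
import math

def nearest_station(centroid, stations):
    """Return the name of the station closest to a sub-basin centroid.

    centroid: (lon, lat) in degrees; stations: {name: (lon, lat)}.
    The haversine great-circle distance is an assumption; SWAT's own
    distance computation may differ.
    """
    def haversine_km(p, q):
        lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
        a = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(a))

    return min(stations, key=lambda name: haversine_km(centroid, stations[name]))

# Hypothetical station coordinates, for illustration only.
stations = {"A": (104.0, 30.5), "B": (104.9, 29.6), "C": (105.3, 29.0)}
print(nearest_station((104.8, 29.7), stations))  # prints "B"
```

Because only one point is ever chosen per sub-basin, the quality of that single point matters more than the density of the surrounding grid, which is the motivation for selecting it carefully.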

The Multi-Source Precipitation Fusing Method
A two-step fusion method is proposed. First, the multivariate inverse distance similarity method (MIDSM) is used to search for the optimal representative precipitation points of the GGP and SPPs. Second, the correlation-coefficient-based weighted average (CCBWA) method is applied to form the fused multi-source precipitation product (FMSPP). FMSPP is the weighted average of the GGP and multiple satellite precipitation products. Multiple SPPs and the GGP compose an open database, and the SPPs in the database are not restricted to any particular type. Any reliable satellite precipitation product developed in the future can be added, and unreliable satellite data can be removed. In the case study of this paper, three mainstream SPPs are chosen as the satellite precipitation sources, namely CFSR, TRMM, and PERSIANN-CDR. Unlike previous satellite-gauge fusion methods, which establish a new gridded precipitation field, the proposed method focuses on the actual precipitation point chosen by the SWAT model. Moreover, the method aims to establish the sub-basin precipitation field and create assumed weather stations located at the geometric centers of the sub-basins, so that SWAT automatically selects these weather stations as the precipitation source. Figure 3 illustrates the two-step fusing process of FMSPP. The fused precipitation is mathematically expressed as:

$$P_j^{fuse} = \sum_{\tau=1}^{m} W_{i_\tau j} \, P_{i_\tau j} \qquad (3)$$

where $P_j^{fuse}$ is the fused precipitation for sub-basin j; $P_{i_\tau j}$ is the precipitation at the representative precipitation point of product $i_\tau$ in sub-basin j; $W_{i_\tau j}$ is the weight of precipitation product $i_\tau$ in sub-basin j; and m is the number of precipitation products. Note that this is a dynamic fusing process: $W_{i_\tau j}$ is not a fixed value but varies among the sub-basins. Furthermore, the database can be updated, and FMSPP dynamically fuses all available in-situ and satellite datasets over the time span (Figure 4).
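As a minimal sketch of the weighted average described above, the following Python function fuses the representative-point series of one sub-basin. The function name, the data layout, and the numbers are hypothetical, and the weights are assumed to sum to 1.

```python
def fuse_precipitation(series_by_product, weights):
    """Weighted average of representative-point series for one sub-basin.

    series_by_product: {product: [precipitation per time step]}
    weights: {product: weight}, assumed here to sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_steps = len(next(iter(series_by_product.values())))
    fused = [0.0] * n_steps
    for product, series in series_by_product.items():
        if len(series) != n_steps:
            raise ValueError("all series must have the same length")
        for t, value in enumerate(series):
            fused[t] += weights[product] * value
    return fused

# Hypothetical daily values (mm) and weights for a single sub-basin.
series = {"GGP": [0.0, 10.0, 4.0], "TRMM": [0.0, 8.0, 6.0]}
print(fuse_precipitation(series, {"GGP": 0.5, "TRMM": 0.5}))  # prints [0.0, 9.0, 5.0]
```

Since the weights differ from one sub-basin to another, the function would be called once per sub-basin with that sub-basin's own weights.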

MIDSM
To choose the representative precipitation point for each sub-basin, the multivariate inverse distance similarity method (MIDSM) is proposed. MIDSM is based on inverse distance weighting (IDW). IDW [33], also known as the reciprocal distance multiplication method, is a geographic interpolation method based on the principle of similarity. Each data point has a certain influence on the interpolation point: the smaller the distance between the estimated and measured points, the greater the weight, and vice versa.
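For reference, classical IDW can be sketched in a few lines of Python. This shows the standard interpolation scheme, not the MIDSM itself; the power parameter of 2 is a common default assumed here, and the sample values are hypothetical.

```python
import math

def idw(target, samples, power=2):
    """Inverse-distance-weighted estimate at target = (x, y).

    samples: iterable of (x, y, value). Closer samples receive larger
    weights, 1 / distance**power. The power of 2 is an assumed default.
    """
    numerator = denominator = 0.0
    for x, y, value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0:
            return value  # the target coincides with a measured point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Two hypothetical gauges at distances 1 and 2 from the target.
print(idw((0, 0), [(1, 0, 10.0), (2, 0, 20.0)]))  # prints 12.0
```

The nearer gauge has four times the weight of the farther one, so the estimate of 12.0 lies much closer to 10 than to 20.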
Precipitation is not only influenced by spatial location but also significantly correlated with elevation and NDVI [34]. Vegetation coverage in the continental area of China reflects the distribution of annual precipitation. A positive relationship between precipitation and NDVI was reported in Refs. [35,36]. A quantitative analysis of how NDVI responds to precipitation in the Yellow-Huai-Hai River Basin [37] concluded that a 10% increase in precipitation would produce a 3.35-4.80% increase in NDVI. These findings indicate that increased precipitation is critical for the growth of most vegetation types. Herein, we take four variables (longitude, latitude, elevation, and NDVI) into consideration and propose the MIDSM to optimize the selection of precipitation points for sub-basins. The precipitation point with the smallest generalized distance from, or equivalently the greatest similarity to, the geometric center is selected as the precipitation source for the sub-basin. Figure 5 shows a schematic of the MIDSM, and the specific steps to apply this method are as follows.
(1) All grid centers of the SPPs are extracted as rainfall points (Figure 2). SPPs and GGP are distributed within or around each sub-basin, which substantially increases the number of candidate rainfall points.
(2) The watershed is delineated into 116 sub-basins by the SWAT model. Attributes including longitude, latitude, elevation, and NDVI are assigned to the precipitation data points and to the geometric centers of the sub-basins. The generalized distance and the similarity between the geometric center point and the rainfall point can be calculated with Equations (4) and (5):

$$L_{i_\tau kj} = \sqrt{(lon_{i_\tau k} - lon_j)^2 + (lat_{i_\tau k} - lat_j)^2 + (ele_{i_\tau k} - ele_j)^2 + (NDVI_{i_\tau k} - NDVI_j)^2} \qquad (4)$$

$$S_{i_\tau kj} = \frac{1}{L_{i_\tau kj}} \qquad (5)$$

where $i_\tau$ ($\tau = 1, 2, \ldots, m$; $i_\tau \in (i_1, i_2, \ldots, i_m)$) represents the precipitation products (GGP, CFSR, TRMM, and PERSIANN-CDR for the case study in this paper); j is the sub-basin number, with j ∈ [1,116]; lon, lat, ele, and NDVI are the longitude, latitude, elevation, and NDVI, respectively, all four of which have been normalized; $L_{i_\tau kj}$ and $S_{i_\tau kj}$ denote the generalized distance and the similarity between the k-th (k = 1, 2, ..., n) rainfall point of precipitation product $i_\tau$ and the center point of sub-basin j; and n is the number of grid centers of the SPP or observed stations of the GGP in the search area.
The rainfall data point with the greatest similarity (the smallest generalized distance) is selected as the optimal representative point for each precipitation product; its precipitation is denoted $P_{i_\tau j}$.
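The selection steps above can be sketched in Python as follows. Two points are assumptions, since the text only says that the variables are normalized and that a generalized distance is computed: min-max scaling is used for normalization, and the generalized distance is taken to be Euclidean in the four normalized attributes. All names and values are hypothetical.

```python
import math

ATTRS = ("lon", "lat", "ele", "ndvi")

def min_max_normalize(records):
    """Scale each attribute to [0, 1] over all records (assumed scheme)."""
    scaled = [dict(r) for r in records]
    for attr in ATTRS:
        low = min(r[attr] for r in records)
        high = max(r[attr] for r in records)
        span = (high - low) or 1.0  # avoid division by zero
        for r in scaled:
            r[attr] = (r[attr] - low) / span
    return scaled

def select_representative(center, points):
    """Return the id of the candidate most similar to the sub-basin center."""
    scaled = min_max_normalize([center] + points)
    c, candidates = scaled[0], scaled[1:]

    def generalized_distance(p):
        # Assumed form: Euclidean distance in the normalized attributes.
        return math.sqrt(sum((p[a] - c[a]) ** 2 for a in ATTRS))

    return min(candidates, key=generalized_distance)["id"]

# Hypothetical sub-basin center and two candidate rainfall points.
center = {"id": "center", "lon": 104.5, "lat": 30.0, "ele": 500.0, "ndvi": 0.60}
points = [
    {"id": "p1", "lon": 104.6, "lat": 30.1, "ele": 1500.0, "ndvi": 0.30},
    {"id": "p2", "lon": 104.7, "lat": 30.2, "ele": 520.0, "ndvi": 0.58},
]
print(select_representative(center, points))  # prints "p2"
```

In this example `p1` is geographically closer to the center, but `p2` is chosen because its elevation and NDVI resemble the center's much more closely, which is the behavior that distinguishes a multivariate similarity from a purely spatial nearest-point rule.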

CCBWA
To calculate the weight $W_{i_\tau j}$ for product $i_\tau$, the correlation-coefficient-based weighted average (CCBWA) method is proposed. CCBWA integrates the multiple rainfall points screened by MIDSM into FMSPP and establishes the corresponding assumed station located at the geometric center. Because monthly-scale precipitation is more consistent and less biased than daily-scale precipitation [38,39], the monthly-scale CC is chosen to calculate the weighting factors. The weighting factor is mathematically expressed as:

$$W_{i_\tau j} = \frac{CC_{i_\tau j}}{\sum_{\tau=1}^{m} CC_{i_\tau j}} \qquad (6)$$

where $CC_{i_\tau j}$ is the correlation coefficient of precipitation product $i_\tau$ in sub-basin j against GGP, and m is the number of precipitation products (here m = 4).
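A minimal Python sketch of this weighting step is given below. It assumes that each weight is the product's monthly CC against GGP divided by the sum of the CCs of all products, and that all CCs are positive; the function names and the monthly totals are hypothetical.

```python
import math

def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ccbwa_weights(monthly_by_product, reference="GGP"):
    """Weights proportional to each product's monthly CC against GGP.

    Assumption: weight = CC / sum of CCs over all products, including
    the reference itself (whose CC is 1). Positive CCs are assumed.
    """
    ref = monthly_by_product[reference]
    cc = {name: pearson_cc(series, ref)
          for name, series in monthly_by_product.items()}
    total = sum(cc.values())
    return {name: value / total for name, value in cc.items()}

# Hypothetical monthly totals (mm) at one sub-basin's representative points.
monthly = {
    "GGP": [20, 50, 120, 200, 90, 30],
    "TRMM": [18, 55, 110, 190, 95, 28],
    "CFSR": [40, 45, 200, 150, 160, 35],
}
weights = ccbwa_weights(monthly)
print({name: round(w, 3) for name, w in weights.items()})
```

Under this assumption GGP always receives the largest weight, because its CC against itself is 1, while a product that tracks GGP poorly is down-weighted rather than discarded.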

Temporal Evaluation of SPPs and FMSPP
Seven statistical indicators [40] were employed to quantitatively compare the performance of the original SPPs and FMSPP relative to GGP on the daily and monthly scales over the entire watershed. The indicators were the correlation coefficient (CC), root mean squared error (RMSE), mean error (ME), relative bias (BIAS), probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI). The CC described the potential linear correlation between SPP and GGP, ranging from −1 to 1. The closer the CC value to 1, the more positively the SPP correlated with GGP. The RMSE, ME, and BIAS were introduced to describe the difference in the precipitation amount between the two rainfall products. Smaller absolute values of these three indexes meant minor discrepancies between SPP and GGP. The POD, FAR, and CSI were detectors for predicting rainfall occurrence on a daily scale. Higher POD and CSI and lower FAR corresponded to a higher ability to predict rainfall events.
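The seven indicators can be computed as in the following Python sketch. The formulas are the commonly used definitions and are assumed here, since the paper defers to [40] for the exact forms; the rain/no-rain threshold of 0.1 mm per day and the sample data are also assumptions.

```python
import math

def evaluate(sat, gauge, rain_threshold=0.1):
    """Indicators of a satellite series against a gauge series.

    rain_threshold (mm/day) separates rain from no rain; 0.1 is assumed.
    """
    n = len(gauge)
    pairs = list(zip(sat, gauge))
    mean_s, mean_g = sum(sat) / n, sum(gauge) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in pairs)
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    hits = sum(s >= rain_threshold and g >= rain_threshold for s, g in pairs)
    misses = sum(s < rain_threshold and g >= rain_threshold for s, g in pairs)
    false_alarms = sum(s >= rain_threshold and g < rain_threshold for s, g in pairs)
    return {
        "CC": cov / math.sqrt(var_s * var_g),
        "RMSE": math.sqrt(sum((s - g) ** 2 for s, g in pairs) / n),
        "ME": sum(s - g for s, g in pairs) / n,
        "BIAS": sum(s - g for s, g in pairs) / sum(gauge),
        "POD": hits / (hits + misses),
        "FAR": false_alarms / (hits + false_alarms),
        "CSI": hits / (hits + misses + false_alarms),
    }

# Hypothetical five-day series (mm/day).
gauge = [0.0, 5.0, 10.0, 0.0, 2.0]
sat = [1.0, 4.0, 12.0, 0.0, 0.0]
result = evaluate(sat, gauge)
print({name: round(value, 3) for name, value in result.items()})
```

In this example the satellite series has two hits, one miss, and one false alarm, giving POD = 2/3, FAR = 1/3, and CSI = 0.5, while its ME and BIAS are zero because the over- and underestimates cancel.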
As shown in Table 5 and Figure 6, daily precipitation was mainly below 40 mm. CFSR had the highest CC of 0.72, the minimum RMSE of 4.44 mm, and the maximum POD of 0.92. However, its ME and BIAS were the largest, and both were positive, revealing that it overestimated precipitation. The CC of TRMM and PERSIANN-CDR was 0.37 and 0.36, respectively, and their RMSE was 6.43 mm and 6.41 mm, respectively. Both the ME and BIAS for these two products were negative, indicating an underestimation of rainfall. The POD and CSI of the three original SPPs were relatively high, in the ranges of 0.58-0.92 and 0.77-0.85, respectively, and the FAR was as low as 0.23. The CC and RMSE of FMSPP fell within the range spanned by the original SPPs. Moreover, its POD, FAR, and CSI equaled those of CFSR, which were the best. Additionally, the absolute values of ME and BIAS for FMSPP were the smallest (close to 0). Overall, FMSPP performed relatively well compared with the original SPPs on the daily scale.

On the monthly scale (Table 6 and Figure 7), the CC for FMSPP was the highest, and its RMSE was the smallest. In addition, the absolute values of ME and BIAS for FMSPP were close to 0. These results demonstrated that FMSPP performed best. CFSR presented the largest BIAS and ME. The CC for TRMM reached 0.97, followed by 0.95 for PERSIANN-CDR and 0.86 for CFSR. Compared with FMSPP, the RMSE for TRMM and PERSIANN-CDR was slightly higher, and that for CFSR was almost three times as large. FMSPP showed rainfall probability density distributions similar to those of TRMM and PERSIANN-CDR. By comparison, CFSR allocated more rainfall to the range of 100-150 mm. Moreover, the ME and BIAS of CFSR were positive, and those for TRMM, PERSIANN-CDR, and FMSPP were negative, illustrating that CFSR overestimated rainfall while TRMM and PERSIANN-CDR underestimated it. This result was consistent with the conclusion on the daily scale.
Most of the annual precipitation fell during the wet season from June to September. In contrast, the dry season from November to February accounted for only a small portion of the annual rainfall. The distribution of FMSPP, TRMM, and PERSIANN-CDR was consistent with that of the GGP. Outliers of these four precipitation products were scattered on the maximum side in July and on the minimum side in August. CFSR had no outliers and significantly overestimated the rainfall in these two months. TRMM slightly underestimated the rainfall during the dry season.

Figures 9 and 10 show the spatial variation of the CC for the original SPPs and FMSPP on the daily and monthly scales. On the daily scale, the CC of CFSR was around 0.5, except for a small part of the northern mountainous area, where it was 0.3. The CC of PERSIANN-CDR was around 0.3 over most sub-basins, except for a minor portion of the northern and midwestern regions. That of TRMM varied distinctly from 0.3 in the north to 0.2 in the south. The CC for FMSPP improved remarkably compared with TRMM and PERSIANN-CDR, with most of the area within the range of 0.3-0.6. On the monthly scale, the CC for each precipitation product improved significantly in all sub-basins. FMSPP had the highest correlation among the products, with a CC ranging from 0.84 to 0.96. TRMM and PERSIANN-CDR were slightly inferior to FMSPP, with CCs in most areas ranging from 0.80 to 0.90. That for CFSR was below 0.8.

Regarding the variation of average monthly and annual rainfall (Figures 11 and 12), the four precipitation products all captured the spatial pattern of less rainfall in the north than in the south. The monthly and annual rainfall of FMSPP in most sub-basins was below 100 mm and 1200 mm, respectively. This result was close to GGP. In addition, the three precipitation products other than CFSR showed a gradually increasing trend from north to south.
CFSR behaved very differently in the northern and midwestern regions: precipitation was slightly underestimated in the north and significantly overestimated in the midwest. Specifically, the monthly and annual precipitation of GGP in the north were about 65 mm and 800 mm, respectively, and those of CFSR were about 50 mm and 700 mm, respectively. In the central and western regions, the monthly and annual precipitation of GGP were 95 mm and 1100 mm, and those of CFSR were 200 mm and 2500 mm, respectively. Note that these two areas are transitional zones from the mountainous area to the plain. Rainfall was more strongly affected by topographical factors in these two areas than in the southern plain [41]. The notable deviation of CFSR from GGP illustrated that CFSR was prone to larger errors in complex terrain than TRMM and PERSIANN-CDR in this basin.

Evaluation of the Hydrological Performance of Different Precipitation Products
This part evaluates the hydrological applicability of FMSPP and compares its performance with that of the other precipitation products. All precipitation products used the same set of parameters when driving the SWAT model to reduce the propagation of uncertainty caused by the model structure. In this paper, we adopted high-precision GGP to drive the hydrological model; 1980-1992 and 1993-1999 were used as the calibration period and validation period, respectively. The simulated monthly streamflow was calibrated and validated against the measurements at the three hydrological stations. After that, 2000-2008 was selected as the evaluation period to assess the accuracy of each precipitation product in streamflow prediction. The warm-up period was 1998-1999. The coefficient of determination (R²), Nash-Sutcliffe efficiency (NS), and Percent Bias (PBIAS) were used to evaluate the hydrological performance.
The optimal value of R² and NS is 1, and better simulations receive higher values of both. PBIAS characterizes whether the mean magnitude of the modeled streamflow is higher or lower than the measured one: a value below 0 means that the streamflow is overestimated, and a value above 0 means that it is underestimated.
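The three criteria can be sketched in Python as follows. The formulas are the commonly used definitions and are assumed here rather than taken from the paper; the PBIAS sign convention (observed minus simulated) is chosen to match the interpretation given above, and the flow values are hypothetical.

```python
def nash_sutcliffe(obs, sim):
    """NS = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

def pbias(obs, sim):
    """Percent bias; negative values indicate overestimated streamflow."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)

def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated flow."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)

# Hypothetical monthly streamflow (m3/s).
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 41.0]
print(round(nash_sutcliffe(observed, simulated), 3))  # prints 0.964
print(round(pbias(observed, simulated), 1))           # prints -4.0
print(round(r_squared(observed, simulated), 3))
```

Here the simulated total (104) exceeds the observed total (100), so PBIAS is −4%, which reads as a slight overestimation under this sign convention.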

Calibration and Validation
Based on the Sequential Uncertainty Fitting (SUFI) algorithm embedded in the SWAT Calibration Uncertainty Programs (SWAT-CUP), the Latin hypercube method [42] was utilized to sample each parameter value within a specific range, and NS was used as the objective function to find the optimal parameter set. Table 7 shows the range of each parameter [43], the optimal parameter set, and the global sensitivity analysis results. The t-value was employed to identify the significance of each parameter, and the p-value determined whether to reject the null hypothesis (a rejection meant that the parameter had a significant impact on the objective function value). In contrast to one-at-a-time sensitivity analysis, the global sensitivity analysis quantified the change in the objective function value resulting from a change in a parameter while the other parameters were also varying. The more the objective function value changed, the more sensitive the objective function was to the parameter, corresponding to a higher absolute t-value and a lower p-value. As shown in Figure 13 and Table 7, the most sensitive parameter was the initial SCS runoff curve number for moisture condition II (CN2), then the base flow alpha factor for bank storage (ALPHA_BNK), followed by Manning's n value for the main channel (CH_N2). For the meaning of the other parameters involved in the figure and table, please refer to [44].

Notes: V means that the current parameter value is to be replaced by a given value; R means that the current parameter value is to be multiplied by (1 + a given value).
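The sampling step and the V/R conventions in the table notes can be illustrated with the Python sketch below. It is a generic Latin hypercube sampler, not the SWAT-CUP implementation, and the parameter ranges shown are hypothetical placeholders rather than the values in Table 7.

```python
import random

def latin_hypercube(ranges, n_samples, seed=0):
    """Draw n_samples parameter sets with one value per stratum.

    ranges: {name: (low, high)}. Each range is split into n_samples
    equal strata; every stratum is used exactly once per parameter,
    and strata are paired across parameters at random.
    """
    rng = random.Random(seed)
    samples = [{} for _ in range(n_samples)]
    for name, (low, high) in ranges.items():
        strata = list(range(n_samples))
        rng.shuffle(strata)
        for sample, k in zip(samples, strata):
            u = (k + rng.random()) / n_samples
            sample[name] = low + u * (high - low)
    return samples

def apply_change(current, method, value):
    """Apply a calibration change: 'V' replaces, 'R' scales by (1 + value)."""
    if method == "V":
        return value
    if method == "R":
        return current * (1.0 + value)
    raise ValueError("method must be 'V' or 'R'")

# Hypothetical ranges; CN2 is varied relatively, the others by replacement.
ranges = {"CN2": (-0.2, 0.2), "ALPHA_BNK": (0.0, 1.0), "CH_N2": (0.0, 0.3)}
parameter_sets = latin_hypercube(ranges, n_samples=5)
print(apply_change(80.0, "R", parameter_sets[0]["CN2"]))
print(apply_change(0.5, "V", parameter_sets[0]["ALPHA_BNK"]))
```

In a calibration loop, each sampled set would be applied to the model, the model run, and the NS of the simulated streamflow recorded as the objective value for that set.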
The simulated runoff during the calibration period matched well with the runoff measured at the three hydrological stations (Table 8 and Figure 14). The evaluation indexes NS, R², and PBIAS ranged from 0.73 to 0.93, 0.84 to 0.93, and −2.7% to 12.6%, respectively. In the validation period from 1993 to 1999, the simulation performance decreased marginally: NS and R² decreased slightly (by 0.1-0.2), and PBIAS almost doubled. Overall, the performance could be rated as "good" according to the SWAT performance rating criteria proposed in the literature [38].

Performance Comparison of SWAT Model Forced by Different Precipitation Products
During the evaluation period, the monthly streamflow measured and simulated at the Lijiawan hydrological station was used to compare the performance of the different precipitation products in driving the SWAT model for monthly runoff prediction. Note that the streamflow measurements from 2005 to 2006 were not collected. The months on the x-axis of Figure 15 are labeled consecutively. Judging from the evaluation indicators in Table 9 and Figure 15, the PBIAS of the three precipitation products other than TRMM was below 0, suggesting that the simulated streamflow was larger than the measured values. Specifically, the discharge simulated with GGP performed well, with an R² of 0.8 and an NS of 0.78. The R² and NS for TRMM were very close to those of GGP, at 0.8 and 0.70, respectively, but its PBIAS was positive (10.3%), indicating that the modeled streamflow for TRMM was smaller than the measurements. CFSR underestimated the streamflow in July 2002 and 2003 and notably overestimated the streamflow in July after 2004. In addition, CFSR had the smallest R² and NS, which were 0.60 and 0.40, respectively. In conclusion, the simulation result of CFSR was not as good as those of the other four precipitation products. The R² and NS for FMSPP were 0.84 and 0.83, respectively, close to those for GGP and TRMM (identified as performing well). Moreover, its PBIAS was the smallest in absolute value among all the precipitation products, at only −1.9%, demonstrating the best performance. Therefore, FMSPP could drive the SWAT model successfully and improve monthly runoff prediction.

Discussion
The rainfall deviations of the three original SPPs and FMSPP from GGP are depicted in Figures 11 and 12. The spatial distribution of TRMM, PERSIANN-CDR, and FMSPP is close to that of GGP over the sub-basins, whereas CFSR has the largest deviation from GGP. In addition, the SPPs and FMSPP tend to underestimate rainfall in the south compared with GGP, while the spatial pattern in the north is more complicated, with both underestimation and overestimation. The western edge and the midwest are mountainous areas and transitional zones, where the rainfall bias is higher than in the southern plain, particularly for CFSR. The CC values show the opposite pattern (Figures 9 and 10). This result is consistent with previous findings that satellite-based precipitation is more reliable in plains than in complex mountainous regions [21,45,46]. Note that, in addition to the reduced performance of SPPs in mountainous regions, the nature of the precipitation is also important. The study area is dominated by monsoonal rainfall, which tends to occur in large-scale systems that can be detected easily by relatively coarse-resolution satellite data. For basins in which, for example, convective rainfall is dominant, the performance of satellite-based precipitation measurement is likely to be worse.
In our study basin, we have a dense network of in-situ precipitation stations, so GGP naturally becomes the main accurate source of FMSPP. For the same reason, the analysis of the satellite precipitation products is more reliable. As shown by the spatial-temporal analysis and the hydrological modeling results, the performance of TRMM and PERSIANN-CDR is satisfactory. These good SPPs contribute some accuracy to the FMSPP, which is why FMSPP performs better than GGP. The results obtained in our study basin can provide a useful reference for data-sparse basins. If there are no in-situ stations, or if they are too sparse for in-situ precipitation to be a reliable source, one can take the 'most accurate satellite precipitation' as the benchmark. This benchmark can be determined by referring to similar basins with abundant in-situ stations. In that scenario, all other possible precipitation sources, including the sparse in-situ precipitation, should be compared with the selected benchmark. Therefore, the fusion method should also be valid for data-sparse regions and can effectively reduce the uncertainty induced by multiple precipitation sources.
As mentioned in the literature [21], the hydrological performance of different SPPs in a particular basin may not apply to other basins with different characteristics. Different satellite products may perform differently in diverse regions. It is improper to judge one precipitation product as absolutely superior to the others based on its performance in a specific basin [47,48]. The MIDSM-CCBWA scheme proposed in this paper can dynamically integrate multi-source gauge-satellite precipitation in different sub-basins. This methodology effectively reduces the bias induced by arbitrarily applying a single satellite precipitation source to different regions and also maintains the characteristics of high-precision in-situ precipitation. Hence, the FMSPP has a more general hydrologic applicability in data-sparse regions.

Figure 16 shows the probability density distribution of the monthly precipitation of the five precipitation products. The correlations between any two precipitation products and their confidence intervals are also displayed in the figure. CFSR is less correlated with the other four precipitation products. Its Pearson CC with GGP, TRMM, PERSIANN-CDR, and FMSPP is 0.86, 0.84, 0.85, and 0.89, respectively. The CC for FMSPP against GGP reaches 0.97, whereas that between FMSPP and CFSR is 0.89, which reveals that the fused rainfall product preserves the characteristics of GGP well, while its correlation with CFSR (identified as performing worse) is comparatively weak.
The magnitude and spatial distribution of precipitation directly influence the accuracy of runoff prediction. Figure 17 shows the correlations and confidence intervals between the measured streamflow and the simulations obtained from the rainfall products. The probability density distributions of the measured and simulated streamflow are also shown in this figure. The simulated runoff correlates significantly with the measured runoff. The Pearson CC is 0.92 for FMSPP, 0.90 for both GGP and TRMM, and 0.82 for PERSIANN-CDR. The simulation performance of CFSR is relatively poor, although its CC still reaches 0.77. Overall, caution is needed if CFSR is selected as the single rainfall source to simulate monthly runoff in this basin. The FMSPP presents the highest CC with the measured runoff.
The GGP is used to calibrate the model on the premise that the model parameters are stationary. During the evaluation period, the hydrological performance forced by each precipitation product is assessed by comparing the simulated monthly runoff with the measurements. According to the water balance equation, the amount of river flow depends on the water yield contributed by each sub-basin (or HRU). Water yield is mainly composed of three parts: surface flow, lateral flow, and groundwater flow. Figure 18 shows the CC of the different components of the water balance in the SWAT model simulated by each rainfall product. It can be seen that precipitation is significantly correlated with water yield: the CC between precipitation and water yield for the five rainfall products, i.e., GGP, CFSR, TRMM, PERSIANN-CDR, and FMSPP, is 0.93, 0.88, 0.91, 0.89, and 0.84, respectively. However, the water balance components may contribute differently to water yield under different precipitation products, and this may affect the simulated runoff. For example, surface flow and lateral flow are the main constituents of water yield for all the precipitation products, and their correlation with water yield can reach about 0.9. Groundwater flow does not contribute much to runoff generation in GGP, CFSR, and TRMM, while its correlation with water yield in PERSIANN-CDR and FMSPP reaches 0.66 and 0.76, respectively. The different contributions of groundwater flow to water yield modeled by different precipitation products are possibly due to the antecedent soil condition influenced by extreme flood events [49].
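The component analysis above can be sketched in Python as follows. The sketch assumes that water yield is approximated as the sum of its three main parts, neglecting transmission losses and pond abstractions; the names SURQ, LATQ, GWQ, and WYLD follow the usual SWAT output labels, while the function name `component_correlations` and the sample values are hypothetical.

```python
import numpy as np

def component_correlations(surq, latq, gwq):
    """Approximate water yield as SURQ + LATQ + GWQ (losses neglected)
    and return the Pearson CC of each component with that yield."""
    components = {
        "SURQ": np.asarray(surq, dtype=float),  # surface flow
        "LATQ": np.asarray(latq, dtype=float),  # lateral flow
        "GWQ": np.asarray(gwq, dtype=float),    # groundwater flow
    }
    # Simplified water yield: the three main parts only
    wyld = components["SURQ"] + components["LATQ"] + components["GWQ"]
    cc = {
        name: float(np.corrcoef(values, wyld)[0, 1])
        for name, values in components.items()
    }
    return wyld, cc

if __name__ == "__main__":
    # Hypothetical monthly values (mm) for one sub-basin
    wyld, cc = component_correlations(
        surq=[10.0, 40.0, 25.0, 5.0, 60.0, 30.0],
        latq=[4.0, 12.0, 9.0, 3.0, 20.0, 10.0],
        gwq=[6.0, 5.0, 7.0, 6.0, 5.0, 6.0],
    )
    print(wyld)
    print(cc)
```

A component with a low CC, such as groundwater flow in this made-up example, varies largely independently of water yield, which is the pattern described for GGP, CFSR, and TRMM.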

Conclusions
We propose the MIDSM-CCBWA fusion method to obtain a more accurate areal rainfall product, the FMSPP. The FMSPP was generated from an open database composed of multiple SPPs and GGP. In the case study based on the TJRB, the satellite precipitation sources involved in the database included CFSR, TRMM, and PERSIANN-CDR. Each SPP and the FMSPP were evaluated at different spatio-temporal scales over the TJRB. Then, the simulated streamflow forced by FMSPP, GGP, and each SPP was compared with the observations to evaluate their applicability to hydrological modeling. The correlations between different precipitation products and their confidence intervals were further discussed, as well as the correlation of the water balance components in these precipitation products. This method can enhance the applicability of satellite precipitation products in hydrological models and improve the accuracy of hydrological forecasts by reducing the deviation caused by the uncertainty of precipitation sources. The main conclusions are as follows:
1. FMSPP shows the maximum POD and CSI, which indicates that FMSPP can capture the occurrence of rainfall events very well. Moreover, the absolute values of ME and BIAS for FMSPP are the smallest on both the daily and monthly scales over the watershed. In addition, the CC of FMSPP on the monthly scale is significantly higher in most sub-basins than that of the other three SPPs, ranging from 0.84 to 0.96. These results demonstrate that the performance of FMSPP is the best compared with the original SPPs.
2. Among the precipitation products, FMSPP shows the best simulation results, with the largest R² and NS, which are 0.83 and 0.84, respectively. Moreover, its PBIAS is the smallest, at only −1.9%. The hydrological performance of GGP and TRMM is good, followed by PERSIANN-CDR, whereas CFSR is unsatisfactory.
3. The proposed MIDSM-CCBWA fusion method dynamically integrates multi-source gauge-satellite precipitation over different sub-basins and forms the FMSPP, which can effectively reduce the bias induced by the arbitrary application of a single precipitation source and improve the general applicability for streamflow simulation in data-sparse regions.
4. FMSPP can preserve the characteristics of the precipitation source identified to perform well (e.g., GGP in the case study). It has only a relatively weak correlation with the precipitation source identified to perform worse (e.g., CFSR in this study).
5. The rainfall deviation of SPPs from GGP over the mountainous areas in the northwest is higher than that over the southern plain, and the CC shows the opposite pattern in these areas. Thus, satellite-based precipitation is generally more reliable in plain than in mountainous terrain for the study basin.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.