4.1. Rainfall Characteristics of the TP
Figure 2 shows the spatial distributions of three-year daily mean precipitation for CGDPA and four GPM-era precipitation products over the TP. Generally, all precipitation datasets shared a similar spatial distribution: the mean precipitation decreased from the southeast to the northwest. A large amount of precipitation (more than 5 mm/day) is concentrated over the southern region of the TP. The reason for this is the effect of the Himalayas, which intercept the moisture from the Indian Ocean monsoon and induce rainfall. In contrast, less precipitation is observed in most of the west and north, where the Westerlies do not prevail and the Indian monsoon is relatively weaker [68]. Compared with the CGDPA, a pronounced difference can be found between the IMERG and GSMaP precipitation products. IMERG-UC significantly underestimated the TP’s precipitation, whereas GSMaP-MVK clearly overestimated it. However, the precipitation estimates of both gauge-adjusted products were more consistent with the ground measurements than the satellite-only products. After gauge correction, the under- and overestimations of satellite-only products were mitigated, especially in the eastern TP, where rain gauges are denser. Therefore, the strategy of gauge adjustments using in situ measurements greatly improved the product accuracy.
Next, we examined how the daily precipitation amount and the number of precipitation days over the TP are distributed across rain rates. As shown in Figure 3, for the intensity distribution from the CGDPA, the distribution of the precipitation amount over the TP presents a single-peak pattern (peaking at a rain rate of approximately 14 mm·day⁻¹), and most precipitation days fell within the intensity range of 2–8 mm·day⁻¹. The satellite precipitation products exhibit distribution patterns similar to those of CGDPA for both the occurrence and the volume of precipitation, but there are some differences in the intensity distribution curves. For example, in the light rain range (0.25–2 mm·day⁻¹), all the satellite datasets detected more precipitation events than the reference dataset (Figure 3b). Consequently, the satellite-based estimates also produced larger precipitation volumes than the ground measurements in the light rain range.
4.2. Statistical Performance of Satellite Precipitation Estimates
In this section, we evaluated the satellite precipitation products against gridded gauge-based precipitation products over the period from April 2014 to March 2017. In order to ensure a more accurate comparison, only grid pixels with at least one gauge (132 grids) were taken to calculate the statistical metrics.
Figure 4 shows the scatterplots of daily IMERG-UC, IMERG-C, GSMaP-MVK, and GSMaP-Gauge data versus CGDPA for the selected grids. The evaluation metrics are also given in the figure. For the contingency table statistics (i.e., POD, FAR, FB, and ETS), a common threshold of 1.0 mm·day⁻¹ was used to distinguish rain from no-rain events, as suggested by many previous studies [7,69,70,71]. Generally, among the four satellite precipitation products, GSMaP-Gauge exhibited the best performance with the highest CC of 0.77, while GSMaP-MVK had the worst performance with a poor CC of 0.52. The IMERG-UC and IMERG-C products showed intermediate performance, with CC values of 0.67 and 0.70, respectively. For the satellite-only products, IMERG-UC significantly underestimated the precipitation (RB of −39.32%), and GSMaP-MVK overestimated the precipitation with an RB value of 26.11%. After the gauge calibration, the magnitude of RB decreased and the scatter points clustered more closely around the 1:1 line than those of the satellite-only estimates. Consequently, both gauge-adjusted products showed only slight underestimation relative to reference observations (Figure 4b,d). In terms of the contingency table statistics, the gauge-adjusted products also performed better than the satellite-only products (with higher POD and ETS values) and detected more events. However, it is worth noting that the FAR increased from IMERG-UC to IMERG-C. We argue that the calibration scheme of IMERG increased the number of false events over the TP, which contributed to the observed increase in FAR. Correspondingly, the RMSE and RRMSE of IMERG-C did not improve over those of IMERG-UC.
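The contingency table statistics used above can be illustrated with a minimal sketch. It assumes the standard textbook definitions of POD, FAR, FB, and ETS and treats a day as rainy when the value is at or above the 1.0 mm·day⁻¹ threshold; the function name, the `>=` comparison, and the toy inputs are illustrative assumptions rather than the authors' implementation.

```python
def _ratio(num, den):
    """Divide safely; return NaN when the denominator is zero."""
    return num / den if den else float("nan")


def contingency_scores(sat, ref, threshold=1.0):
    """Return POD, FAR, FB, and ETS for paired daily values (mm/day).

    Assumption: a value >= threshold counts as a rain event.
    """
    hits = misses = false_alarms = total = 0
    for s, r in zip(sat, ref):
        total += 1
        sat_rain = s >= threshold
        ref_rain = r >= threshold
        if sat_rain and ref_rain:
            hits += 1
        elif ref_rain:
            misses += 1          # gauge reports rain, satellite does not
        elif sat_rain:
            false_alarms += 1    # satellite reports rain, gauge does not
    # Number of hits expected by random chance (used by ETS).
    hits_random = _ratio((hits + misses) * (hits + false_alarms), total)
    return {
        "POD": _ratio(hits, hits + misses),
        "FAR": _ratio(false_alarms, hits + false_alarms),
        "FB": _ratio(hits + false_alarms, hits + misses),
        "ETS": _ratio(hits - hits_random,
                      hits + misses + false_alarms - hits_random),
    }


# Synthetic toy values, not taken from the study.
satellite = [0.0, 2.0, 5.0, 0.5, 3.0, 0.0]
gauge = [0.0, 1.5, 0.2, 2.0, 4.0, 0.0]
print(contingency_scores(satellite, gauge))
```

Under these definitions, an increase in false alarms raises FAR and FB even if POD also improves, which is the kind of trade-off described for IMERG-C.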
To investigate the spatial distributions of the error metrics, CC, RMSE, and RB were computed for the four satellite precipitation products, as shown in Figure 5. In general, the CC values of all products were good over most regions of the TP. Spatially, higher CC values are observed in the eastern TP than in the west. This pattern of CC is attributed to the limitations of satellite precipitation retrieval in mountainous and high-elevation regions [19,66]. We also note that GSMaP-Gauge showed the best correspondence with gauge measurements, with larger correlation and smaller error (Figure 5j–l), which is consistent with the above statistical results. Interestingly, the spatial distributions of the error metrics explain why the RMSE of IMERG-UC was lower than that of GSMaP-MVK even though its bias was worse: positive and negative biases can cancel each other out. As shown in Figure 5c, IMERG-UC underestimated reference precipitation over almost all of the TP, while GSMaP-MVK showed overestimation in the east and underestimation in the west (Figure 5i). Thus, for GSMaP-MVK, the magnitude of the total RB was reduced. However, focusing on the spatial distribution of RMSE, we can see that GSMaP-MVK still demonstrated the largest error among the four satellite precipitation products.
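The cancellation effect described above can be reproduced with a small sketch. It assumes the common definitions RB = 100 · Σ(S − G) / ΣG and RMSE = sqrt(mean((S − G)²)); the function names and the four-cell toy values are hypothetical and only illustrate the mechanism, not the study's data.

```python
import math


def relative_bias(sat, ref):
    """Relative bias in percent: 100 * sum(sat - ref) / sum(ref)."""
    return 100.0 * (sum(sat) - sum(ref)) / sum(ref)


def rmse(sat, ref):
    """Root mean square error of paired values."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sat, ref)) / len(ref))


# Synthetic toy values (mm/day) for four grid cells, not from the study.
reference = [4.0, 4.0, 4.0, 4.0]
uniformly_low = [3.0, 3.0, 3.0, 3.0]   # underestimates everywhere
mixed_sign = [7.0, 7.0, 1.0, 1.0]      # overestimates in two cells, underestimates in two

print(relative_bias(uniformly_low, reference), rmse(uniformly_low, reference))  # -25.0 1.0
print(relative_bias(mixed_sign, reference), rmse(mixed_sign, reference))        # 0.0 3.0
```

The mixed-sign product has zero total RB yet three times the RMSE of the uniformly low one, which mirrors the contrast drawn between GSMaP-MVK and IMERG-UC.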
Figure 6 shows the temporal variations of spatially averaged precipitation and statistics for the selected grid boxes. Table 3 lists the statistical summary of seasonal comparisons, computed at the daily scale, for spring (March–May), summer (June–August), autumn (September–November), and winter (December–February). Overall, the monthly mean precipitation of all satellite precipitation products fluctuated similarly to the gauge observations. Precipitation in the summer is the main water source over the TP, while precipitation in the winter contributes only a minor part of annual precipitation. The performance of the satellite precipitation products showed distinct seasonal variations. The statistical indices were better in the summer than in the other three seasons, with high correlation, low relative error, and better detection of rain events (see Table 3). For instance, the CC values of the four satellite precipitation products ranged from 0.59 to 0.76 during the summer, while lower CC values occurred in the winter. We also note that the RMSE was larger during the summer months than during the winter season (Table 3 and Figure 6c). This is because RMSE is sensitive to large absolute precipitation errors, which are more likely when precipitation amounts are large, as in summer. The RB results indicate that GSMaP-MVK overestimated the reference precipitation, except during the winter, and IMERG-UC, IMERG-C, and GSMaP-Gauge underestimated the gauge observations in all seasons. Similar to the results of Duan et al. [50] and Ning et al. [72], satellite precipitation products showed higher error and poor rainfall detection capability in the winter months. During the winter season, although the DPR improved the skill of snowfall observations, satellite precipitation products still showed unsatisfactory performance over the TP. This can be attributed to the limitation of passive microwave retrievals and IR information over cold or snow-covered surfaces [22], suggesting that the current GPM-era estimates still have room for improvement in the winter.
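As a rough illustration of the seasonal stratification, the sketch below groups daily satellite and gauge pairs by the four seasons defined above and computes a relative bias per season. The record layout `(date, satellite, gauge)`, the function names, and the sample values are assumptions made for this example only.

```python
from collections import defaultdict
from datetime import date

# Season definition as stated in the text (December-February is winter, etc.).
SEASON_OF_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}


def group_by_season(records):
    """Group (date, satellite, gauge) records into {season: [(sat, gauge), ...]}."""
    groups = defaultdict(list)
    for day, sat, gauge in records:
        groups[SEASON_OF_MONTH[day.month]].append((sat, gauge))
    return dict(groups)


def seasonal_relative_bias(records):
    """Relative bias in percent for each season present in the records."""
    result = {}
    for season, pairs in group_by_season(records).items():
        sat_total = sum(s for s, _ in pairs)
        gauge_total = sum(g for _, g in pairs)
        if gauge_total:
            result[season] = 100.0 * (sat_total - gauge_total) / gauge_total
        else:
            result[season] = float("nan")
    return result


# Synthetic toy records, not from the study.
sample = [
    (date(2014, 12, 5), 0.5, 1.0),
    (date(2015, 1, 10), 0.0, 1.0),
    (date(2015, 7, 1), 6.0, 8.0),
    (date(2015, 7, 2), 2.0, 2.0),
]
print(seasonal_relative_bias(sample))
```

The same grouping step could feed any of the other metrics (CC, RMSE, POD) instead of RB.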
To analyze the error characteristics of different precipitation events, we followed the error decomposition approach proposed by Tian et al. [69], in which the total bias is decomposed into hit bias, bias due to missed rainfall, bias due to false detections, and bias associated with the selected threshold. As shown in Figure 7, hit bias and total bias share considerable similarities in their spatial distributions, suggesting that hit bias is the dominant component of total bias. Because the negative bias from missed precipitation and the positive bias from false precipitation have opposite signs, they can offset each other, resulting in a smaller total bias of satellite precipitation. The lighter precipitation (<1 mm/day), which we considered unreliable in both gauge data and satellite measurements, contributed only a small part of the total bias and can be ignored (Table 4). In addition, the error components also showed seasonal dependence. Generally speaking, the values of total bias are lower in the summer than in the winter. In the winter in particular, higher miss bias was found, and missed precipitation was the dominant source of error. This indicates that the satellite estimates miss many precipitation events in winter, and confirms our aforementioned speculation: the GPM-era satellite precipitation products still exhibit some deficiencies in detecting snowfall events.
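A minimal sketch of such a decomposition is given below. It assumes that hits, misses, and false detections are classified with the 1.0 mm/day threshold, that miss bias is the negative of the missed gauge amount, that false bias is the falsely detected satellite amount, and that the threshold-related term is the residual left by sub-threshold values. These conventions and the function name are assumptions for illustration and may differ in detail from the scheme of Tian et al. [69] as applied in this study.

```python
def decompose_bias(sat, ref, threshold=1.0):
    """Split the total bias sum(sat - ref) into four additive components."""
    hit = miss = false = total = 0.0
    for s, r in zip(sat, ref):
        total += s - r
        sat_rain = s >= threshold
        ref_rain = r >= threshold
        if sat_rain and ref_rain:
            hit += s - r      # both detect rain: amount error
        elif ref_rain:
            miss -= r         # gauge rain missed by the satellite (negative)
        elif sat_rain:
            false += s        # satellite rain not seen by the gauge (positive)
    # Whatever remains comes from values below the threshold.
    below_threshold = total - (hit + miss + false)
    return {
        "total": total,
        "hit": hit,
        "miss": miss,
        "false": false,
        "below_threshold": below_threshold,
    }


# Synthetic toy values (mm/day), not from the study.
satellite = [0.0, 2.0, 5.0, 0.5, 3.0, 0.2]
gauge = [0.0, 1.5, 0.2, 2.0, 4.0, 0.4]
parts = decompose_bias(satellite, gauge)
print(parts)
```

In this toy case the miss term (−2.0) and the false term (+5.0) partly offset each other, so the total (2.6) is smaller in magnitude than the false term alone, which is the offsetting behaviour noted above.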
Figure 8 displays the error characteristics of satellite precipitation estimates as a function of rain rate. All precipitation products showed a similar variation of error, with overestimation for light rain and underestimation for heavy rain (Figure 8a), which is a common error feature of satellite-based retrievals, as documented in previous studies [73,74,75]. This rain-rate dependency of the error is important for meteorological and hydrological applications, especially for typhoon monitoring and flood forecasting, which are sensitive to higher rain rates [76,77]. In terms of RRMSE, higher values were found at low rain rates than at moderate–high rain rates (Figure 8b), indicating that current satellite precipitation products still need to improve at low rain rates. On the other hand, Figure 8 also reveals how the calibration schemes affect the precipitation estimates. IMERG-C elevated the precipitation estimates and GSMaP-Gauge decreased the precipitation values compared to their corresponding uncalibrated precipitation products. The gauge calibration effectively reduced total bias but made the estimates worse in some cases. For example, IMERG-UC overestimated gauge observations at lower rain rates; however, the bias calibration using GPCC gauge data elevated the precipitation estimates, further augmenting the overestimation at lower rain rates. Another case is GSMaP-Gauge, which showed a larger negative hit bias than GSMaP-MVK in the winter season (Table 4). Thus, it seems important to calibrate satellite-based precipitation estimates separately for different rain rates or seasons in the future.
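A rain-rate-dependent error analysis of this kind can be sketched by binning the paired values on the reference rain rate and averaging the error in each bin. The bin edges, the function name, and the toy values below are hypothetical; the binning actually used for Figure 8 is not stated in this section.

```python
from bisect import bisect_right


def mean_error_by_rain_rate(sat, ref, edges):
    """Mean error (sat - ref) in each reference rain-rate bin.

    Bin i covers [edges[i], edges[i + 1]); pairs outside all bins are skipped.
    Returns None for bins that contain no pairs.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for s, r in zip(sat, ref):
        i = bisect_right(edges, r) - 1   # index of the bin containing r
        if 0 <= i < n_bins:
            sums[i] += s - r
            counts[i] += 1
    return [sums[i] / counts[i] if counts[i] else None for i in range(n_bins)]


# Synthetic toy values (mm/day) and hypothetical bin edges, not from the study.
satellite = [2.0, 3.0, 6.0, 10.0, 0.5]
gauge = [1.0, 2.0, 8.0, 12.0, 0.2]
print(mean_error_by_rain_rate(satellite, gauge, edges=[1.0, 4.0, 16.0]))  # [1.0, -2.0]
```

A positive value in the low bin and a negative value in the high bin corresponds to the overestimation of light rain and underestimation of heavy rain described above. A calibration factor could in principle be estimated per bin in the same loop.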
4.3. Hydrological Evaluation of Satellite Precipitation Estimates
In the previous section, we compared the GPM-era satellite precipitation products against the rain gauge observations; the next step was to evaluate the hydrological utility of these precipitation datasets. In this section, since the streamflow data after 2015 were not available at the Tangnaihai hydrological station, the hydrological evaluation of the four satellite precipitation estimates was performed for the whole year of 2015. We also calculated statistical indices of precipitation estimates over the upper Yellow River basin in 2015 (Table 5). By analyzing these indices, we can conclude that the error characteristics of satellite precipitation during 2015 are consistent with the prior comparison results. For example, the CC values of daily IMERG-UC, IMERG-C, GSMaP-MVK, and GSMaP-Gauge estimates were 0.57, 0.61, 0.52, and 0.75, respectively, over the upper Yellow River basin during 2015, and 0.67, 0.70, 0.52, and 0.77 over the TP during the period from April 2014 to March 2017. For the basin-scale evaluation, statistical values with the basin-averaged data were better than those with the grid-scale evaluation. This is expected because random errors decrease with spatial averaging. Thus, both grid-scale and basin-scale analyses confirm that the performance of satellite precipitation in the upper Yellow River basin was similar to that on the TP.
Next, the VIC model was calibrated and validated with observed precipitation and streamflow over the upper Yellow River basin for the periods of 2009–2011 and 2012–2014, respectively. Figure 9 shows the CGDPA-simulated and observed streamflow at the daily scale. Comparing the observed and simulated streamflow, the values of NSE and RB were 0.61 and −2.56% during the calibration period, and NSE increased to 0.73 with an RB of 0.96% in the validation period. The simulated streamflow generally agrees well with observations, although peak floods were overestimated or underestimated in some cases. Improved results were obtained at the monthly scale for both the calibration and validation periods (NSE of 0.88 and 0.87, respectively).
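For reference, the two streamflow scores reported here can be computed as in the following sketch, which assumes the standard Nash–Sutcliffe efficiency, NSE = 1 − Σ(Q_sim − Q_obs)² / Σ(Q_obs − mean(Q_obs))², and a relative bias expressed in percent. The function names and discharge values are illustrative assumptions, not the study's code or data.

```python
def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe efficiency (dimensionless; 1 is a perfect fit)."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((s - o) ** 2 for s, o in zip(sim, obs))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance


def streamflow_relative_bias(sim, obs):
    """Relative bias of simulated streamflow in percent."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


# Synthetic toy discharges (m3/s), not from the study.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [2.0, 3.0, 4.0, 5.0, 6.0]
print(nash_sutcliffe(simulated, observed))            # 0.5
print(streamflow_relative_bias(simulated, observed))
```

NSE is a dimensionless score rather than a percentage. A value of zero means the simulation is no better than using the observed mean, and negative values mean it is worse.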
After the model was benchmarked with the in situ data, the VIC model was driven by gauge- and satellite-based precipitation datasets for the period from 1 January 2015 to 31 December 2015, without any further adjustment of parameters. The simulated and observed hydrographs are shown in Figure 10, and the statistical comparisons are summarized in Table 6. As shown, the CGDPA performed worse in 2015 than in the calibration and validation periods, with an NSE of 0.41 and a runoff overestimation of 26.39%. The observed mean daily discharge from 2009 to 2014 was 711.61 m³/s, whereas that in 2015 was 480.82 m³/s. Differences in hydrological features between the two periods may influence the simulation performance. Using the same parameters enabled us to compare the performance of simulated streamflow from different precipitation inputs. For the GPM-era satellite precipitation products, GSMaP-Gauge showed the best performance in the streamflow simulation; IMERG-C took second place; and the two purely satellite-derived estimates demonstrated poor performance due to the large precipitation bias, especially GSMaP-MVK, with 151.97% runoff overestimation at the daily scale. Interestingly, the simulated streamflow with GSMaP-Gauge inputs performed slightly better than that with the CGDPA (e.g., 0.53 versus 0.41 for NSE). We consider that the gauge corrections involved in the GSMaP-Gauge product remarkably improved the skill of streamflow simulation. For the monthly comparisons, GSMaP-Gauge performed the best again, while the NSE value of IMERG-C reached 0.63. However, the satellite-only products still had unsatisfactory performance with negative NSEs, suggesting they have low hydrological utility for this region.
The simulation accuracy could be improved if the hydrological model were calibrated separately with each precipitation input. We therefore recalibrated the model parameters using each satellite precipitation dataset during 2015. This scenario is also an alternative strategy for hydrological applications in ungauged basins where only satellite precipitation estimates are available [78,79]. As shown in Figure 11, the simulation performance improved after the model was recalibrated. For example, the daily NSE of IMERG-C significantly increased from 0.18 to 0.63. Furthermore, the RMSE also significantly decreased for all satellite products. As summarized in Table 6, simulations of IMERG-C and GSMaP-Gauge products had good statistical agreement with observed streamflow at the daily and monthly scales. However, the NSE values of both satellite-only products were still below zero at the daily scale, further confirming that the poor hydrological utility of these two satellite-only products is mainly due to the unreliable precipitation estimates. The errors in these two precipitation datasets were propagated to the simulated streamflow and could not be removed by model-parameter recalibration. Generally, the recalibration of the model parameters effectively improved the hydrological potential of satellite precipitation, especially for precipitation products with small errors; however, this recalibration approach should be treated with caution because it may result in unrealistic parameter values in some cases [80,81].