An Efﬁcient Downscaling Scheme for High-Resolution Precipitation Estimates over a High Mountainous Watershed

: Satellites are capable of observing precipitation over large areas and are particularly suitable for estimating precipitation in high mountains and poorly gauged regions. However, the coarse resolution and relatively low accuracy of satellites limit their applications. In this study, a downscaling scheme was developed to obtain precipitation estimates with high resolution and high accuracy in the Heihe watershed. Shannon’s entropy, together with a semi-variogram, was applied to establish the optimal precipitation station network. A combination of the random forest (RF) method and the residual correction approach with the established rain gauge network was applied to downscale monthly precipitation products from Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (IMERG). The results indicated that the RF model showed little improvement in the accuracy of IMERG-based precipitation downscaling. Including residual modiﬁcation could improve the results of the RF model. The mean absolute error ( MAE ) and root mean square error ( RMSE ) values decreased by 19% and 21%, respectively, after residual corrections were added to the RF approach. Moreover, we found that enough rain gauge records are necessary for and remain an important component of tuning model performance. The application of more rain gauges improves the performance of the combined RF and residual modiﬁcation methods, with the MAE and RMSE values reduced by 8% and 9%, respectively. Residual correction, together with enough precipitation stations, can effectively enhance the quality of the precipitation patterns and magnitudes obtained in the RF downscaling process. The proposed downscaling scheme is an effective tool for increasing the accuracy and spatial resolution of precipitation ﬁelds in the Heihe watershed. The beneﬁt of this approach is evident, especially over regions with sparse gauge networks. The downscaling scheme used in this study can be performed for arbitrary target domains, especially in complex mountainous areas and poorly gauged and ungauged basins. In addition, the method used in this study to select optimum precipitation sites can be utilized in developing a particular meteorological station network. It was shown that the presented downscaling scheme is capable of satisfactorily downscaling monthly precipitation ﬁelds in the Heihe watershed. Further, investigations may be conducted on a daily time scale with other predictors in the RF downscaling process, such as atmospheric circulation factors.


Introduction
Precipitation is a primary variable associated with earth science, hydrology, climatology, and agriculture [1][2][3][4][5]. The spatial distribution of precipitation is a necessary input for hydrological and ecological models. Accurate and reliable precipitation estimates are crucial for understanding the hydrologic cycle, managing water resources, evaluating hydrological impacts on human life, and forecasting climate trends and variability [6][7][8].
Various approaches have been proposed for estimating distributions of precipitation over the last few decades. These methods are based on mathematical and physical principles determined by using rain gauges, weather radar, satellite sensors, and numerical models. The interpolation method is a common approach used to generate spatial precipitation fields. However, the accuracy of interpolation-based precipitation estimates is significantly dependent on the station network and the degree of the spatial heterogeneity of precipitation [9,10]. It is difficult to gain accurate spatial information on precipitation based on data from stations with sparse coverage and in areas where the spatial heterogeneity of precipitation is high, especially mountainous areas [11]. Numerical models can provide spatial precipitation estimates with physical bases but require large computational resources and have large uncertainties associated with parameters [12,13]. Satellite sensors can detect precipitation information globally and can be applied to high mountains and deserts, but satellite data poorly represent variability in local areas and complex topographical regions due to their low resolutions and deficiencies in retrieval algorithms [14][15][16].
A growing number of multi-sensor blended precipitation products have emerged during the past thirty years. Some previous works focused on comparisons of different remote sensing products at national or regional scales [17][18][19][20]. The Global Precipitation Measurement (GPM) Core Observatory was launched in 2014 to extend the Tropical Rainfall Measuring Mission (TRMM). The Integrated Multi-satellitE Retrievals for GPM (IMERG) provides a new generation of satellite precipitation products that have been widely applied in many fields [21]. Studies have found that the IMERG precipitation products show better performances than other satellite precipitation measurements and reanalysis datasets, such as the TRMM3B42 and ERA-Interim products [22][23][24]. IMERG provides precipitation estimates every 30 min with a spatial resolution of 0.1 • depending on the sensors in the TRMM and GPM eras. However, IMERG precipitation products have relatively coarse spatial resolutions that are not sufficiently fine to capture spatial variations in precipitation in complex local landscapes.
To overcome the inherent limitations of IMERG products in fine-scale applications, downscaling techniques are often required to translate precipitation products from a coarse scale to a regional or catchment scale. Dynamical and statistical downscaling methods are two fundamental downscaling techniques. Compared to the dynamical downscaling procedure, statistical downscaling is relatively computationally efficient and aims to establish the statistical relationships between the predictands and predictors of local and large-scale variables [7,25]. Several statistical downscaling approaches have been implemented to derive precipitation fields with fine resolutions, such as geographic weighted regression (GWR), random forest (RF), and artificial neural networks (ANNs) [26][27][28]. Among these methods, RF is recognized for being accurate and able to deal with small sizes and high-dimensional feature spaces [29,30]. Despite the improvements in the spatio-temporal representation of precipitation obtained by these methods, many studies only downscale the satellite products either with the existing limited meteorological observations or without considering the station values [25][26][27][28][29][30][31]. Observations from rain gauges are ideally used as necessary inputs for downscaling. For complex topographical regions and high mountains, however, the density and spacing of the rain gauges restrict the accuracy of the downscaling methods. Studies have also shown that the marginal effect of introducing auxiliary variables is more obvious for low-density station networks than for high-density station networks [32]. In addition, the establishment of more rain gauges requires considerable financial and technological investments for operational and maintenance costs.
The goal of this paper is to provide monthly precipitation estimates with a high spatial resolution in the Heihe watershed in China by downscaling IMERG products. To obtain reliable precipitation estimates, a method composed of Shannon's entropy and a semi-variogram was used to first establish virtual rain gauges based on the existing meteorological stations and local topographical features. The values of the virtual rainfall gauges were obtained by applying the Weather Research and Forecasting Model (WRF). The RF method was then applied to downscale the IMERG precipitation data from 0.1 • to 0.01 • , combined with virtual and existing rainfall gauges and auxiliary information. The downscaling scheme was expected to reduce the spatial resolution of IMERG data and effectively enhance the quality of precipitation patterns and magnitudes.

Study Area and Data
The Heihe watershed is the second-largest inland river basin in China and falls in the category of an arid region with a total area of approximately 140,000 km 2 ( Figure 1). The river originates in the Qilian Mountains, flows through an irrigated agricultural area, continues northward, and finally joins two lakes in the desert. The elevation ranges Remote Sens. 2021, 13, 234 3 of 18 from 872 m in the north to more than 5000 m in the south above the sea level, with high spatial heterogeneity in altitude. The watershed is divided into three segments: the upper, middle, and lower reaches, with high mountains and plateaus located in the southern area, occupying 33.6% of the whole watershed, and oases and deserts distributed in the middle and low reaches, accounting for 8.9% and 57.5% of the study area, respectively. The climate is mainly controlled by westerlies throughout the year. The precipitation in the watershed shows strong spatial heterogeneity, with an annual mean precipitation varying from less than 50 mm in the low reach to more than 550 mm in the upper reach, and more than 60% of the annual precipitation occurs in the summer months, from May to September [31,33]. The Heihe watershed is poorly gauged, with two stations located in the lower reach, seven stations located in the middle reach and four stations distributed in the upper reach. Eleven stations are located in areas below 2800 m in elevation, two stations are located between elevations of 2800 and 3500 m, and there are no stations found above 3500 m (Table 1).

Study Area and Data
The Heihe watershed is the second-largest inland river basin in China and falls in the category of an arid region with a total area of approximately 140,000 km 2 ( Figure 1). The river originates in the Qilian Mountains, flows through an irrigated agricultural area, continues northward, and finally joins two lakes in the desert. The elevation ranges from 872 m in the north to more than 5000 m in the south above the sea level, with high spatial heterogeneity in altitude. The watershed is divided into three segments: the upper, middle, and lower reaches, with high mountains and plateaus located in the southern area, occupying 33.6% of the whole watershed, and oases and deserts distributed in the middle and low reaches, accounting for 8.9% and 57.5% of the study area, respectively. The climate is mainly controlled by westerlies throughout the year. The precipitation in the watershed shows strong spatial heterogeneity, with an annual mean precipitation varying from less than 50 mm in the low reach to more than 550 mm in the upper reach, and more than 60% of the annual precipitation occurs in the summer months, from May to September [31,33]. The Heihe watershed is poorly gauged, with two stations located in the lower reach, seven stations located in the middle reach and four stations distributed in the upper reach. Eleven stations are located in areas below 2800 m in elevation, two stations are located between elevations of 2800 and 3500 m, and there are no stations found above 3500 m ( Table 1).  The GMP IMERG product supersedes the TRMM product as the next-generation precipitation dataset developed with GPM data through the assimilation of information from multiple MW/IR sensors together with ground observations. The GPM provides global precipitation estimates with a spatial resolution of 0.1 • and a relatively high temporal resolution of 30 min within the range of 60 • N-60 • S [24,34,35]. In the new version of IMERG, an improved algorithm is applied to produce precipitation estimates covering the period from 2000 to the present with a high accuracy. In this research, monthly IMERG V06 data (0.1 • , 2000-present) were downloaded from NASA's data repository [36].
In addition to precipitation itself, other auxiliary predictors are commonly considered during the downscaling process, including elevation, slope, aspect, relief, distance to the sea, and the vegetation index [7,25,37,38]. Elevation data (DEM) were extracted from https://gisgeography.com/free-global -dem-data-sources/, released by the United States government. Slope, aspect and relief data were derived from the DEM using ArcGIS software. The normalized difference vegetation index (NDVI), downloaded from https:// modis.gsfc.nasa.gov/, was used as an auxiliary variable in precipitation downscaling [39].

Designing a Rain Gauge Network Using Shannon's Entropy
To obtain reliable precipitation estimations, a dense rain gauge network is necessary. The main purpose of rain gauge network design is to determine the number of stations and their locations to yield the maximum precipitation information and the associated variations. Certain rain gauge density levels have been recommended for different types of landforms by the World Meteorological Organization [40]; for example, 25 km 2 per station is needed in mountainous areas with heterogeneous precipitation.
We first determined several candidate locations {Z 1 , Z 2 , · · · , Z n } in areas with no meteorological stations, especially in areas of high altitudes, topographically complex regions, no man's land, and areas that are poorly gauged, according to the guidelines of the WMO by using ArcGIS. Shannon's entropy, together with a semi-variogram, was applied to design the optimal rain gauge network.
Entropy is a measure of the information content of events [41] and can be expressed as follows: where p i is the probability of the event k i . For precipitation, H(Z 1 ) measures the average amount of precipitation. The overlapping information could exist in two rain gauge stations. The joint entropy of precipitation from two rain gauges is defined as follows, Equation (2) defines the uncertainty of the marginal information from two stations, Z 1 and Z 2 . Conditional entropy can be used to find the gauge with the lowest reduction in uncertainty or trans-information.
Compute the entropy of every candidate rain gauge in the Heihe watershed and find the rain gauge Z 1 with the highest uncertainty, Find the second most important rain gauge, Z 2 , that has the least common information with the first rain gauge, Z 1 , by using the following equation, Calculate the conditional entropy of these gauges with respect to all other rain gauges (Z 3 , · · · , Z n ) and find the site with the minimum reduction in uncertainty, Z 3 is the third most important rain gauge. Repeat the procedure that the jth important rain gauge satisfies, The monthly precipitation amounts of the sites {Z 1 , Z 2 , · · · , Z n } can be calculated by using the WRF model, which was initialized by the European Centre for Medium-range Weather Forecasts (ECMWF) global atmospheric reanalysis (ERA-Interim). Lastly, the number of virtual rain gauges can be determined based on the existing meteorological stations in the watershed area by comparing the semi-variograms of the two datasets.

Random Forest
The RF is a supervised learning algorithm that is applicable to classifications and regressions [42]. In practical applications, the RF-based downscaling method is rather easy to handle with few parameters and is capable of efficiently establishing nonlinear relationships between precipitation and some covariates across heterogeneous landscapes [38,43]. The RF fits a number of regression trees using a subset of the given data and averages the results of all trees to determine the final outputs to improve the model accuracy and reduce overfitting. The equation of the RF can be outlined as follows, Each tree, T b , is independently grown to its maximum size from the bootstrap sample that contains two-thirds of the grid cells from the training dataset without any pruning. T b splits into different branches by minimizing a cost function, that is, the sum of squares error, between the precipitation values and the predictors. An important feature of the RF is that the trees were trained on random subsets of the data, thus generating different precipitation estimates [44].
Regarding the selection of the proper predictors in the RF downscaling method, the coarse-resolution precipitation values in the central and adjacent grid cells are probably the most important predictors [7]. Atmospheric predictors are included to provide temporal information about the downscaled precipitation values. Topographic covariates, such as elevation and slope, are also important here due to the orographic dependence of precipitation, especially in the upper reaches of the watershed. Topographic data are considered here to maintain the spatial pattern of the downscaled precipitation values. NDVI, latitude and longitude are also included because they contain land-atmosphere coupling, geographical and seasonality information.
By establishing the rain gauge network, the IMERG products can be downscaled using the RF combined with covariates. Since regression methods usually cannot quantify the total variation in a given precipitation pattern, some researchers have adopted a residual correction. Studies have shown that the application of residuals of a regression method could produce better predictions than those obtained using only the regression method [31,[45][46][47][48][49]. To yield the final results, we also added the modified residuals back to the RF results by using kriging combined with the station values. The weights used in the kriging method were based on the Gaussian semi-variogram model of spatial correlation. The downscaling scheme is presented in Figure 2.
T , is independently grown to its maximum size from the bootstrap sample that contains two-thirds of the grid cells from the training dataset without any pruning. b T splits into different branches by minimizing a cost function, that is, the sum of squares error, between the precipitation values and the predictors. An important feature of the RF is that the trees were trained on random subsets of the data, thus generating different precipitation estimates [44].
Regarding the selection of the proper predictors in the RF downscaling method, the coarse-resolution precipitation values in the central and adjacent grid cells are probably the most important predictors [7]. Atmospheric predictors are included to provide temporal information about the downscaled precipitation values. Topographic covariates, such as elevation and slope, are also important here due to the orographic dependence of precipitation, especially in the upper reaches of the watershed. Topographic data are considered here to maintain the spatial pattern of the downscaled precipitation values. NDVI, latitude and longitude are also included because they contain land-atmosphere coupling, geographical and seasonality information.
By establishing the rain gauge network, the IMERG products can be downscaled using the RF combined with covariates. Since regression methods usually cannot quantify the total variation in a given precipitation pattern, some researchers have adopted a residual correction. Studies have shown that the application of residuals of a regression method could produce better predictions than those obtained using only the regression method [31,[45][46][47][48][49]. To yield the final results, we also added the modified residuals back to the RF results by using kriging combined with the station values. The weights used in the kriging method were based on the Gaussian semi-variogram model of spatial correlation. The downscaling scheme is presented in Figure 2. The accuracy of the downscaling results was assessed by cross-validation of both the existing meteorological stations and the virtual rain gauges. In this approach, precipitation is estimated at a given gauge location using all observations except the one used in The accuracy of the downscaling results was assessed by cross-validation of both the existing meteorological stations and the virtual rain gauges. In this approach, precipitation is estimated at a given gauge location using all observations except the one used in the downscaling process [32]. Mean absolute error (MAE) and root mean square error (RMSE) are two common metrics used to measure overall accuracy and reflect extreme outliers, respectively, and were used to compare the different methods in this study. Their equations are given as follows: where o k is the true value at the kth location and o k is the estimated value.

Results
By combining Shannon's entropy with the semi-variogram, the optimal virtual rain gauge network was established (Figure 3), and the monthly precipitation at each gauge was obtained by using WRF. According to the station values of the mean annual precipitation over the past thirty years, a Gaussian semi-variogram model was used and this model was once again applied to fit the relation of the entropy and the optimal stations combination. Finally, fifteen rain gauge stations were selected in this study. The parameterized physical processes in WRF include the Kessler scheme for microphysics, the Mellor Yamada Janjic (MYJ) scheme for the planetary boundary layer, the Monin-Obukhov surface-layer scheme, the rapid radiative transfer model longwave radiation scheme, the Dudhia shortwave radiation scheme, the Kain-Fritsch cumulus scheme, the Noah scheme for the land surface model and the model default terrain information [33]. The locations and estimated monthly precipitation amounts of the 15 virtual rain gauges are listed in Table 2.
Remote Sens. 2021, 13, x FOR PEER REVIEW     An RF-based downscaling scheme was used to estimate monthly precipitation in the Heihe watershed. First, the impact of adding the residuals back into the RF results was investigated. Next, the performance of the established rain gauge network was evaluated vs. using the only 13 existing meteorological stations. Here, we only present results for July and December 2018 for the two examples. Figure 4 shows the spatial patterns of the GPM IMERG V06B satellite precipitation product and three downscaling results using the RF, the RF plus a residual correction using the 13 meteorological stations (RF_Kri13), and the RF plus a residual correction using a total of 28 stations (RF_Kri28) in July. The IMERG product at 0.1 • exhibited a localized pattern with heavy precipitation upstream and a few areas of large precipitation surrounded by low precipitation in the middle and lower reaches. Except for in the northwest region of the watershed, the original IMERG-based precipitation values tended to overestimate precipitation in most parts of the region. The maximum precipitation value obtained from the IMERG product was located in the southeast region of the watershed with a value of 223.1 mm, while the smallest value occurred in the northwest part with a value of 21.8 mm. The downscaled precipitation captured the pattern but large differences existed among some local areas. The residual correction process modified the precipitation amounts, especially in the lower reach (Figure 4c,d). This indicated that the residual correction had a positive effect on the RF-based downscaling method. Compared to the station values, the RF_Kri28 method showed superior performance compared to the RF_Kri13 method. Different spatial patterns were observed in Figure 4c,d, especially in some locations in the middle and lower reaches of the watershed. The virtual rain stations ensured the accuracy of the final precipitation fields, and this improvement was more obvious in the northwest region of the watershed. Furthermore, the scatter correlation plot for the observed and the downscaling results in July ( Figure 5) also suggested that RF_Kri28 estimated the precipitation amount quite reliably. The overestimations of IMERG were observed for most of the stations. The results of the RF method had not been significantly improved compared with the original IMERG products. The methods based on residual correction performed better than the RF method, and RF_Kri28 with more station values yielded the best results.
The performances of the methods in December are displayed in Figure 6. The IMERG product seemed to underestimate the precipitation in almost the whole watershed compared to the values obtained from the stations. There was no rain in most of the Heihe watershed according to the IMERG product. The RF-based downscaling method (Figure 6b) modified some local details by introducing auxiliary information. The residual correction methods applied after the RF seemed to yield improved precipitation fields. We can see that the RF_Kri28 method performed better than the RF_Kri13 method at 28 gauge locations (Figure 6c,d). The RF_Kri28 method provided more reliable estimates, especially for gauge-sparse areas. In the Heihe watershed, the downscaling method conducted with enough stations performed better than the RF. Clearly, gauge density is an important factor in determining the accuracy of different methods. Figure 7 illustrates the relationship between the gauge values and the downscaled precipitation datasets. Underestimations were both observed for IMERG and the RF method, and to some degree the residual correction process reduced these biases. RF_Kri28 had shown the best agreement with gauge observations (Figure 7d).   The performances of the methods in December are displayed in Figure 6. IMERG product seemed to underestimate the precipitation in almost the whole wa shed compared to the values obtained from the stations. There was no rain in most of Heihe watershed according to the IMERG product. The RF-based downscaling met (Figure 6b) modified some local details by introducing auxiliary information. The re ual correction methods applied after the RF seemed to yield improved precipita fields. We can see that the RF_Kri28 method performed better than the RF_Kri13 met at 28 gauge locations (Figure 6c,d). The RF_Kri28 method provided more reliable mates, especially for gauge-sparse areas. In the Heihe watershed, the downsca method conducted with enough stations performed better than the RF. Clearly, ga density is an important factor in determining the accuracy of different methods. Figu illustrates the relationship between the gauge values and the downscaled precipita datasets. Underestimations were both observed for IMERG and the RF method, an some degree the residual correction process reduced these biases. RF_Kri28 had sho the best agreement with gauge observations (Figure 7d). The errors presented in Table 3 demonstrate that although the RF approach can provide a precipitation map with a fine resolution, there are limits to the accuracy of the results. In summer months, such as May, July, August and September, the values of MAE and RMSE are slightly larger than those obtained using the IMERG product, possibly due to the lack of other important covariates in the downscaling process, which will be investigated in the future. The application of residuals reduced the values of MAE and RMSE with respect to the RF results. The errors in summer months (from May to September) improved notably, with the mean value of MAE decreasing by 19%, and that of RMSE decreasing by 22%. For the twelve months, the mean values of MAE and RMSE decreased by 15% and 16%, respectively. Residual correction provided obvious reductions in the MAE and RMSE values. The introduction of virtual rain gauges further improved the accuracy of the downscaling fields. For the RF_Kri28 method, the mean values of MAE and RMSE for the twelve months decreased by 8% and 9%, respectively, compared with those obtained in the RF_Kri13 method.   The errors presented in Table 3 demonstrate that although the RF approach c provide a precipitation map with a fine resolution, there are limits to the accuracy of t results. In summer months, such as May, July, August and September, the values of MA and RMSE are slightly larger than those obtained using the IMERG product, possibly d to the lack of other important covariates in the downscaling process, which will be vestigated in the future. The application of residuals reduced the values of MAE a RMSE with respect to the RF results. The errors in summer months (from May to Se tember) improved notably, with the mean value of MAE decreasing by 19%, and that RMSE decreasing by 22%. For the twelve months, the mean values of MAE and RM decreased by 15% and 16%, respectively. Residual correction provided obvious redu tions in the MAE and RMSE values. The introduction of virtual rain gauges further i proved the accuracy of the downscaling fields. For the RF_Kri28 method, the mean v ues of MAE and RMSE for the twelve months decreased by 8% and 9%, respective compared with those obtained in the RF_Kri13 method.    We also compared the downscaled results with the values obtained from the meteorological stations by applying a cross-validation process. Figure 8 displays the area mean absolute bias values of the RF, RF_Kri13, and RF_Kri28 methods for different months over the whole watershed. Compared to those obtained using the RF, significant decreases in the errors can be observed, especially in summer months (from May to September). The RF_Kri28 method achieved better performance than the other methods. The residual correction combined with the application of virtual rain gauges ensured the accuracy of the final precipitation fields. mean absolute bias values of the RF, RF_Kri13, and RF_Kri28 methods for different months over the whole watershed. Compared to those obtained using the RF, significant decreases in the errors can be observed, especially in summer months (from May to September). The RF_Kri28 method achieved better performance than the other methods. The residual correction combined with the application of virtual rain gauges ensured the accuracy of the final precipitation fields.

Discussion
Satellite-based precipitation estimates can provide precipitation information with wide spatial coverage, particularly over complex mountainous terrain and sparsely gauged regions. Precipitation data with high accuracy and high spatial resolution play significant roles in forcing local and global climate models and hydrological and ecological models. Studies have shown that the performance of IMERG products is better than those of some other satellite-based precipitation datasets [25,50]. However, in some basins and localized areas, the IMERG products are still too coarse to be applied. Integrating station observations with a single satellite-based precipitation product is a common approach for obtaining high spatial resolution precipitation. Studies have indicated that the use of residual correction methods that use station values can generate better precipitation estimates than regression-based downscaling methods can alone [31,51]. Although scholars have performed much research on the downscaling of satellite products by using geographical and topographical variables and station observations, most of these studies depended on the limited number of existing stations, and the accuracy of the downscaled precipitation products was not significantly improved, especially over high mountainous areas with poor gauged networks.
The main goal of this study was to obtain precipitation estimates with high accuracy and high spatial resolution in a watershed with highly heterogeneous landforms by proposing a downscaling-integration scheme. As denoted, station networks are of primary importance for the accuracy of precipitation estimations [52,53]. A combination of the RF method and the correcting of residuals by using an optimal rain gauge network was thus developed in this study. The optimal station network was designed by using Shannon's entropy and semi-variograms since entropy can detect the uncertainty information contained in a variable and semi-variograms can be used to determine the num-

Discussion
Satellite-based precipitation estimates can provide precipitation information with wide spatial coverage, particularly over complex mountainous terrain and sparsely gauged regions. Precipitation data with high accuracy and high spatial resolution play significant roles in forcing local and global climate models and hydrological and ecological models. Studies have shown that the performance of IMERG products is better than those of some other satellite-based precipitation datasets [25,50]. However, in some basins and localized areas, the IMERG products are still too coarse to be applied. Integrating station observations with a single satellite-based precipitation product is a common approach for obtaining high spatial resolution precipitation. Studies have indicated that the use of residual correction methods that use station values can generate better precipitation estimates than regression-based downscaling methods can alone [31,51]. Although scholars have performed much research on the downscaling of satellite products by using geographical and topographical variables and station observations, most of these studies depended on the limited number of existing stations, and the accuracy of the downscaled precipitation products was not significantly improved, especially over high mountainous areas with poor gauged networks.
The main goal of this study was to obtain precipitation estimates with high accuracy and high spatial resolution in a watershed with highly heterogeneous landforms by proposing a downscaling-integration scheme. As denoted, station networks are of primary importance for the accuracy of precipitation estimations [52,53]. A combination of the RF method and the correcting of residuals by using an optimal rain gauge network was thus developed in this study. The optimal station network was designed by using Shannon's entropy and semi-variograms since entropy can detect the uncertainty information contained in a variable and semi-variograms can be used to determine the number of stations. The precipitation values at the established stations were simulated by using WRF based on the research by Pan [33]. Based on the 13 existing stations and the 15 established virtual rain gauges as well as some covariates, the final downscaling results were obtained by using the RF.
We investigated the performance of the IMERG products in the Heihe watershed. The results in Figures 4-7 show that for the original IMERG precipitation datasets, overestimations were observed in almost the whole watershed in July, while underestimations were found in the whole watershed in December, indicating the IMERG products may overestimate precipitation in summer months and underestimate precipitation in winter months. The RF-downscaled precipitation estimates showed similar spatial patterns with those of the IMERG products in July and showed a different but detailed pattern in December (Figure 4b or Figure 6b). However, the accuracy of the RF method was not satisfactory according to the cross-validation process (Table 2), which may be due to the selection of covariates. More predictors will be investigated in future research. The accuracy was obviously improved by applying the residual correction. This finding was consistent with some previous studies that also found that residual modification played a critical role in regression-based estimation methods [25,[54][55][56]. The accuracy improved by 19% and 21% according to the MAE and RMSE values, respectively, in this research. We then verified whether the accuracy of the downscaling model could be improved with consideration of virtual rain gauges. The results in Figures 4-7 show that additional station values further improved the accuracy of the downscaling results. In terms of the MAE and RMSE values, the errors were reduced by 8% and 9%, respectively, compared to those obtained using observations from the 13 existing stations, and decreased by 22% and 23%, respectively, compared with those obtained via the RF-based downscaling results. This indicated that enough stations were necessary for the downscaling process.
In addition, the statistical results shown in Table 2 and the plots in Figure 8 also indicated that the proposed approach was superior to the other methods. This improvement was attributed to the combined effects of the residual correction and the optimal rain gauge network. However, the established rain gauge network in this study was based on the mean annual precipitation over the past thirty years. Improved results could be obtained by designing the station network in terms of the characteristics of the monthly precipitation. Furthermore, the downscaling scheme would be optimized further by considering more covariates and by adjusting the parameters to make the model more suitable for our study.

Conclusions
In this study, a downscaling-integration scheme was developed to generate improved high-accuracy and high-resolution monthly precipitation fields in the Heihe watershed. Shannon's entropy combined with a semi-variogram was first applied to establish the optimal rain gauge network in the study region. The monthly precipitation amounts at the candidate rain gauges were obtained by using WRF with some suitable localized parameters. Then, the RF was applied to downscale the IMERG precipitation products from a resolution of 0.1 • to 0.01 • by considering some auxiliary variables, including elevation, slope, aspect, relief, distance to the sea, and the vegetation index, coarse-resolution precipitation values in the central grid cell, and coarse resolution precipitation values at adjacent grid cells. The residuals of the RF results were modified by using the station values from the optimal rain gauge network. The final precipitation estimates were obtained by applying the residual correction to the results of the RF downscaling.
The method was carried out over twelve months in 2018. The performance of the downscaling results based on the RF, RF plus a residual correction using existing 13 stations (RF_Kri13), and RF plus a residual correction using 28 optimal rain gauges (RF_Kri28) were assessed by a cross validation analysis. Studies showed that the original IMERG precipitation products tend to overestimate precipitation in most parts of the watershed in July, and tend to underestimate precipitation in December. The RF approach enhanced the spatial resolution of the precipitation fields without a significant increase in accuracy. A comparison with the station values showed that the precipitation estimates at a high resolution of 0.01 • obtained based on the RF_Kri13 and RF_Kri28 methods were better than those obtained from the original IMERG and RF method; the MAE and RMSE values decreased by 19% and 22%, respectively, indicating that the residual correction was beneficial for the RF-based downscaling method. The RF_Kri28 method generated better precipitation estimates than did the RF_Kri13 method, with the values of MAE and RMSE decreasing by 8% and 9%, respectively, demonstrating that the use of a dense rain gauge network further improved the accuracy of the precipitation estimates. The improvement of the final precipitation fields resulted from the combination of the methods of residual correction and the virtual rain gauge establishment.
The benefit of this approach is evident, especially over regions with sparse gauge networks. The downscaling scheme used in this study can be performed for arbitrary target domains, especially in complex mountainous areas and poorly gauged and ungauged basins. In addition, the method used in this study to select optimum precipitation sites can be utilized in developing a particular meteorological station network. It was shown that the presented downscaling scheme is capable of satisfactorily downscaling monthly precipitation fields in the Heihe watershed. Further, investigations may be conducted on a daily time scale with other predictors in the RF downscaling process, such as atmospheric circulation factors.