The Development of a Two-Step Merging and Downscaling Method for Satellite Precipitation Products

Low accuracy and coarse spatial resolution are the two main drawbacks of satellite precipitation products. Therefore, calibration and downscaling are necessary before these products are applied. This study proposes a two-step framework to improve the accuracy of satellite precipitation estimates. The first step is data merging based on optimum interpolation (OI), and the second step is downscaling based on geographically weighted regression (GWR); therefore, the framework is called OI-GWR. An Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) (IMERG) product is used to demonstrate the effectiveness of OI-GWR in the Tianshan Mountains, China. First, the original IMERG precipitation data (OIMERG) are merged with rain gauge data using the OI method to produce corrected IMERG precipitation data (CIMERG). Then, using CIMERG as the first guess and the normalized difference vegetation index (NDVI) as the auxiliary variable, GWR is utilized for spatial downscaling. The two-step OI-GWR method is compared with several traditional methods, including GWR downscaling (Ori_GWR) and spline interpolation. The cross-validation results show that (1) the OI method noticeably improves the accuracy of OIMERG, and (2) the 1-km downscaled data obtained using OI-GWR are much better than those obtained from Ori_GWR, spline interpolation, and OIMERG. The proposed OI-GWR method can contribute to the development of high-resolution and high-accuracy regional precipitation datasets. However, it should be noted that the method proposed in this study cannot be applied in regions without any meteorological stations. In addition, further efforts will be needed to achieve dailyor hourly-scale downscaling of precipitation.


Introduction
Precipitation is the main component of the global water cycle and plays a critical role in Earth's energy balance [1,2]. Therefore, accurate information on the spatial and temporal distribution of precipitation is essential to improve our understanding of the Earth system and to better predict weather and climate conditions and natural disasters. Ground observations are the most direct source of precipitation data, but most stations are unevenly distributed in low-altitude zones, which makes it difficult to capture the full distribution of large-scale precipitation. In contrast, satellite-retrieved precipitation products have the unique advantages of global coverage and spatiotemporal continuity [3,4] and have consequently promoted a more complete understanding of the patterns of and changes in regional and global precipitation.
However, because of instrument limitations and imperfect retrieval algorithms, satellite precipitation products have drawbacks in terms of spatial resolution and data precision [5][6][7][8][9]. At present, because of the coarse spatial resolution of satellite precipitation products, their application in hydrological and climatic models at the watershed scale is restricted. For this reason, many researchers have focused on developing statistical downscaling methods for satellite or reanalysis precipitation products [10][11][12]. These methods usually involve building a relationship between coarse-resolution precipitation data and high-resolution variables to improve the spatial resolution of satellite precipitation data. For example, Immerzeel et al. [13] established an exponential regression (ER) model by integrating 1-km normalized difference vegetation index (NDVI) data with Tropical Rainfall Measuring Mission (TRMM) 3B43 precipitation data and obtained high-resolution annual precipitation data over the Iberian Peninsula. Based on the method of Immerzeel, Jia et al. [14] established a functional relationship between 3B43 precipitation data and other variables (i.e., altitude and NDVI) using multiple linear regression (MLR) and obtained downscaled annual data at a 1 km resolution for the Qaidam Basin in China. Duan et al. [15] developed a further modified downscaling algorithm by introducing calibration methods based on geographic difference analysis (GDA) and geographic ratio analysis (GRA) and obtained 1-km monthly precipitation data over the Tana Lake Basin in Africa and the coast of the Caspian Sea in Asia. Zhang et al. [16] applied the abovementioned methods in Xinjiang, China, and obtained 1-km annual precipitation data. Considering the spatial variations exhibited by the relationship between precipitation and environmental variables, geographically weighted regression (GWR) has been introduced into precipitation analyses to achieve improved downscaling performance [17][18][19][20]. However, although environmental variables play a vital role in the monthly or yearly downscaling of precipitation, they have limited applicability in daily and hourly downscaling, which is more strongly reliant on cloud properties. For example, Sharifi et al. [21] obtained 1-km daily precipitation data in northeast Austria using MLR, artificial neural networks (ANNs) and spline interpolation based on 1-km cloud optical thickness (COT), cloud effective radius (CER), and cloud water path (CWP) data. Ma et al. [22] obtained 1-km hourly precipitation data in the southeast coast region of China based on COT, CER, and cloud top height (CTH) data from Himawari 8.
Traditional downscaling methods can improve the spatial resolution of satellite precipitation data. Many studies have shown that the accuracy of the satellite precipitation products is the most important factor affecting the quality of the downscaled estimates [13,14,17] if the environmental variables can satisfactorily reproduce the pattern of the satellite precipitation data. However, all previous downscaling methods have been applied to original satellite precipitation data, which contain large uncertainties that limit the accuracy of the downscaled precipitation estimates. Correcting the original satellite products before applying them in downscaling analyses can potentially contribute to the improvement of the downscaled precipitation estimates. Nevertheless, to our knowledge, no such study has previously been performed.
To address this gap, this study proposes a two-step merging and downscaling framework called OI-GWR to improve the accuracy of downscaled precipitation estimates. The merging procedure is based on optimum interpolation (OI), and the downscaling procedure is based on GWR. The remaining sections of this paper are organized as follows: Section 2 introduces an overview of the study area; Section 3 provides detailed information about the data and methods; Section 4 reports the results of OI and GWR; and finally, a discussion and conclusions are presented in Sections 5 and 6, respectively. The method proposed in this study will contribute to the production of high-quality and high-resolution regional gridded precipitation datasets. In particular, the method can serve as a useful reference for the development of grid data at the daily or hourly scale.

Study Area
The Tianshan Mountains are the mountain system that is the farthest in the world from any ocean. They are composed of a series of mountain ranges, intermountain basins, valleys, and piedmont plains, with an area of approximately 570,000 km 2 and an average altitude of 4000 m [4]. The precipitation distribution shows large temporal and spatial variations, with most precipitation occurring in summer and little occurring in winter. The Tianshan Mountains are also known as the water tower of central Asia. Approximately 65% of the rivers in Xinjiang originate from this region. On northern slopes, the annual average precipitation ranges between 500 and 700 mm, whereas the annual precipitation over western windward slopes can reach 1000 mm [23][24][25].

Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) (IMERG) Satellite Precipitation Products
The Global Precipitation Measurement (GPM) mission is a satellite precipitation measurement project initiated by the National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA). The goal of the GPM mission is to provide new-generation global satellite precipitation products with high precision and resolution. The GPM mission has an extended TRMM sensing load and an enhanced capacity for precipitation detection. The dual-frequency radar carried by the GPM Core Observatory (GPMCO) operates in the Ku and Ka bands. In particular, in the Ka band, it can operate in the high-sensitivity interleaved sampling mode. Meanwhile, the microwave radiometer of the GPMCO operates in four bands with higher frequencies than that of the TRMM Microwave Imager, which increases the observation capacity for light and solid precipitation. A comparison between the Integrated Multisatellite Retrievals for GPM (IMERG) products and other commonly used satellite precipitation datasets in Xinjiang has shown that IMERG exhibits the best performance [26][27][28]. Therefore, the IMERG monthly precipitation data were utilized as the initial data in this study.
IMERG can provide quasi-global precipitation data with a temporal resolution of 30 min and a spatial resolution of 0.1 • × 0.1 • . According to the calibration methods and data sources used for these precipitation data, the IMERG products can be divided into three types, namely, "Early-run", "Late-run", and "Final-run" products. Among them, the "Early-run" and "Late-run" products are quasi-real-time data in the sense that they are released 4 and 12 h after the observations, respectively, while the "Final-run" products are post-real-time data with a time lag of 3.5 months. These products are available at the Precipitation Measurement Missions (PMM) website (https://pmm.nasa.gov/dataaccess/downloads/gpm). Specifically, the V06B IMERG "Final-run" product was selected for use in this study because this product is subjected to gauge adjustment using monthly observation data from the Global Precipitation Climatology Centre (GPCC) and offers higher precision than the "Early-run" or "Late-run" products. There are only 9 Global Telecommunication System (GTS) stations used by the GPCC in the Tianshan Mountains; these stations account for a very small proportion (<1%) of the total AWSs in the region (1074) and were excluded in this study to ensure the independence of the precipitation evaluation. Because the AWSs in the study area cannot measure snowfall during the cold season, the study period was restricted to the warm seasons (May to September) from 2014 to 2018.

Observed Precipitation
Daily precipitation data were collected from 1065 AWSs in the Tianshan region and accumulated to the monthly scale. The distribution of the stations is shown in Figure 1. The time range of these precipitation observations was consistent with that of the IMERG precipitation data. To ensure the independence of model training and validation, the 9 GTS stations in the study area were excluded ( Figure 1). The observed data were collected by the Information Center of the Xinjiang Meteorological

Environmental Variables
The introduction of environmental variables into a downscaling analysis can improve the quality of satellite-retrieved precipitation data [13][14][15][16][17][18][19][20]. Therefore, environmental variables that are closely related to satellite precipitation data, such as the slope (SLP), aspect (ASP), curvature (CVT), hillshade (HSHD) [29,30], topographic wetness index (TWI) [31], and NDVI, were selected in this study. The NDVI data were obtained from the 1-km MOD13A3 monthly average vegetation index provided by the MODerate resolution Imaging Spectroradiometer (MODIS), and digital elevation model (DEM) data were taken from the Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM), with a native resolution of 30 m. To maintain consistency with the NDVI data, the resolution was resampled to 1 km using the pixel averaging method, that is, the data from all high-resolution pixels within a given coarse-resolution pixel were averaged to obtain the corresponding coarse-resolution estimate. The topographic variables were obtained from the DEM data using Geographic Information System (GIS) software.
For the selected environmental variables, such as the NDVI, DEM, ASP, SLP, HSHD, TRI, and CVT, experiments were performed on each of the variables individually and on various combinations of variables. Moreover, the variance inflation factor (VIF) method was utilized to prevent multicollinearity. The VIF is a measure of the severity of multicollinearity in an MLR model, where multicollinearity refers to linear correlations between independent variables. The VIF is calculated when filtering variables. When the VIF value is closer to one, the multicollinearity is weaker, and vice versa. Specifically, when performing the experiments, variables were added to the existing variable group one by one. If adding a particular variable resulted in multicollinearity according to the VIF, that variable was deleted from the variable group. The variables were then introduced into the GWR model separately or in various combinations to obtain the downscaling outcomes. After a comparison of the results, the NDVI was found to have the best downscaling effect and thus was selected as the final explanatory variable for further study.

Environmental Variables
The introduction of environmental variables into a downscaling analysis can improve the quality of satellite-retrieved precipitation data [13][14][15][16][17][18][19][20]. Therefore, environmental variables that are closely related to satellite precipitation data, such as the slope (SLP), aspect (ASP), curvature (CVT), hillshade (HSHD) [29,30], topographic wetness index (TWI) [31], and NDVI, were selected in this study. The NDVI data were obtained from the 1-km MOD13A3 monthly average vegetation index provided by the MODerate resolution Imaging Spectroradiometer (MODIS), and digital elevation model (DEM) data were taken from the Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM), with a native resolution of 30 m. To maintain consistency with the NDVI data, the resolution was resampled to 1 km using the pixel averaging method, that is, the data from all high-resolution pixels within a given coarse-resolution pixel were averaged to obtain the corresponding coarse-resolution estimate. The topographic variables were obtained from the DEM data using Geographic Information System (GIS) software.
For the selected environmental variables, such as the NDVI, DEM, ASP, SLP, HSHD, TRI, and CVT, experiments were performed on each of the variables individually and on various combinations of variables. Moreover, the variance inflation factor (VIF) method was utilized to prevent multicollinearity. The VIF is a measure of the severity of multicollinearity in an MLR model, where multicollinearity refers to linear correlations between independent variables. The VIF is calculated when filtering variables. When the VIF value is closer to one, the multicollinearity is weaker, and vice versa. Specifically, when performing the experiments, variables were added to the existing variable group one by one. If adding a particular variable resulted in multicollinearity according to the VIF, that variable was deleted from the variable group. The variables were then introduced into the GWR model separately or in various combinations to obtain the downscaling outcomes. After a comparison of the results, the NDVI was found to have the best downscaling effect and thus was selected as the final explanatory variable for further study.

Overall Flow Chart of the Study
In this study, monthly IMERG precipitation data were used for downscaling. As shown in Figure 2, first, the original IMERG precipitation data (OIMERG) and the observed precipitation data (OBS) were merged using OI to obtain 10-km OI-corrected IMERG precipitation data (CIMERG). The OI technique was originally developed by Eliassen [32] and Gandin [33]. In this study, OI was used to calibrate and adjust the satellite precipitation data based on OBS; hence, the corrected results possessed the combined advantages of satellite precipitation data and OBS (see Section 3.2.2 for details). Then, since it was found that the NDVI showed the best performance among all considered environmental variables (DEM, SLP, ASP, CVT, HSHD, TWI, and NDVI) in the downscaling of precipitation data, we included only the NDVI variable in this study, which was resampled from its original 1 km resolution to a 10 km resolution. With the 10-km NDVI as the independent variable, the spatial downscaling of CIMERG was performed using GWR to obtain a 1-km monthly precipitation dataset (DS_CIMERG) for the Tianshan Mountains. For comparison with previous methods, two additional sets of data were generated and used for further validation: DS_OIMERG, obtained via GWR downscaling with OIMERG (the original uncorrected IMERG precipitation data) as the dependent variable, and DS_Spline, obtained by using the spline interpolation method for OIMERG precipitation downscaling. The steps highlighted in the red dashed box in Figure 2 constitute the process of residual correction, which is necessary for the verification of the downscaling method proposed in this study.

Overall Flow Chart of the Study
In this study, monthly IMERG precipitation data were used for downscaling. As shown in Figure 2, first, the original IMERG precipitation data (OIMERG) and the observed precipitation data (OBS) were merged using OI to obtain 10-km OI-corrected IMERG precipitation data (CIMERG). The OI technique was originally developed by Eliassen [32] and Gandin [33]. In this study, OI was used to calibrate and adjust the satellite precipitation data based on OBS; hence, the corrected results possessed the combined advantages of satellite precipitation data and OBS (see Section 3.2.2 for details). Then, since it was found that the NDVI showed the best performance among all considered environmental variables (DEM, SLP, ASP, CVT, HSHD, TWI, and NDVI) in the downscaling of precipitation data, we included only the NDVI variable in this study, which was resampled from its original 1 km resolution to a 10 km resolution. With the 10-km NDVI as the independent variable, the spatial downscaling of CIMERG was performed using GWR to obtain a 1-km monthly precipitation dataset (DS_CIMERG) for the Tianshan Mountains. For comparison with previous methods, two additional sets of data were generated and used for further validation: DS_OIMERG, obtained via GWR downscaling with OIMERG (the original uncorrected IMERG precipitation data) as the dependent variable, and DS_Spline, obtained by using the spline interpolation method for OIMERG precipitation downscaling. The steps highlighted in the red dashed box in Figure 2 constitute the process of residual correction, which is necessary for the verification of the downscaling method proposed in this study. The flow chart of this study. The red dashed box indicates the residual correction process, and the blue boxes represent the three categories of downscaling estimation data. OIMERG, original Integrated Multi-satellitE Retrievals for GPM; OBS, observed precipitation data; OI, optimum interpolation; NDVI, normalized difference vegetation index; CIMERG, OI-corrected IMERG precipitation data; GWR, geographically weighted regression; DS_CIMERG, obtained via GWR downscaling with CIMERG; DS_OIMERG, obtained via GWR downscaling with OIMERG; DS_Spline, obtained by using the spline interpolation method for OIMERG precipitation downscaling. Figure 2. The flow chart of this study. The red dashed box indicates the residual correction process, and the blue boxes represent the three categories of downscaling estimation data. OIMERG, original Integrated Multi-satellitE Retrievals for GPM; OBS, observed precipitation data; OI, optimum interpolation; NDVI, normalized difference vegetation index; CIMERG, OI-corrected IMERG precipitation data; GWR, geographically weighted regression; DS_CIMERG, obtained via GWR downscaling with CIMERG; DS_OIMERG, obtained via GWR downscaling with OIMERG; DS_Spline, obtained by using the spline interpolation method for OIMERG precipitation downscaling.

Optimum Interpolation (OI)
OI analysis requires an initial estimation field, such as a set of gridded satellite precipitation data. By calculating the error weight function of the initial estimation field and the observation field point by point, the target grid points for analysis can be corrected. Thus, when developing OI-based data merging algorithms, the key is to quantify the error structure of the initial estimation field and the observation field. Unlike other data-merging methods, OI considers not only the autocorrelation of various errors but also the correlation between different observations. Moreover, OI involves solving for the optimal value within a certain range of each analysis point and thus is particularly suitable for the analysis of single variables with large spatiotemporal variability, such as precipitation [34]. In the OI analysis conducted in this study, the OIMERG precipitation data were used as the initial estimation field, and the station precipitation observations were used as the observation field. The final analysis result for the precipitation value (A k ) at each grid point is equivalent to the first guess (F k ) at this grid point plus the deviation between the observed value and the initial estimated value at the grid point. This deviation is obtained through weighted estimation based on the deviations between the known observed values (O i ) and initial estimated values (F i ) from n grid points within a certain range, and it represents the maximum distance correlated with the target grid point. The formula is as follows: where k is the grid point to be analyzed, i is an index representing the "valid grid boxes" (boxes in the satellite precipitation grid containing at least one gauge station), and W i is a weight coefficient assigned to the deviation between the observed value and the initial estimated value in the ith grid box during estimation. Note that in an area with a sparse station network, the analysis radius should be continuously adjusted to ensure that a sufficient number of valid grid boxes can be searched, from which several valid grid boxes nearest to the target grid point are then selected for inclusion during OI. In this study, the analysis radius was set to 100 km, and the 9 nearest valid grid boxes to the target grid point were selected for OI [35]. In Equation (1), the weight coefficients (W i ) are determined by the variance in the minimum error on the precipitation value (A k ) at the target point: where T K represents the observed value at point k.
Based on the assumptions that the observation field and the initial estimation field are unbiased and that the observation error is not related to the error of the initial estimation field, the weight coefficients W i in Equation (1) can be obtained by solving the following linear equation [34]: where µ f ij represents the co-correlation of the initial estimation field error, µ 0 ij represents the co-correlation of the observation error, and λ i is the ratio between the standard deviation of the observation error (σ o i ) and that of the initial estimation error (µ o ij ) at point i. In OI, the calculation of the W i requires that µ f ij , µ o ij , σ o i , and σ f i are known values, which, in turn, requires the pre-estimation of the observation error and the satellite-retrieved precipitation error as well as the correlation between these errors. Here, the term "pre-estimation" means that these parameters need to be estimated in advance. In this study, this pre-estimation was performed based on a statistical analysis of the sample data within the study period [35]. The weight coefficients (W i ) were determined based on Equation (3), and then, the final precipitation values (A k ) were obtained based on Equation (1).

GWR Downscaling
Since the GWR model was first proposed by Brunsdon et al. [36] in 1996, it has been extensively applied in research on spatial heterogeneity [17][18][19][20]37,38]. The basic idea of GWR is that the relationship between variables varies with changes in spatial location; thus, a regression model can be established by estimating the parameters of the correlated variables and explanatory variables at each given location in the study area. Figure 3 shows the spatial distributions of the intercept, NDVI regression coefficient, and local R 2 obtained via GWR, and these values are in accordance with the definition. These parameters exhibit significant spatial variations. The intercept ranges from -78.7 to 178.4, the NDVI coefficient ranges from -123.3 to 203.4, and the local R 2 ranges from 0.01 to 0.92.
Remote Sens. 2020, 11, x 7 of 22 The weight coefficients (Wi) were determined based on Equation (3), and then, the final precipitation values (Ak) were obtained based on Equation (1).

GWR Downscaling
Since the GWR model was first proposed by Brunsdon et al. [36] in 1996, it has been extensively applied in research on spatial heterogeneity [17][18][19][20][37][38]. The basic idea of GWR is that the relationship between variables varies with changes in spatial location; thus, a regression model can be established by estimating the parameters of the correlated variables and explanatory variables at each given location in the study area. Figure 3 shows the spatial distributions of the intercept, NDVI regression coefficient, and local R 2 obtained via GWR, and these values are in accordance with the definition. These parameters exhibit significant spatial variations. The intercept ranges from -78.7 to 178.4, the NDVI coefficient ranges from -123.3 to 203.4, and the local R 2 ranges from 0.01 to 0.92. In this study, a GWR regression model was established based on the NDVI and the IMERG precipitation data as follows:. In this study, a GWR regression model was established based on the NDVI and the IMERG precipitation data as follows: where Y j is the IMERG precipitation at point j; X ij is the NDVI at point i in the vicinity of point j; β 0 u j , v j and β i u j , v j represent the intercept and slope, respectively, at point j; u j , v j represents the two-dimensional coordinates of point j; and ε j is the residual error. Unlike traditional global regression models, Equation (4) is based on the assumption that the shorter the distance between the observation point and point j is, the greater the influence on point j will be, with the coefficient acting as a damping function that depends on the distance from point j. This damping function can be obtained in accordance with Equation (5): whereβ u j , v j represents the coefficient of point j; X and Y are the independent and dependent variables, respectively; and W u j , v j is a weight matrix. This matrix ensures that the shorter the distance between points i and j is, the greater the weight, and the elements of the matrix can be obtained as follows: where d ij is the distance of point j from the nearby observation point i, and b is a fixed threshold defined in terms of a distance metric. In detail, the following procedures were applied for GWR-based downscaling ( Figure 2).
(1) To effectively establish the precipitation-NDVI model, anomalous NDVI areas corresponding to snow and water bodies were removed from the high-spatial-resolution NDVI data [17,39].
(2) After the removal of outliers, the 1-km NDVI data were aggregated to a resolution of 10 km by means of pixel averaging. Then, a GWR model of the 10-km IMERG data and the 10-km NDVI data was established with the NDVI as the independent variable and the IMERG precipitation data as the dependent variable. By introducing 1-km and 10-km grid point coordinates into the GWR model, the constants and corresponding coefficients for the 1-km and 10-km NDVI were obtained, as shown in Equation (4).
(3) The 10-km NDVI data were entered into the regression model to obtain NDVI-based precipitation predictions with a 10 km resolution (Predicted Precipitation 10 km in Figure 2).
(4) The residual errors between the values predicted by the 10-km resolution model and the original IMERG precipitation values were calculated (Residuals 10 km in Figure 2). The 10-km residuals were then transformed into 1-km residuals through spline interpolation (Residuals 1 km in Figure 2).
(5) The 1-km NDVI data after anomalous data removal were used to force the regression model, thus obtaining 1-km model-predicted precipitation values. Spline interpolation was then applied to fill in the values missing after outlier removal to obtain downscaled data (DS_CIMERG in Figure 2).
(6) The 1-km model-predicted precipitation values and 1-km residual data were summed to obtain post-residual-corrected 1-km downscaled precipitation data.

Validation
Station-measured data are the most direct observations of precipitation. In this study, 10-fold cross-validation was used to validate the precision of the OI outcomes and downscaling products. The observation stations were randomly divided into 10 groups. Nine groups (90%) were selected for the OI-based merging of the IMERG precipitation data with the observation data and GWR-based Remote Sens. 2020, 12, 398 9 of 22 downscaling. The remaining 10% constituted an independent dataset to validate the accuracy of the OI results and downscaling products. This was repeated 10 times to guarantee an independent validation for each station and to ensure the representativeness of the training samples and validation samples. Three statistical indicators, namely, the mean absolute error (MAE), root-mean-square error (RMSE), and correlation coefficient (CC), were utilized to validate the estimated values of the downscaled product against the observed values. The formulas for these indicators are as follows: y i , n is the sample size, and x i and y i are the estimated values and station observations of precipitation, respectively. In addition, a statistical analysis showed that July 2016 was the month with the most precipitation in the Tianshan Mountain area, while September 2017 was the month with the least precipitation. Therefore, in addition to the 10-fold cross-validation, the data from these two months were used to further validate the results of the proposed downscaling method in terms of their spatial distribution.

Consistency Between the NDVI and Satellite Precipitation Data
To determine the delay in the response of the NDVI to precipitation, Table 1 presents the correlation coefficients (CCs) between the NDVI and the observed precipitation at all stations in the study area during the warm seasons from 2014 to 2018. The results showed that the highest correlations were observed with no time lag. Figure 4 shows the distributions of the monthly average NDVI and IMERG precipitation in the study area during the warm season of 2016. According to Figure 4b, the Ili Valley was the area with the most abundant precipitation, while the eastern area received the least precipitation. This is because the Ili Valley is surrounded by mountains to the north, east, and south (Figure 1), which is beneficial for the collection of water vapor from the Atlantic Ocean to the west, resulting in abundant precipitation in the piedmont zone. According to Figure 4a, there was good consistency between the NDVI and IMERG precipitation data. The high resolution of the NDVI revealed detailed spatial variations and contributed to the improvement of the downscaled precipitation estimates.

Error Statistics of DS_Spline, DS_OIMERG, and DS_CIMERG
Based on the improvement in the quality of the satellite precipitation product after OI, the downscaling effect of the GWR method was further validated.

Error Statistics of DS_Spline, DS_OIMERG, and DS_CIMERG
Based on the improvement in the quality of the satellite precipitation product after OI, the downscaling effect of the GWR method was further validated. Figure 6 shows comparisons among the downscaled data obtained with three different downscaling methods for May to September of 2014-2018. DS_CIMERG had the highest CC values in 22 of the 24 months, with lower values than those of DS_Spline only in May and August 2014. On average, the mean CC of DS_CIMERG was 10% higher than that of DS_Spline. Between DS_Spline and DS_OIMERG, the CC values of DS_Spline were higher in 8 months and lower in 16 Figure 7 presents scatter plots of OIMERG, CIMERG, and the three types of downscaled precipitation data against all of the observation data from the study area. After the first and second processing steps (i.e., OI and GWR), the CC was improved by 10% and 2%, respectively. As shown  Figure 7 presents scatter plots of OIMERG, CIMERG, and the three types of downscaled precipitation data against all of the observation data from the study area. After the first and second processing steps (i.e., OI and GWR), the CC was improved by 10% and 2%, respectively. As shown in Figure 7e, DS_OIMERG showed slight improvements compared with OIMERG, while it was less accurate than CIMERG and DS_CIMERG, indicating that direct downscaling based on original satellite precipitation products may have a limited effect in improving the quality of downscaled precipitation estimates.

Overall Assessment of the Two-Step Merging and Downscaling Method
precipitation estimates. Figure 8 shows boxplots of the evaluation metrics for the five precipitation products. A boxplot divides a dataset into four segments based on the maximum, minimum, median, and two quartiles of the data. The middle horizontal line represents the median, which divides the statistical data into two equal parts. As shown in Figure 8, the distributions of the CC, MAE, and RMSE were all uniform and consistent. Among the five datasets, the best performance was observed for DS_CIMERG, whose CC values were more concentrated in the upper region, while the MAE and RMSE values were more concentrated in the lower region. In particular, the minimum and maximum MAE and RMSE were the lowest for DS_CIMERG, followed by CIMERG. The metrics for OIMERG and DS_Spline were basically the same, while those for DS_OIMERG were in agreement with the overall assessment results and slightly better than those for OIMERG (Figure 7). In Figure 8, the small rectangular boxes represent the distributions of the metric values around their averages. DS_CIMERG performed the best among the five products, with CC, MAE, and RMSE values of 0.70, 17.29 mm, and 22.45 mm, respectively. The overall evaluation results showed that the dataset generated using the OI-GWR method proposed in this study was notably superior to the datasets of the other two downscaling products as well as the initial IMERG precipitation data.   Figure 8 shows boxplots of the evaluation metrics for the five precipitation products. A boxplot divides a dataset into four segments based on the maximum, minimum, median, and two quartiles of the data. The middle horizontal line represents the median, which divides the statistical data into two equal parts. As shown in Figure 8, the distributions of the CC, MAE, and RMSE were all uniform and consistent. Among the five datasets, the best performance was observed for DS_CIMERG, whose CC values were more concentrated in the upper region, while the MAE and RMSE values were more concentrated in the lower region. In particular, the minimum and maximum MAE and RMSE were the lowest for DS_CIMERG, followed by CIMERG. The metrics for OIMERG and DS_Spline were basically the same, while those for DS_OIMERG were in agreement with the overall assessment results and slightly better than those for OIMERG (Figure 7). In Figure 8, the small rectangular boxes represent the distributions of the metric values around their averages. DS_CIMERG performed the best among the five products, with CC, MAE, and RMSE values of 0.70, 17.29 mm, and 22.45 mm, respectively. The overall evaluation results showed that the dataset generated using the OI-GWR method proposed in this study was notably superior to the datasets of the other two downscaling products as well as the initial IMERG precipitation data.

Spatial Distributions of Monthly Precipitation
Using  Figure 9; Figure 11. Furthermore, Figure 10; Figure 12 show the detailed distributions of the precipitation datasets in the Ili Valley after image enlargement.

Spatial Distributions of Monthly Precipitation
Using July 2016 (the month with the maximum precipitation) and September 2017 (the month with the minimum precipitation) as examples (Figures 9-12), the spatial distributions of the OBS, OIMERG, CIMERG, DS_OIMERG, DS_Spline, and DS_CIMERG monthly precipitation data are shown in Figure 9; Figure 11. Furthermore, Figure 10; Figure 12 show the detailed distributions of the precipitation datasets in the Ili Valley after image enlargement. Figure 9a shows the observed precipitation distribution in July 2016. The Ili Valley was the region with the most abundant precipitation. The other five gridded estimated precipitation datasets (OIMERG (Figure 9b), CIMERG (Figure 9c), DS_Spline (Figure 9d), DS_CIMERG (Figure 9e), and DS_OIMERG (Figure 9f)) all exhibited the same spatial distribution pattern as that of the OBS dataset. However, OIMERG underestimated the observed precipitation, particularly in the Ili Valley area. In contrast, CIMERG, obtained by combining OIMERG and ground observations using OI, showed a more reasonable precipitation distribution. Figure 10 shows a clearer view of the patterns in this area. Since DS_Spline and DS_OIMERG were generated by downscaling the original IMERG data, their performance was limited by the IMERG precipitation distribution, which underestimated the observed precipitation. By contrast, after OI-GWR processing, DS_CIMERG not only showed noticeably improved accuracy relative to the initial IMERG precipitation data but also exhibited a precipitation distribution that was more detailed and consistent with the observed precipitation, thus demonstrating the improved capabilities of the proposed downscaling method. Figure 11 presents the spatial distributions of the precipitation data for September 2017 (the month with the lowest precipitation). It can be seen that, during this time period, the precipitation was weak throughout the Tianshan Mountain area (Figure 11a). Consistent with the results shown in Figure 9, the DS_CIMERG dataset showed a distribution more similar to the observed precipitation distribution (Figure 11e (Figure 9f)) all exhibited the same spatial distribution pattern as that of the OBS dataset. However, OIMERG underestimated the observed precipitation, particularly in the Ili Valley area. In contrast, CIMERG, obtained by combining OIMERG and ground observations using OI, showed a more reasonable precipitation distribution. Figure 10 shows a clearer view of the patterns in this area. Since DS_Spline and DS_OIMERG were generated by downscaling the original IMERG data, their performance was limited by the IMERG precipitation distribution, which underestimated the observed precipitation. By contrast, after OI-GWR processing, DS_CIMERG not only showed noticeably improved accuracy relative to the initial IMERG precipitation data but also exhibited a precipitation distribution that was more detailed and consistent with the observed precipitation, thus demonstrating the improved capabilities of the proposed downscaling method. Figure 11 presents the spatial distributions of the precipitation data for September 2017 (the month with the lowest precipitation). It can be seen that, during this time period, the precipitation was weak throughout the Tianshan Mountain area (Figure 11a). Consistent with the results shown in Figure 9, the DS_CIMERG dataset showed a distribution more similar to the observed precipitation distribution (Figure 11e), while the distributions of DS_Spline ( Figure 11d) and DS_OIMERG ( Figure  11f) were consistent with that of OIMERG.

Residual Correction
A series of improvements in downscaling methods for satellite precipitation products have been achieved, from the original ER model proposed by Immerzeel et al. [13] and the MLR model proposed by Jia et al. [14] to the GWR approach proposed by Xu et al. [17] and Chen et al. [20]. Residual correction is a key step in both ER and MLR [17]. In this study, residual correction for GWR was analyzed in detail.
Using DS_CIMERG as an example, the residual correction processes for GWR downscaling in July 2016 and September 2017 were analyzed, and the results are shown in Figure 13; Figure 14, respectively. Figure 13a shows the CIMERG data before downscaling. Figure 13b shows the estimated precipitation based on the low-resolution (10 km) regression coefficients (i.e., Predicted Precipitation (10 km) in Figure 2). Figure 13c shows the 10-km residuals obtained by subtracting CIMERG from the 10-km precipitation predictions. Figure 13d shows the 1-km residuals obtained after applying the spline interpolation technique to the 10-km residuals. Figure 13e shows the DS_CIMERG precipitation estimates, which are based on the high-resolution (1 km) regression coefficients. Finally, Figure 13f presents the residual-corrected downscaling results obtained by summing the 1-km residuals and the 1-km precipitation estimates. As shown in this figure, the precipitation estimates ( Figure 13b) obtained with the low-resolution parameters already exhibited high consistency with CIMERG (Figure 13a), and their residual values were generally small, ranging between -10 and 10 mm (Figure 13c), indicating that the low-resolution estimated precipitation data obtained via the GWR method were close to the initial precipitation data. In contrast, the high-resolution precipitation estimates obtained based on the 1-km regression coefficients ( Figure  13e) not only were highly consistent with the initial precipitation distribution but also reflected the detailed structure of the precipitation distribution. The results shown in Figure 14 are similar to those in Figure 13, except that they correspond to a different month. Table 2 summarizes the detailed error statistics. The CC, RMSE, and MAE values after residual correction were worse than those before residual correction. Therefore, the addition of residual correction to the GWR model transferred the errors of the original IMERG precipitation data to the final downscaling outcomes,

Residual Correction
A series of improvements in downscaling methods for satellite precipitation products have been achieved, from the original ER model proposed by Immerzeel et al. [13] and the MLR model proposed by Jia et al. [14] to the GWR approach proposed by Xu et al. [17] and Chen et al. [20]. Residual correction is a key step in both ER and MLR [17]. In this study, residual correction for GWR was analyzed in detail.
Using DS_CIMERG as an example, the residual correction processes for GWR downscaling in July 2016 and September 2017 were analyzed, and the results are shown in Figure 13; Figure 14, respectively. Figure 13a shows the CIMERG data before downscaling. Figure 13b shows the estimated precipitation based on the low-resolution (10 km) regression coefficients (i.e., Predicted Precipitation (10 km) in Figure 2). Figure 13c shows the 10-km residuals obtained by subtracting CIMERG from the 10-km precipitation predictions. Figure 13d shows the 1-km residuals obtained after applying the spline interpolation technique to the 10-km residuals. Figure 13e shows the DS_CIMERG precipitation estimates, which are based on the high-resolution (1 km) regression coefficients. Finally, Figure 13f presents the residual-corrected downscaling results obtained by summing the 1-km residuals and the 1-km precipitation estimates. As shown in this figure, the precipitation estimates ( Figure 13b) obtained with the low-resolution parameters already exhibited high consistency with CIMERG (Figure 13a), and their residual values were generally small, ranging between -10 and 10 mm (Figure 13c), indicating that the low-resolution estimated precipitation data obtained via the GWR method were close to the initial precipitation data. In contrast, the high-resolution precipitation estimates obtained based on the 1-km regression coefficients (Figure 13e) not only were highly consistent with the initial precipitation distribution but also reflected the detailed structure of the precipitation distribution. The results shown in Figure 14 are similar to those in Figure 13, except that they correspond to a different month. Table 2 summarizes the detailed error statistics. The CC, RMSE, and MAE values after residual correction were worse than those before residual correction. Therefore, the addition of residual correction to the GWR model transferred the errors of the original IMERG precipitation data to the final downscaling outcomes, thereby decreasing the reliability of the outcomes, as the residuals were obtained by subtracting the 10-km precipitation predictions from the original IMERG precipitation data.
Remote Sens. 2020, 11, x 18 of 22 thereby decreasing the reliability of the outcomes, as the residuals were obtained by subtracting the 10-km precipitation predictions from the original IMERG precipitation data.   thereby decreasing the reliability of the outcomes, as the residuals were obtained by subtracting the 10-km precipitation predictions from the original IMERG precipitation data.   To further evaluate the results through additional validation tests, the data before and after residual correction from 2010 to 2018 were also processed and evaluated. The results were consistent with those presented in Table 2. The CC, RMSE, and MAE values respectively increased from 0.576, 26.92 mm, and 18.12 mm before residual correction to 0.59, 26.55 mm, and 17.63 mm after residual correction. Therefore, the residual correction step is not necessary in GWR downscaling; this conclusion is consistent with the findings of Xu et al. [17].

Limitations of GWR Downscaling
Previous studies have shown that, in the case that an established model can adequately estimate precipitation, the quality of the initial satellite precipitation data becomes a critical factor in determining the quality of the downscaling results [14,17]. In this study, the proposed two-step OI-GWR method combining data merging and downscaling successfully solved this problem. Nevertheless, some limitations of the GWR method were also found during this research. First, during the process of variable selection for GWR, when the variables with the best global correlation with precipitation were applied for GWR downscaling, the obtained results were not optimal. In addition, the inclusion of more explanatory variables (e.g., DEM, SLP, ASP, CVT, HSHD, and/or TWI) did not yield better results than those achieved using the single NDVI variable. Second, unlike the stepwise regression method, GWR cannot automatically identify and eliminate variables that are not significantly related to the dependent variables, which can be considered a shortcoming of the GWR method itself. Even after stepwise regression was performed for variable screening and the selected variables were input into the GWR model, the desired results were not obtained. Presumably, this can be explained by the fact that stepwise regression is a global regression method, which therefore selects variables from the perspective of global correlation, whereas the GWR method relies on point-by-point regression and therefore focuses on local modeling. For this reason, variables selected on the basis of stepwise regression or correlation analysis may not be suitable for GWR. Accordingly, further investigation will be necessary to improve the process of variable selection for GWR.

Impact of the Gauge Density on OI-GWR
To further validate the performance of OI-GWR and its applicability in sparsely gauged regions, we randomly selected 30% of the rain gauges for model training and used the remaining 70% of the rain gauges for model validation. For this analysis, the study period was extended to 2010-2018. Figure 15 shows that even when using only 30% of the stations for training, a positive correction effect could be obtained. However, since fewer rain gauges were used for training, the improvement was less significant than that achieved using more rain gauges. An overall evaluation showed that after the two steps of correction (i.e., OI and GWR), the CC increased from the initial value of 0.533 to values of 0.567 and 0.59, respectively; the RMSE decreased from the initial value of 27.08 mm to values of 26.84 and 26.55 mm, respectively; and the MAE decreased from the initial value of 18.3 mm to values of 17.76 and 17.63 mm, respectively. Therefore, the OI-GWR method proposed in this study can improve the deficiencies of the previous downscaling method. Since OI-GWR works well even when the size of the training dataset is reduced to 30%, the proposed method has great potential for application in sparsely gauged regions. OI-GWR works well even when the size of the training dataset is reduced to 30%, the proposed method has great potential for application in sparsely gauged regions.

Conclusions
In this study, a two-step merging and downscaling method (OI-GWR) was proposed and used to downscale original IMERG precipitation data for the Tianshan Mountains from a resolution of ~10 to 1 km. First, the original IMERG precipitation data were merged with observed precipitation data using the OI method. Then, downscaling was performed using the GWR method, with the corrected CIMERG precipitation data as the initial values and the NDVI as an auxiliary variable. The performance of OI-GWR was assessed based on rain gauge observations, and the results were compared with those of other downscaling methods. The main conclusions include the following: (1) The OI-based merging of the OIMERG data and the observed precipitation data greatly improved the precision of the original satellite precipitation product. After OI, the CC, RMSE, and MAE values for the OIMERG data were increased from 0.516, 28.83 mm, and 19.73 mm to 0.616, 26.56 mm, and 17.59 mm, respectively.
(2) An assessment of various downscaled datasets showed that the precision of DS_CIMERG (based on OI-GWR) was much higher than that of DS_OIMERG (based on GWR), indicating that the downscaling results for satellite precipitation data depend to a large extent on the quality of the initial precipitation product.
(3) Residual correction is a key step in global regression downscaling methods, such as ER and MLR. However, in the GWR method, residual correction tends to transfer the errors of the original satellite precipitation data to the final downscaling results. The statistical evaluation metrics of the GWR results obtained after residual correction were worse than those before residual correction, indicating that residual correction is unnecessary in GWR-based downscaling.
By improving the precision of the original IMERG satellite precipitation products and then downscaling the products thus obtained, the two-step OI-GWR method can serve as a more effective approach for the generation of regional precipitation datasets. Further efforts will be needed to extend the application of OI-GWR to the daily or hourly scale.

Conclusions
In this study, a two-step merging and downscaling method (OI-GWR) was proposed and used to downscale original IMERG precipitation data for the Tianshan Mountains from a resolution of~10 to 1 km. First, the original IMERG precipitation data were merged with observed precipitation data using the OI method (https://github.com/tgq14/GWR-OI). Then, downscaling was performed using the GWR method, with the corrected CIMERG precipitation data as the initial values and the NDVI as an auxiliary variable. The performance of OI-GWR was assessed based on rain gauge observations, and the results were compared with those of other downscaling methods. The main conclusions include the following: (1) The OI-based merging of the OIMERG data and the observed precipitation data greatly improved the precision of the original satellite precipitation product. After OI, the CC, RMSE, and MAE values for the OIMERG data were increased from 0.516, 28.83 mm, and 19.73 mm to 0.616, 26.56 mm, and 17.59 mm, respectively.
(2) An assessment of various downscaled datasets showed that the precision of DS_CIMERG (based on OI-GWR) was much higher than that of DS_OIMERG (based on GWR), indicating that the downscaling results for satellite precipitation data depend to a large extent on the quality of the initial precipitation product.
(3) Residual correction is a key step in global regression downscaling methods, such as ER and MLR. However, in the GWR method, residual correction tends to transfer the errors of the original satellite precipitation data to the final downscaling results. The statistical evaluation metrics of the GWR results obtained after residual correction were worse than those before residual correction, indicating that residual correction is unnecessary in GWR-based downscaling.
By improving the precision of the original IMERG satellite precipitation products and then downscaling the products thus obtained, the two-step OI-GWR method can serve as a more effective approach for the generation of regional precipitation datasets. Further efforts will be needed to extend the application of OI-GWR to the daily or hourly scale.