2.2.1. Downscaling by GWR
The geographically weighted regression (GWR), which can construct the relationship between the dependent variable and explanatory variables, is a local regression model [15,36]. The GWR method can be written as:

$$y_i = \beta_{i0} + \sum_{k=1}^{m} \beta_{ik} x_{ik} + \varepsilon_i \tag{1}$$

where $y_i$ and $x_{ik}$ are the dependent variable and the k-th explanatory variable at location $i$, respectively; there are m explanatory variables; $\beta_{i0}$, $\beta_{ik}$, and $\varepsilon_i$ are the intercept, regression coefficient, and random error at location $i$, respectively.
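The intercept $\beta_{i0}$ and regression coefficient $\beta_{ik}$ are estimated by minimizing a weighted residual sum of squares, as follows:

$$\hat{\boldsymbol{\beta}}_i = \arg\min_{\boldsymbol{\beta}_i} \sum_{j=1}^{n} w_{ij} \left( y_j - \beta_{i0} - \sum_{k=1}^{m} \beta_{ik} x_{jk} \right)^2 \tag{2}$$

where $n$ denotes the total number of samples at location $i$, $\boldsymbol{\beta}_i$ is the regression coefficient vector of GWR at location $i$, and $w_{ij}$ is the geographic weight of the j-th sample at location $i$.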
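Equation (2) is solved by the weighted least squares method, and the regression coefficient vector $\hat{\boldsymbol{\beta}}_i$ at location $i$ is estimated in the following matrix form:

$$\hat{\boldsymbol{\beta}}_i = \left( \mathbf{X}^{T} \mathbf{W}_i \mathbf{X} \right)^{-1} \mathbf{X}^{T} \mathbf{W}_i \mathbf{Y} \tag{3}$$

where $\mathbf{W}_i$ is a diagonal matrix denoting the spatial weight of each sample at location $i$, $\mathbf{X}$ denotes the matrix of explanatory variables with a column of 1s for the intercept, and $\mathbf{Y}$ denotes the dependent variable vector.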
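The weighted least squares estimate described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: the Gaussian distance-decay kernel, the fixed `bandwidth`, and the function names `gaussian_weights` and `gwr_fit_at` are assumptions introduced only for illustration, since the text does not specify the kernel or the bandwidth selection.

```python
import numpy as np

def gaussian_weights(coords, target, bandwidth):
    """Assumed Gaussian kernel: weight of every sample relative to one target location."""
    d = np.linalg.norm(coords - target, axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def gwr_fit_at(coords, X, y, target, bandwidth):
    """Local weighted least squares fit; returns [intercept, coef_1, ..., coef_m]."""
    Xd = np.column_stack([np.ones(len(y)), X])       # column of 1s for the intercept
    w = gaussian_weights(coords, target, bandwidth)  # diagonal entries of the weight matrix
    XtW = Xd.T * w                                   # X^T W without building the diagonal matrix
    # Solve (X^T W X) beta = X^T W y instead of forming an explicit inverse
    return np.linalg.solve(XtW @ Xd, XtW @ y)

# Synthetic, noise-free data with a spatially constant relationship,
# so the local fit should recover the generating coefficients.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
X = rng.normal(size=(200, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1]
beta = gwr_fit_at(coords, X, y, target=np.array([5.0, 5.0]), bandwidth=2.0)
```

Repeating the fit at every grid cell yields one coefficient map per explanatory variable, which is what makes the model local. Solving the linear system directly is generally more stable numerically than inverting the weighted normal matrix.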
To generate high-resolution spatial precipitation estimates, the original SPPs (GPM, TRMM, CMORPH, and PERSIANN-CDR) are downscaled to 0.05° based on the relationship between precipitation and explanatory variables (NDVI, elevation, slope, longitude, latitude) constructed by the GWR model. The relationship fitted at the original low resolution is then used to predict precipitation from the explanatory variables at a high resolution. Because the relationship at a daily scale is far less statistically significant than that at monthly scales [37], this study constructs the relationship between precipitation and explanatory variables at a monthly scale by the GWR model, and then disaggregates the downscaled monthly result into daily precipitation to generate the downscaled daily SPP. The specific steps are as follows:
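Step (1): Resample the explanatory variables (NDVI, elevation, slope, longitude, latitude) from 0.05° resolution to 0.1° and 0.25° resolutions using bilinear interpolation. The monthly NDVI at 0.05° resolution is marked as $NDVI_{0.05^\circ}$, and the resampled NDVIs are marked as $NDVI_{0.1^\circ}$ and $NDVI_{0.25^\circ}$, respectively. The 0.05° resolution elevation, slope, longitude, and latitude data are marked as $Ele_{0.05^\circ}$, $Slp_{0.05^\circ}$, $Lon_{0.05^\circ}$, and $Lat_{0.05^\circ}$, respectively, and the resampled 0.1° and 0.25° resolution elevation, slope, longitude, and latitude data are marked as $Ele_{0.1^\circ}$, $Slp_{0.1^\circ}$, $Lon_{0.1^\circ}$, $Lat_{0.1^\circ}$, $Ele_{0.25^\circ}$, $Slp_{0.25^\circ}$, $Lon_{0.25^\circ}$, and $Lat_{0.25^\circ}$, respectively.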
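Step (2): Construct the relationship between monthly precipitation (accumulated from the original daily satellite precipitation) and the explanatory variables (the resampled NDVI, elevation, slope, longitude, and latitude) by the GWR. The original 0.25° resolution satellite daily precipitation (TRMM, CMORPH, and PERSIANN-CDR) is marked as $P^{d}_{TRMM}$, $P^{d}_{CMORPH}$, and $P^{d}_{PERSIANN}$. The original 0.1° resolution satellite daily precipitation (GPM) is marked as $P^{d}_{GPM}$. The accumulated 0.25° resolution satellite monthly precipitation (TRMM, CMORPH, and PERSIANN-CDR) is marked as $P^{m}_{TRMM}$, $P^{m}_{CMORPH}$, and $P^{m}_{PERSIANN}$. The accumulated 0.1° resolution satellite monthly precipitation (GPM) is marked as $P^{m}_{GPM}$. The constructed relationship between the satellite precipitation data $P^{m}_{0.25^\circ}$ (representing $P^{m}_{TRMM}$, $P^{m}_{CMORPH}$, and $P^{m}_{PERSIANN}$) and the explanatory variables ($NDVI_{0.25^\circ}$, $Ele_{0.25^\circ}$, $Slp_{0.25^\circ}$, $Lon_{0.25^\circ}$, and $Lat_{0.25^\circ}$) is shown in Equation (4). The constructed relationship between the satellite precipitation $P^{m}_{0.1^\circ}$ (representing $P^{m}_{GPM}$) and the explanatory variables ($NDVI_{0.1^\circ}$, $Ele_{0.1^\circ}$, $Slp_{0.1^\circ}$, $Lon_{0.1^\circ}$, $Lat_{0.1^\circ}$) is shown in Equation (5):

$$P^{m}_{0.25^\circ} = \beta_{0}^{0.25^\circ} + \beta_{1}^{0.25^\circ} NDVI_{0.25^\circ} + \beta_{2}^{0.25^\circ} Ele_{0.25^\circ} + \beta_{3}^{0.25^\circ} Slp_{0.25^\circ} + \beta_{4}^{0.25^\circ} Lon_{0.25^\circ} + \beta_{5}^{0.25^\circ} Lat_{0.25^\circ} + \varepsilon^{0.25^\circ} \tag{4}$$

$$P^{m}_{0.1^\circ} = \beta_{0}^{0.1^\circ} + \beta_{1}^{0.1^\circ} NDVI_{0.1^\circ} + \beta_{2}^{0.1^\circ} Ele_{0.1^\circ} + \beta_{3}^{0.1^\circ} Slp_{0.1^\circ} + \beta_{4}^{0.1^\circ} Lon_{0.1^\circ} + \beta_{5}^{0.1^\circ} Lat_{0.1^\circ} + \varepsilon^{0.1^\circ} \tag{5}$$

where $\beta_{0}^{0.25^\circ}$ and $\beta_{0}^{0.1^\circ}$ are the intercepts; $\beta_{1}^{0.25^\circ}$, $\beta_{2}^{0.25^\circ}$, $\beta_{3}^{0.25^\circ}$, $\beta_{4}^{0.25^\circ}$, $\beta_{5}^{0.25^\circ}$, $\beta_{1}^{0.1^\circ}$, $\beta_{2}^{0.1^\circ}$, $\beta_{3}^{0.1^\circ}$, $\beta_{4}^{0.1^\circ}$, and $\beta_{5}^{0.1^\circ}$ are the regression coefficients; and $\varepsilon^{0.25^\circ}$ and $\varepsilon^{0.1^\circ}$ are residuals of the two GWR models.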
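Step (3): Resample the regression coefficients ($\beta_{0}^{0.25^\circ}$, $\beta_{1}^{0.25^\circ}$, $\beta_{2}^{0.25^\circ}$, $\beta_{3}^{0.25^\circ}$, $\beta_{4}^{0.25^\circ}$, $\beta_{5}^{0.25^\circ}$, $\beta_{0}^{0.1^\circ}$, $\beta_{1}^{0.1^\circ}$, $\beta_{2}^{0.1^\circ}$, $\beta_{3}^{0.1^\circ}$, $\beta_{4}^{0.1^\circ}$, and $\beta_{5}^{0.1^\circ}$) to obtain the 0.05° resolution regression coefficients ($\beta_{0}^{0.05^\circ}$, $\beta_{1}^{0.05^\circ}$, $\beta_{2}^{0.05^\circ}$, $\beta_{3}^{0.05^\circ}$, $\beta_{4}^{0.05^\circ}$, and $\beta_{5}^{0.05^\circ}$) by the bilinear interpolation method, and resample the residuals ($\varepsilon^{0.25^\circ}$, $\varepsilon^{0.1^\circ}$) to obtain the 0.05° resolution residuals ($\varepsilon^{0.05^\circ}$) by the ordinary kriging interpolation method.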
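Step (4): Estimate the 0.05° resolution monthly precipitation ($P^{m}_{0.05^\circ}$) from the resampled 0.05° resolution regression coefficients ($\beta_{0}^{0.05^\circ}$, $\beta_{1}^{0.05^\circ}$, $\beta_{2}^{0.05^\circ}$, $\beta_{3}^{0.05^\circ}$, $\beta_{4}^{0.05^\circ}$, and $\beta_{5}^{0.05^\circ}$), the 0.05° resolution explanatory variables, and the resampled 0.05° resolution residuals ($\varepsilon^{0.05^\circ}$), as shown in Equation (6):

$$P^{m}_{0.05^\circ} = \beta_{0}^{0.05^\circ} + \beta_{1}^{0.05^\circ} NDVI_{0.05^\circ} + \beta_{2}^{0.05^\circ} Ele_{0.05^\circ} + \beta_{3}^{0.05^\circ} Slp_{0.05^\circ} + \beta_{4}^{0.05^\circ} Lon_{0.05^\circ} + \beta_{5}^{0.05^\circ} Lat_{0.05^\circ} + \varepsilon^{0.05^\circ} \tag{6}$$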
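Step (5): Disaggregate the downscaled satellite monthly precipitation into daily precipitation according to a proportional fraction. The fractions of the 0.1° and 0.25° resolution daily precipitation to the 0.1° and 0.25° resolution monthly precipitation are denoted as $F_{0.1^\circ}$ and $F_{0.25^\circ}$, respectively. $F_{0.1^\circ}$ and $F_{0.25^\circ}$ are calculated by Equation (7):

$$F_{r} = \frac{P^{d}_{r}}{P^{m}_{r}}, \quad r \in \{0.1^\circ, 0.25^\circ\} \tag{7}$$

Next, the 0.1° and 0.25° resolution fractions ($F_{0.1^\circ}$ and $F_{0.25^\circ}$) are resampled to obtain the 0.05° resolution fraction ($F_{0.05^\circ}$) by a bilinear interpolation method. Then, Equation (8) is used to obtain the 0.05° resolution daily precipitation:

$$P^{d}_{0.05^\circ} = F_{0.05^\circ} \times P^{m}_{0.05^\circ} \tag{8}$$

where $P^{d}_{0.05^\circ}$ represents the 0.05° resolution daily precipitation (the downscaled TRMM, CMORPH, PERSIANN-CDR, and GPM).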
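The proportional disaggregation in Step (5) can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the function names `daily_fractions` and `disaggregate` are hypothetical, fractions in cells with zero monthly precipitation are set to 0 (the text does not say how dry months are handled), the fine-grid monthly field is a random placeholder, and nearest-neighbour replication with `np.repeat` stands in for the bilinear interpolation named in the text so that the example needs only NumPy.

```python
import numpy as np

def daily_fractions(p_daily):
    """Fraction of the monthly total that falls on each day.

    p_daily has shape (n_days, ny, nx). Cells with a zero monthly total
    get a fraction of 0 on every day (an assumption of this sketch).
    """
    p_month = p_daily.sum(axis=0)
    frac = np.zeros_like(p_daily, dtype=float)
    wet = p_month > 0
    frac[:, wet] = p_daily[:, wet] / p_month[wet]
    return frac

def disaggregate(p_month_fine, frac_fine):
    """Daily precipitation on the fine grid: fraction times monthly total."""
    return frac_fine * p_month_fine[None, :, :]

rng = np.random.default_rng(1)
p_daily_coarse = rng.gamma(shape=0.5, scale=4.0, size=(30, 2, 2))  # 30 days, 2 x 2 coarse cells
p_daily_coarse[:, 0, 0] = 0.0                                      # one completely dry cell
frac_coarse = daily_fractions(p_daily_coarse)

# Stand-in for bilinear resampling: each coarse cell becomes a 5 x 5 block (0.25 deg to 0.05 deg)
frac_fine = frac_coarse.repeat(5, axis=1).repeat(5, axis=2)

p_month_fine = rng.uniform(20.0, 80.0, size=(10, 10))  # placeholder for the downscaled monthly field
p_daily_fine = disaggregate(p_month_fine, frac_fine)
```

Because the daily fractions of a wet cell sum to 1 over the month, the disaggregated daily values add back up to the downscaled monthly total there. Any linear interpolation of the fractions preserves this property wherever all contributing coarse cells are wet.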
2.2.3. Fusion by LSTM Network
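A long short-term memory (LSTM) network, which is composed of an input layer, one or more memory cells, and an output layer, is well suited to the study of time series data [41]. The main structure of an LSTM network contains a so-called memory cell in the hidden layer. The memory cell controls the flow of information through three gates (i.e., input gate $i_t$, forget gate $f_t$, and output gate $o_t$). Each gate controls which information participates in the update of the memory state and selectively retains or discards information. The key equations of the LSTM network are as follows:

$$\begin{aligned}
i_t &= \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}$$

where "$\odot$" represents element-wise multiplication, $x_t$ represents the input vector at time t, each $W$ represents an adjustable weight matrix of the network, $b$ represents the adjustable bias vector, $h_t$ represents the internal hidden state, $c_t$ represents the cell state of the memory cell, $\tilde{c}_t$ is the candidate cell state, and $\sigma$ represents the (sigmoid) activation function of the gates.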
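To make the gate mechanics concrete, the following NumPy sketch runs one standard LSTM cell over a short sequence. It assumes the common formulation (sigmoid gates, tanh candidate and output squashing) and stacks the four gate blocks in the order input, forget, output, candidate; the function name `lstm_step`, the stacking order, and the random weights and inputs are illustrative choices, not details taken from this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    with the four row blocks ordered as [input, forget, output, candidate].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much cell state to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell state
    c_t = f * c_prev + i * g       # element-wise update of the memory
    h_t = o * np.tanh(c_t)         # hidden state passed to the next step or layer
    return h_t, c_t

D, H = 4, 8                        # e.g. four satellite products as inputs, 8 hidden units
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in rng.uniform(0.0, 20.0, size=(10, D)):   # 10 time steps of synthetic inputs
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The cell state `c` is only ever scaled by the forget gate and incremented by the gated candidate, which is what allows information from earlier time steps to persist over long sequences.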
As shown in Figure 4, this study uses the LSTM network to improve the accuracy of spatial precipitation estimates by exploiting the spatiotemporal correlation patterns between the multisatellite precipitation products and rain gauge observations. The LSTM network extracts these patterns through a series of memory cells and merges them to generate high-quality daily precipitation estimates. The LSTM-based fusion realizes long-term memory by keeping the precipitation information from the multiple products at each time step in the memory cells. For a given time step, the precipitation information from past time steps is retained in the memory cells and provides a reference for the merged precipitation at the current time step. In this fusion model, the LSTM network includes multiple memory cells in the hidden layers. The output neurons (the extracted spatiotemporal patterns of multisatellite precipitation) at the last time step of the last hidden layer are merged into a single output neuron (the merged precipitation) through a fully connected network.
In this paper, we trained the LSTM network with four SPPs (TRMM, CMORPH, PERSIANN-CDR, and GPM) as input and gauge observations as output. The hyperparameters of this fusion model (i.e., number of layers, number of neurons, learning rate, and number of epochs) were chosen based on the data size and multiple experiments. The number of epochs and the learning rate of the LSTM network were 200 and 0.01, respectively. The number of hidden layers was 3, and the numbers of neurons in the three hidden layers were 128, 128, and 64, respectively.
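As a rough sense of model size, the short calculation below counts the trainable parameters implied by the stated architecture. It rests on assumptions not confirmed by the text: exactly four input features per time step (one per SPP), a standard LSTM layer with one bias vector per gate, and a single fully connected output neuron. Frameworks that keep two bias vectors per gate would report slightly larger numbers, and the helper name `lstm_layer_params` is hypothetical.

```python
def lstm_layer_params(n_in, n_hidden):
    """Parameters of one standard LSTM layer: four gate blocks, each with
    input weights, recurrent weights, and one bias vector."""
    return 4 * (n_in * n_hidden + n_hidden * n_hidden + n_hidden)

n_features = 4                   # assumed: TRMM, CMORPH, PERSIANN-CDR, GPM
hidden_sizes = [128, 128, 64]    # neurons per hidden layer, as stated in the text

sizes = [n_features] + hidden_sizes
per_layer = [lstm_layer_params(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
dense = hidden_sizes[-1] * 1 + 1  # fully connected layer to one output neuron
total = sum(per_layer) + dense

print(per_layer, dense, total)    # [68096, 131584, 49408] 65 249153
```

Under these assumptions the network has roughly 0.25 million parameters, most of them in the second hidden layer.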