2.1. The Gauge-Elevation-Consistency (GEC) Rule for Assessment
The consistency rule is a general principle in many fields of research and public management; the stationarity assumption in hydrology is one example. In this study, the consistency rule states that rainfall within a close region should have similar characteristics. The definition has two parts: (1) in grid cells that contain ground gauges, the satellite precipitation should equal the gauged precipitation; (2) in grid cells close to the ground gauges, the satellite precipitation should be similar to the gauged precipitation.
The consistency rule is expressed mathematically as follows.
where P is the rainfall and L is the location of the gauges or satellite grid cells. The subscript s stands for gauge station, while g and u denote the satellite grid cells with gauges (gauged grid cells) and without gauges (ungauged grid cells), respectively. D is a close region (defined in detail below).
Equation (1) is the first part of the definition and has been widely used to compare satellite precipitation with ground precipitation. Equation (2) is the second part; it simply extends Equation (1) from grid cells with gauges to cells without gauges. Equation (3) bridges the ground gauges, the gauged satellite cells, and the ungauged satellite cells. The idea behind Equation (3) is also widely used, for example in Thiessen polygon interpolation of areal rainfall.
The closer region (i.e., the closer grid cells) in this study comprises the areas that have the same elevation, the same slope aspect, and a similar location as the ground gauges. The same-elevation condition reflects the relationship between rainfall and altitude; the same-aspect condition reflects the relationship between rainfall and the direction of the vapor source; the location condition limits the ungauged grid cells to a tolerable distance from the gauged grid cells. Within this tolerance range, the ground precipitation of every ungauged grid cell is taken to be that of the closest gauge. With this definition, the error assessment of satellite precipitation can be extended from grid cells with gauges to grid cells without gauges.
For grid cells with gauge measurements, the corresponding grid values of the satellite rainfall are compared with the ground-measured precipitation according to Equation (1). The assessment is expressed quantitatively by two statistics, the Mean Bias and the Root Mean Square Deviation, as follows.

where i is the rank number of the gauged grid cell, M is the total number of ground gauges, $P_S$ is the satellite rainfall, and $P_G$ is the ground gauge measurement.
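The two statistics can be sketched in a few lines of Python. This is a minimal illustration that assumes the conventional definitions, with the bias taken as satellite minus gauge: MB is the mean of ($P_S$ − $P_G$) over the M gauged cells, and RMSD is the square root of the mean of ($P_S$ − $P_G$)². The function names are hypothetical and not part of the study.

```python
import math
from typing import Sequence


def mean_bias(p_sat: Sequence[float], p_gauge: Sequence[float]) -> float:
    """Mean Bias over the M gauged cells (assumed sign: satellite minus gauge)."""
    if len(p_sat) != len(p_gauge) or not p_sat:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(s - g for s, g in zip(p_sat, p_gauge)) / len(p_sat)


def rmsd(p_sat: Sequence[float], p_gauge: Sequence[float]) -> float:
    """Root Mean Square Deviation over the M gauged cells."""
    if len(p_sat) != len(p_gauge) or not p_sat:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((s - g) ** 2 for s, g in zip(p_sat, p_gauge))
    return math.sqrt(squared / len(p_sat))
```

Under this sign convention, a positive MB indicates that the satellite product overestimates rainfall on average, while RMSD measures the typical size of the error regardless of sign.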
For grid cells without gauge measurements, the corresponding grid values of the satellite rainfall are compared with the ground-measured rainfall according to Equations (2) and (3). The assessment is expressed quantitatively by a newly proposed statistical criterion, the Consistency Rate (CR), as follows.

where j is the rank number of the ungauged grid cell; D is the Rainfall-Elevation Mask (REM) used to qualify the precipitation of the ungauged cells, i.e., the exact closer region in Equation (3); n is the number of ungauged satellite grid cells whose rainfall falls within the REM and is therefore comparable to that of the gauges; and N is the total number of ungauged grid cells.
The REM can be derived from the rainfall measurements and elevations of the gauges. Assume there are M (M ≥ 2) ground rainfall gauges on the same slope of a mountain. Sort the M gauges in ascending (or descending) order of elevation. Take every l (l = 2, …, M) sequential gauges as a group $G_k$ (k = 1, …, M − l + 1). Each $G_k$ has a lowest and a highest elevation ($E_k^{\min}$, $E_k^{\max}$), as well as a minimum and a maximum gauge rainfall ($P_k^{\min}$, $P_k^{\max}$). The rectangle bounded by ($E_k^{\min}$, $E_k^{\max}$, $P_k^{\min}$, $P_k^{\max}$) forms a closer region, $D_k$. The union of all the closer regions constitutes the whole rainfall-elevation mask D. Physically, the mask denotes the possible or reasonable rainfall range for each elevation.
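The grouping procedure can be illustrated with a short Python sketch. It assumes each gauge is given as an (elevation, rainfall) pair and that the sub-mask boundaries are inclusive; the names `SubMask`, `build_rem`, and `in_rem` are hypothetical and chosen only for this example.

```python
from typing import List, NamedTuple, Sequence, Tuple


class SubMask(NamedTuple):
    """One closer region D_k: a rectangle in elevation-rainfall space."""
    e_min: float
    e_max: float
    p_min: float
    p_max: float


def build_rem(gauges: Sequence[Tuple[float, float]], l: int) -> List[SubMask]:
    """Build the mask from (elevation, rainfall) pairs of gauges on one slope.

    Every l sequential gauges (sorted by elevation) form one group G_k,
    giving M - l + 1 sub-masks in total.
    """
    m = len(gauges)
    if m < 2 or not 2 <= l <= m:
        raise ValueError("need M >= 2 gauges and 2 <= l <= M")
    ordered = sorted(gauges, key=lambda g: g[0])  # ascending elevation
    rem = []
    for k in range(m - l + 1):
        group = ordered[k:k + l]
        elevations = [e for e, _ in group]
        rains = [p for _, p in group]
        rem.append(SubMask(min(elevations), max(elevations),
                           min(rains), max(rains)))
    return rem


def in_rem(rem: Sequence[SubMask], elevation: float, rainfall: float) -> bool:
    """True if the point lies inside at least one sub-mask (bounds inclusive)."""
    return any(d.e_min <= elevation <= d.e_max and d.p_min <= rainfall <= d.p_max
               for d in rem)
```

With M = 5 gauges and l = 3, `build_rem` returns three overlapping rectangles. Because membership is tested against their union, a cell at a given elevation is accepted if its rainfall falls inside any sub-mask that covers that elevation.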
Figure 1 shows an example of deriving the rainfall-elevation mask (l = 3, M = 5).
Figure 1. The derivation of the rainfall-elevation mask (l = 3, M = 5). For each sub-mask, the red solid lines are the upper and lower limits of rainfall and the black dashed lines are the upper and lower limits of elevation. The whole pink region is the final rainfall-elevation mask (REM).
A larger CR means that a larger proportion of satellite grid cells have rainfall comparable to the gauge measurements within the same elevation range; in other words, the satellite precipitation tends to be more accurate. The CR value can therefore quantify the consistency between gauge and satellite rainfall over grid cells without gauges.
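As a rough illustration, CR can be computed as the fraction n/N once the mask is available. The sketch below assumes that the mask is passed as a list of (e_min, e_max, p_min, p_max) rectangles, that each ungauged cell is an (elevation, satellite rainfall) pair, and that cells outside the elevation range of the mask count as inconsistent. The function name and data layout are hypothetical.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (e_min, e_max, p_min, p_max)


def consistency_rate(cells: Sequence[Tuple[float, float]],
                     rem: Sequence[Rect]) -> float:
    """Fraction n/N of ungauged cells whose (elevation, rainfall) lies in the mask.

    cells: (elevation, satellite rainfall) for each of the N ungauged cells.
    rem:   rectangles whose union is the rainfall-elevation mask.
    """
    if not cells:
        raise ValueError("at least one ungauged cell is required")
    n = sum(
        1 for e, p in cells
        if any(e0 <= e <= e1 and p0 <= p <= p1 for e0, e1, p0, p1 in rem)
    )
    return n / len(cells)
```

The result is a fraction between 0 and 1; multiply by 100 if a percentage is preferred.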
2.2. The Location-Elevation-TMPA (LET) Correlation for Improvement
Because topographic and geographic features strongly influence rainfall, this study uses topographic and geographic information to improve the satellite rainfall. The gauge data are assumed to be the actual rainfall values. For the grid cells with gauges, the relationships between the actual rainfall and the topographic/geographic information, as well as the TMPA rainfall, were investigated. Unlike the original TMPA rainfall, the final predicted rainfall incorporates topographic and geographic information; in other words, the TMPA rainfall is corrected by this information.
Given the unclear influence of topographic and geographic factors on rainfall, Genetic Programming (GP) was used as a tool to mine the relationship between rainfall and related factors. In the present study, the real rainfall (gauge measurements) was used as the target of GP, and the inputs include geographical location (north latitude and east longitude), elevation, and TMPA rainfall. The method is expressed below:
$$P_A = f(X, Y, E, P_S) \qquad (9)$$

where $P_A$ is the actual rainfall, X and Y are the locations (longitude and latitude, respectively), E is the elevation, and $P_S$ is the rainfall from TMPA. Elevation is considered the main variable influencing the spatial distribution of precipitation in mountainous areas, geographic location is regarded as a proxy for regional and local climate patterns, and the TMPA data are included to make full use of the satellite information.
There are three steps for mining the robust and explicit formula of Equation (9).
Step (1), testing calibration. Exclude some of the ground gauges, then put the remaining gauges, together with the satellite grid cells in which they are located, into the mining dataset. The dataset contains the location and elevation of each gauge and the satellite precipitation at the same location. The GP problem is as follows.
where $R^2$ is the coefficient of determination between the gauge measurements ($P_G$) and the modeled actual rainfall ($P_A$), and CV(RMSD) is the coefficient of variation of the RMSD, calculated by normalizing the RMSD by the mean value of the measurements. The target of the GP problem is to maximize $R^2$ and minimize CV(RMSD) between the actual rainfall from Equation (9) and the gauge measurements.
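The two fitness measures can be sketched as follows. This example assumes that $R^2$ is the squared Pearson correlation between gauge and modeled rainfall; the alternative definition, 1 − SS_res/SS_tot, would give different values for biased predictions. The function names are hypothetical.

```python
import math
from typing import Sequence


def r_squared(p_gauge: Sequence[float], p_model: Sequence[float]) -> float:
    """Squared Pearson correlation between gauge and modeled rainfall (assumed form)."""
    n = len(p_gauge)
    if n != len(p_model) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_g = sum(p_gauge) / n
    mean_m = sum(p_model) / n
    cov = sum((g - mean_g) * (m - mean_m) for g, m in zip(p_gauge, p_model))
    var_g = sum((g - mean_g) ** 2 for g in p_gauge)
    var_m = sum((m - mean_m) ** 2 for m in p_model)
    if var_g == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov * cov / (var_g * var_m)


def cv_rmsd(p_gauge: Sequence[float], p_model: Sequence[float]) -> float:
    """RMSD between model and gauge, normalized by the mean gauge measurement."""
    n = len(p_gauge)
    if n != len(p_model) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    rmsd = math.sqrt(sum((m - g) ** 2 for g, m in zip(p_gauge, p_model)) / n)
    return rmsd / (sum(p_gauge) / n)
```

The two measures are complementary: a model that doubles every gauge value still attains $R^2$ = 1 under this definition, and only CV(RMSD) penalizes the systematic bias.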
Step (2), cross-validation. Use the mined explicit formula of Equation (9) to predict the precipitation in the grid cells where the excluded gauges are located, and assess the fitness of the formula against those gauges.
Repeat steps (1) and (2) until every gauge has been both included and excluded at least once, to validate the effectiveness of the LET method.
Step (3), final calibration. If the cross-validation indicates that LET is valid, put all gauges, together with the satellite grid cells in which they are located, into the mining dataset. The mined formula is the final explicit form of Equation (9), which can be used to predict and adjust satellite precipitation in grid cells both with and without ground gauges.
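The three steps can be outlined as a hold-out loop in Python. This is only a structural sketch: the GP search is replaced by a deliberately simple placeholder (`fit_ratio_model`, which scales TMPA by the ratio of total gauge rainfall to total TMPA rainfall), and the interleaved fold assignment is an assumption, since the text does not specify how gauges are chosen for exclusion. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float, float, float, float]      # (X, Y, E, P_S, P_G)
Model = Callable[[float, float, float, float], float]  # f(X, Y, E, P_S) -> P_A


def fit_ratio_model(train: Sequence[Sample]) -> Model:
    """Placeholder for the GP search (NOT genetic programming)."""
    ratio = sum(s[4] for s in train) / sum(s[3] for s in train)
    return lambda x, y, e, p_s: ratio * p_s


def cross_validate(samples: Sequence[Sample],
                   fit: Callable[[Sequence[Sample]], Model],
                   n_folds: int) -> List[Tuple[float, float]]:
    """Return (gauge, predicted) pairs, one per gauge, each predicted while held out."""
    if not 2 <= n_folds <= len(samples):
        raise ValueError("need 2 <= n_folds <= number of gauges")
    pairs = []
    for fold in range(n_folds):
        held_out = samples[fold::n_folds]
        train = [s for i, s in enumerate(samples) if i % n_folds != fold]
        model = fit(train)                                       # step (1)
        pairs.extend((s[4], model(*s[:4])) for s in held_out)    # step (2)
    return pairs


def final_calibration(samples: Sequence[Sample],
                      fit: Callable[[Sequence[Sample]], Model]) -> Model:
    """Step (3): refit on all gauges once cross-validation is satisfactory."""
    return fit(samples)
```

Any fitting routine with the same call signature, including a GP-based one, could be passed as `fit`. The held-out pairs returned by `cross_validate` are what the fitness measures would be evaluated on before the final calibration.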