4.1. Results Analysis
In this study, 31 ground stations were selected as training stations for the spatial interpolation of precipitation, and the precipitation at 10 test stations was predicted and compared with the measured precipitation. The error indices calculated for the test stations are listed in Table 3, and the predicted precipitation at the test stations is shown in Figure 7 and Figure 8. For the SVM hybrid interpolation method and the linear regression hybrid interpolation method, Figure 7 and Figure 8 show the results obtained with the parameter set that gave the smallest test-set RMSE. The SVM and linear regression results correspond to their respective hybrid interpolation methods. Interpolation was also used to obtain the spatial distribution of multi-year average annual precipitation over the Three Gorges Region at a spatial resolution of 0.25° × 0.25° (Figure 9).
Overall, multi-year average annual precipitation in the Three Gorges Region varies greatly in space: the low-elevation area upstream of Fengjie, to its west, receives less precipitation, whereas the alpine canyon area downstream of Fengjie, to its east, receives more, and the rainfall center is located in the Yanshui area.
Comparison of auxiliary variables: Considering factors that influence precipitation, such as station longitude, latitude, slope, aspect and elevation, this study chose latitude and elevation as auxiliary variables for precipitation interpolation, in accordance with the correlation analysis results shown in Table 1. Because TRMM satellite precipitation data reflect the spatial distribution of precipitation, they were also included as auxiliary variables. We combined the auxiliary variables with the multi-year average annual precipitation of the ground stations in four combinations to establish linear regression equations and in six combinations to establish SVM models. To keep the characteristic information representative and independent, each combination contained no more than three auxiliary variables, each of a different type. The linear regression results (rows 15 to 18 of Table 3) show that the E-T_fs and E_10-T_fs combinations, which include satellite precipitation data, achieve significantly higher interpolation accuracy than the Y-E and E_10 combinations, which do not. The SVM results (rows 1 to 6 of Table 3) show that the SVM models based only on latitude and elevation information (Y-E, Y-E_10) have poor interpolation accuracy, whereas adding satellite precipitation data as an auxiliary variable greatly improves accuracy; the three combinations Y-E-T, Y-E-T_fs and Y-E_10-T obtain better interpolation results than the inverse distance weighting and ordinary kriging methods. The terrain of the Three Gorges Region is complex and varied, and precipitation is clearly affected by latitude and elevation. However, a linear regression equation or SVM model built only from latitude, elevation and precipitation cannot provide high interpolation accuracy. Despite some bias, the TRMM satellite precipitation data have good spatial continuity and better reflect the precipitation trend across the basin. They effectively supplement the latitude and elevation information, so adding satellite precipitation data as auxiliary information improves the interpolation.
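The paper does not spell out the form of its regression equations. As a minimal sketch, assuming a standard multiple linear regression P = b0 + b1*x1 + ... + bk*xk fitted by ordinary least squares, the functions below solve the normal equations for the coefficients. The names `fit_linear_regression` and `predict` are hypothetical, and the use of elevation and TRMM precipitation as the two auxiliary variables in the test data is illustrative, not taken from the study.

```python
def fit_linear_regression(X, y):
    """Return [b0, b1, ..., bk] minimizing the squared error of y ~ X (OLS).

    X: one row per station, each row holding k auxiliary variables
       (illustrative, e.g. elevation and TRMM precipitation).
    y: observed multi-year average annual precipitation at each station.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Normal equations: (A^T A) b = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    m = [ata[i] + [aty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = m[i][p] - sum(m[i][j] * b[j] for j in range(i + 1, p))
        b[i] = s / m[i][i]
    return b


def predict(b, x):
    """Evaluate the fitted equation for one set of auxiliary variables."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
```

With at most three auxiliary variables per combination, the normal-equation system is at most 4 × 4, so a direct solve like this is adequate; for larger or poorly scaled problems a QR-based least-squares solver would be numerically safer.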
Comparison of the normalization range of the SVM model: The input variables differ from each other and from the target value in units and range. For example, station latitude spans only 2.3°, whereas the maximum elevation range within 10 km of a station spans 2259 m. Normalizing the input variables and target data prevents variables with a large dynamic range from swamping those with a small dynamic range, so that all variables have a comparable effect. The normalization range strongly influences the prediction accuracy of the SVM model. In this study, we compared seven normalization ranges: (0, 50), (0, 20), (0, 10), (0, 7), (0, 4), (0, 1) and (0.1, 0.9). The normalization ranges that yield the best and worst RMSE values of the SVM model are shown in Table 4. Figure 10 shows that the SVM model is more stable when the normalization range is (0, 10), (0, 7) or (0, 4). When the normalization range is (0, 50), (0, 20), (0, 1) or (0.1, 0.9), the SVM model is more sensitive, and extreme values are more likely. Across the 14 sets of results, the best RMSE most frequently occurs at (0, 50), in 50% of cases, and the worst RMSE most frequently occurs at (0.1, 0.9), in 42.86% of cases. Therefore, selecting a suitable normalization range experimentally helps the model achieve better predictions.
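Scaling each variable to a chosen interval such as (0, 10) or (0.1, 0.9) is commonly done with a linear min-max transform. The sketch below assumes that form (the paper does not state its scaling formula). `make_scaler` is a hypothetical helper that returns both the forward map and its inverse, because the target precipitation is also normalized and SVM predictions must be mapped back to millimetres before the RMSE is computed.

```python
def make_scaler(data, lo, hi):
    """Return (forward, inverse) functions mapping [min(data), max(data)] to [lo, hi].

    Assumed linear min-max scaling; lo and hi are the normalization bounds,
    e.g. 0 and 50, or 0.1 and 0.9.
    """
    dmin, dmax = min(data), max(data)
    span = dmax - dmin
    if span == 0:
        raise ValueError("cannot scale a constant variable")

    def forward(v):
        # Linear map of a raw value into the normalization range
        return lo + (v - dmin) * (hi - lo) / span

    def inverse(s):
        # Map a normalized value (e.g. an SVM prediction) back to raw units
        return dmin + (s - lo) * span / (hi - lo)

    return forward, inverse
```

Applied to elevation ranges spanning 2259 m with bounds (0, 50), the largest value maps to 50 and the smallest to 0, the same interval that a 2.3° latitude span would occupy; this is what keeps the large-range variable from dominating. In practice, the minimum and maximum should be taken from the training stations and reused unchanged for the test stations.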
Comparison of the linear regression hybrid interpolation method and the SVM hybrid interpolation method: As shown in Table 3, the hybrid interpolation methods achieve the highest accuracy, with precipitation estimates closest to the measured values. This is mainly because the hybrid methods jointly account for auxiliary information on precipitation, including geographical position, topographic features and spatially continuous satellite precipitation data. In addition, the residuals of the linear regression equation and the SVM were further corrected by interpolating them with inverse distance weighting and ordinary kriging, which improves on the interpolation accuracy of the linear regression equation and the SVM alone.
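The residual-correction step described above can be sketched as follows. This is a minimal illustration, assuming planar station coordinates and an IDW distance exponent of 2 (the study's IDW parameters are not given here). `idw` and `hybrid_estimate` are hypothetical names, and the trend model (linear regression or SVM) enters only through its predictions. Residual interpolation by ordinary kriging, the study's other option, is not shown.

```python
import math


def idw(stations, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from station values.

    stations: list of (x, y) coordinates; values: value at each station.
    power: distance exponent (2 is assumed here).
    """
    num = den = 0.0
    for (sx, sy), v in zip(stations, values):
        d = math.hypot(target[0] - sx, target[1] - sy)
        if d == 0.0:
            return v  # target coincides with a station: use its value exactly
        w = d ** -power
        num += w * v
        den += w
    return num / den


def hybrid_estimate(trend_at_target, stations, observed, trend_at_stations,
                    target, power=2.0):
    """Trend-model prediction at the target plus the IDW-interpolated residual."""
    residuals = [o - t for o, t in zip(observed, trend_at_stations)]
    return trend_at_target + idw(stations, residuals, target, power)
```

Because each station residual is observed minus trend, the hybrid estimate reproduces the observed value at a training station (given the same trend prediction there), and it falls back to the trend prediction where neighbouring residuals cancel out.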
Table 5 shows that, after the regression residuals and the SVM residuals are interpolated, the RMSE reduction for linear regression is significantly larger than that for the SVM. There are two reasons for this. First, the SVM model alone predicts better than the linear regression equation overall, so it has less room for improvement. Second, the linear regression equation is simpler and more stable than the SVM, and the regression residuals retain more precipitation feature information than the SVM residuals. Although the residual correction is stronger in the linear regression hybrid interpolation method than in the SVM hybrid interpolation method, the SVM hybrid interpolation method still obtains better interpolation results. This is because the linear regression equation assumes a linear relationship between the interpolated variable and the auxiliary variables, whereas the SVM model better fits the complex nonlinear relationship between them.
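The comparison in Table 5 rests on test-set RMSE before and after residual correction. A minimal sketch, assuming the RMSE reduction is expressed as a fraction of the uncorrected model's RMSE (the exact definition used for Table 5 is not stated here); `rmse` and `rmse_reduction` are hypothetical helper names.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired station values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rmse_reduction(rmse_model, rmse_hybrid):
    """Fractional RMSE reduction from residual correction (assumed definition)."""
    return (rmse_model - rmse_hybrid) / rmse_model
```

For example, a drop from a hypothetical 120 mm to 90 mm corresponds to a reduction of 0.25.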
For the SVM models Y-E-T and Y-E-T_fs, the RMSE is about 120 mm (rows 2 and 3 of Table 3), and, as shown in Figure 10a,b, the interpolation accuracy improves to some degree after the residuals are corrected. For the SVM model Y-E_10-T, the RMSE is about 100 mm (row 5 of Table 3), and for Y-E_10-T_fs it is about 140 mm (row 6 of Table 3); as shown in Figure 10c,d, the interpolation accuracy decreases to some degree after the residuals are corrected. However, with the normalization range (0.1, 0.9), the fitting accuracy of the SVM model decreases, yet the SVM hybrid interpolation method improves the interpolation accuracy to some degree. We therefore conclude that whether the SVM hybrid interpolation method can further improve on the interpolation accuracy of the SVM model depends on the fitting accuracy of the SVM model. Choosing a suitable fitting accuracy, though difficult, is the key to ensuring the prediction accuracy of the SVM hybrid interpolation method.
Comparison of single-station prediction results (Figure 7 and Figure 8): The regression hybrid interpolation method predicted the precipitation at the Yanshui, Hexing, Tangfang and Xunleping stations better than multiple linear regression, and the SVM hybrid interpolation method predicted the precipitation at the Xunleping station better than the SVM. However, neither the regression hybrid interpolation method nor the SVM hybrid interpolation method successfully predicted the precipitation at the Shuiyuesi station, for two reasons. First, the Shuiyuesi station is located in an area with too few rainfall stations. Second, the TRMM data have some bias, which overall makes the predicted precipitation at this station significantly too large. He et al. (2005) pointed out that the largest sources of variation in regional precipitation interpolation are the number of meteorological stations and the spatial interpolation method [1]. Establishing a linear regression equation or SVM between regional precipitation and auxiliary variables can reduce the dependence of interpolation on the density of meteorological stations, but a basic number of stations is still required.
Comparison of interpolation results from ground station precipitation and TRMM satellite data: The multi-year average annual precipitation of the Three Gorges Region (127 grid cells of 0.25° × 0.25°) was calculated with four interpolation methods: IDW, OK, LRI and SVMRI. Taking the interpolated precipitation as the reference, we calculated the relative error of the corresponding TRMM data (Table 6). The distribution of the grid relative error is shown in Figure 11.
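A minimal sketch of the grid comparison, assuming the relative error is defined as (TRMM - interpolated) / interpolated × 100%, so that a negative value means TRMM is lower than the interpolated reference. The function names are hypothetical, and the basin average is taken as an unweighted mean over the 0.25° cells, which ignores the small change in cell area with latitude.

```python
def relative_error(trmm, reference):
    """Relative error (%) of a TRMM value against an interpolated reference value."""
    return (trmm - reference) / reference * 100.0


def grid_relative_errors(trmm_grid, interp_grid):
    """Per-cell relative errors (%) for paired TRMM and interpolated grid values."""
    return [relative_error(t, r) for t, r in zip(trmm_grid, interp_grid)]


def basin_mean(grid_values):
    """Unweighted mean over grid cells (equal cell areas assumed)."""
    return sum(grid_values) / len(grid_values)
```

The basin-level relative error in this form compares `basin_mean` of the TRMM cells with `basin_mean` of the interpolated cells, while `grid_relative_errors` gives the cell-by-cell pattern of the kind mapped in Figure 11.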
Figure 11 shows that the relative error of the TRMM data is negative at the rainfall center of the basin and basically positive elsewhere.
Table 6 shows that the basin multi-year average annual precipitation from the TRMM data differs least from that obtained by LRI interpolation, and the relative error for SVMRI is slightly smaller than for IDW and OK. This is because LRI and SVMRI use the TRMM data as an auxiliary variable during interpolation, so the TRMM data contribute to some degree to the final interpolation results. However, the linear regression equation depends more heavily on the TRMM data than the SVM does; the SVM incorporates the TRMM information while maintaining its independence. Taking this into account, TRMM-based linear regression and the linear regression hybrid interpolation method are not suitable for evaluating TRMM data.