1. Introduction
Precipitation, one of the most important climatic factors, is a vital part of the hydrologic cycle, affecting energy transfer and maintaining biosphere functions [1,2,3]. It is a focus of hydrology, agriculture, ecology, environmental science, and other related disciplines [4,5,6]. Understanding and evaluating the spatial and temporal distribution of precipitation is important in many fields, such as basin and watershed management, soil and water conservation, climate change assessment, agroclimatic or ecological regionalization, ecological environment construction, and the prediction and prevention of extreme weather disasters [3,7,8].
The main methods for obtaining precipitation data include ground-based meteorological measurements and spaceborne radar observations [9,10]. Spaceborne radar data have low spatial resolution and large uncertainties, which can lead to significant errors in the predicted precipitation distribution [11]. Ground-based meteorological data are therefore typically used for spatial interpolation of rainfall, owing to their longer time series and smaller errors [12]. However, because meteorological stations are limited in number and unevenly distributed, obtaining high-accuracy, high-resolution data on the distribution of precipitation remains a major challenge [10,13].
Precipitation is a comprehensive reflection of the interactions among the various components of the climate system [2,14]. It is influenced by atmospheric circulation, local topography, and land cover, and is closely related to many physical geographic factors, such as altitude, slope, aspect, solar radiation, vegetation type, and distance to mountains, seas, lakes, and river systems [15,16]. Owing to limitations of the interpolation algorithms, many studies on this topic have used only elevation, derived from a digital elevation model (DEM), as an auxiliary variable, combined with co-Kriging or thin plate splines (TPS), to predict the distribution of precipitation [6,11,17,18]. Many studies have shown that rainfall variability can be explained more effectively by considering additional natural geographical features in the interpolation process [5,8,19]. Interpolation methods that can use a variety of auxiliary variables include multiple linear regression (MLR), geographically weighted regression (GWR), artificial neural networks (ANN), regression Kriging (RK), and Bayesian maximum entropy (BME) interpolation [3,5,8,15,19,20].
Regression Kriging, a term first coined by [21], is mathematically equivalent to “universal Kriging” (UK) or “Kriging with external drift” (KED) [22]. It is a spatial prediction technique that combines a regression prediction based on auxiliary variables with Kriging interpolation of the regression residuals. Different regression models can be used in the regression step, giving rise to many combined methods [20], among which multiple linear regression Kriging (MLRK) and geographically weighted regression Kriging (GWRK) are the most commonly used. The regression step of MLRK fits the global trend of the target variable across the study area, assuming a stationary relationship between the spatial variables [5,15]. The regression step of GWRK fits the local trend around each prediction point and can adapt to a non-stationary relationship between the spatial variables, leading to a better explanation of the spatial variation of the target variable [5,23]. Both methods have been widely used in Earth science and environmental science, especially in studies of the spatial distribution of soil properties [23,24,25]. Some studies have analyzed precipitation interpolation on different temporal and spatial scales using these two methods [5,19,26]. However, more research is needed to evaluate the performance of MLRK and GWRK and to obtain precipitation maps with high spatial accuracy.
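As a minimal sketch of the two-step structure described above (a regression trend plus Kriging of the regression residuals), the following Python fragment implements an MLRK-like predictor with a single covariate. The station coordinates, elevations, rainfall values, and the exponential semivariogram parameters (sill, rng) are hypothetical and are not taken from this study; in practice the semivariogram would be fitted to the residuals, and several covariates would enter the regression.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_trend(elev, rain):
    """Ordinary least squares of rainfall on one covariate (global trend)."""
    n = len(elev)
    mx, my = sum(elev) / n, sum(rain) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(elev, rain))
    sxx = sum((x - mx) ** 2 for x in elev)
    slope = sxy / sxx
    return my - slope * mx, slope

def semivariogram(h, sill=1.0, rng=200.0):
    """Exponential semivariogram without nugget (assumed, not fitted)."""
    return sill * (1.0 - math.exp(-h / rng))

def krige_residual(coords, resid, target):
    """Ordinary Kriging of the regression residuals at one target location."""
    n = len(coords)
    a = [[semivariogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to 1
    b = [semivariogram(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve(a, b)[:n]    # drop the Lagrange multiplier
    return sum(w * e for w, e in zip(weights, resid))

def regression_kriging(coords, elev, rain, target, target_elev):
    """Trend prediction at the target plus the kriged residual."""
    intercept, slope = fit_trend(elev, rain)
    resid = [y - (intercept + slope * x) for x, y in zip(elev, rain)]
    return intercept + slope * target_elev + krige_residual(coords, resid, target)

# Hypothetical stations: coordinates in km, elevation in m, rainfall in mm.
coords = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0), (50.0, 50.0)]
elev = [400.0, 800.0, 1200.0, 1500.0, 1000.0]
rain = [620.0, 540.0, 480.0, 410.0, 500.0]

estimate = regression_kriging(coords, elev, rain, (70.0, 30.0), 900.0)
```

Because the assumed semivariogram has no nugget, this predictor reproduces the observed value at a station location, which is a convenient sanity check. A GWRK-like variant would replace `fit_trend` with a regression fitted locally around each prediction point.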
The Loess Plateau was selected as the study area because it suffers from water shortage and severe soil erosion. The region spans both semi-humid and semi-arid areas, where vegetation growth is limited by rainfall, and ravines and gullies cover the underlying surface. The relationship between different physical geographical features is significantly non-stationary. Given these characteristics, the study area is well suited to precipitation interpolation experiments. The objectives of this study were as follows: (1) to assess the performance of MLRK and GWRK in the interpolation process; and (2) to obtain a highly accurate distribution map of average annual precipitation with a 500 m resolution in the Loess Plateau.
5. Discussion
Based on background knowledge of the physical geography of the study area, the geographic factors that are closely related to local precipitation were selected as auxiliary variables. This process can help improve the accuracy of the RK model. The results of this study show that the average annual precipitation in the Loess Plateau gradually decreased from southeast to northwest under the influence of the monsoon and the sea (Figure 9). Affected by complex mountainous terrain, precipitation changed greatly in the Taihang Mountains and Luliang Mountain region in the east of the Loess Plateau. Precipitation in the narrow windward region on the eastern side of the Taihang Mountains was significantly higher than in the leeward area west of the mountains and in the plain area east of the mountains. Owing to the influence of the Tibetan Plateau, the elevation of the western region of the Loess Plateau gradually increases, and rainfall increases correspondingly. The Loess Plateau spans semi-humid and semi-arid areas, where water availability is an important limiting factor for vegetation growth, so vegetation typically thrives in places where rainfall is abundant. Moreover, where precipitation changes significantly, the underlying natural geographical factors also show a marked change. For example, in this study, the reason that station 373 had such a strong influence on the observations is that it lies on Huashan Mountain, where the altitude is much higher than in the surrounding areas.
By exploring the characteristics of the annual average rainfall data, a suitable interpolation method can be chosen for precipitation prediction. Geostatistical and non-geostatistical interpolation methods both involve various assumptions, and the data need to meet these assumptions in order to obtain an accurate prediction [22,32]. For example, to satisfy the normality assumption, precipitation data are typically transformed to a logarithmic scale. Alternatively, an interpolation method that does not require normality can be used.
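The effect of the logarithmic transformation mentioned above can be illustrated with a short Python sketch. The rainfall values are hypothetical, and sample skewness is used here only as a rough, assumed indicator of departure from normality; it is not the diagnostic used in this study.

```python
import math

def skewness(values):
    """Sample skewness based on the second and third central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical, right-skewed annual rainfall totals (mm).
rain_mm = [210.0, 260.0, 310.0, 350.0, 420.0, 560.0, 890.0]

# Interpolation would be carried out on the log scale ...
log_rain = [math.log(v) for v in rain_mm]

# ... and the results mapped back to millimetres afterwards.
back_transformed = [math.exp(v) for v in log_rain]

skew_raw = skewness(rain_mm)
skew_log = skewness(log_rain)
```

For these values the log-transformed sample is markedly less skewed than the raw sample, which is the motivation for transforming before applying a method that assumes normality.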
In choosing a suitable interpolation method, it is important to consider whether the precipitation data need to be fitted to a trend in the interpolation process, and then whether to fit the global trend or local trends according to the stability of the underlying surface. In this study, MLRK is a global trend-fitting method, which is suitable for a relatively homogeneous underlying surface, and GWRK is a local trend-fitting method, which is suitable for relatively complex surface types [23,33]. Previous studies suggest that the accuracy of local trend-fitting methods is generally higher than that of global trend-fitting methods [5,19,26]; however, this is not always true. As the size of the study area and the complexity of the underlying surface increase, the relationship between rainfall and the auxiliary variables becomes more unstable. When the underlying surface changes greatly but precipitation is relatively stable, local trend fitting may overestimate the variability in predicted rainfall, thus reducing the accuracy of the predictions. This situation was observed in this study: the overall GWRK error was slightly larger than the MLRK error. In summary, it is difficult to choose the right interpolation method in advance. An advisable approach is to select several candidate interpolation methods, compare their errors after interpolation, and adopt the method with the minimum error.
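The selection procedure suggested above can be sketched in Python as follows. The two predictors are simplified stand-ins for the trend components only (a global regression and a distance-weighted local regression on a single covariate), not the full MLRK and GWRK models; the station records, the Gaussian kernel, and the `bandwidth` value are hypothetical.

```python
import math

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def predict_global(train, target):
    """Global trend: every training station has the same weight."""
    a, b = weighted_line([s["elev"] for s in train],
                         [s["rain"] for s in train],
                         [1.0] * len(train))
    return a + b * target["elev"]

def predict_local(train, target, bandwidth=150.0):
    """Local trend: Gaussian kernel weights decay with distance to the target."""
    w = [math.exp(-0.5 * (math.dist(s["xy"], target["xy"]) / bandwidth) ** 2)
         for s in train]
    a, b = weighted_line([s["elev"] for s in train],
                         [s["rain"] for s in train], w)
    return a + b * target["elev"]

def loocv_rmse(predict, stations):
    """Leave-one-out cross-validation RMSE of a predictor."""
    squared = []
    for i, held_out in enumerate(stations):
        train = stations[:i] + stations[i + 1:]
        squared.append((predict(train, held_out) - held_out["rain"]) ** 2)
    return math.sqrt(sum(squared) / len(squared))

def select_method(methods, stations):
    """Return the name of the method with the smallest error, plus all scores."""
    scores = {name: loocv_rmse(fn, stations) for name, fn in methods.items()}
    return min(scores, key=scores.get), scores

# Hypothetical stations: coordinates in km, elevation in m, rainfall in mm.
stations = [
    {"xy": (0.0, 0.0), "elev": 400.0, "rain": 620.0},
    {"xy": (120.0, 30.0), "elev": 650.0, "rain": 560.0},
    {"xy": (60.0, 140.0), "elev": 900.0, "rain": 515.0},
    {"xy": (200.0, 180.0), "elev": 1200.0, "rain": 470.0},
    {"xy": (260.0, 60.0), "elev": 1500.0, "rain": 430.0},
    {"xy": (320.0, 220.0), "elev": 1900.0, "rain": 380.0},
]

best, scores = select_method(
    {"global": predict_global, "local": predict_local}, stations)
```

Which method is selected depends entirely on the data, which is the point of the comparison: neither the global nor the local fit is assumed to be better beforehand.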
The interpolation result of annual average rainfall is affected by several sources of uncertainty: (1) the uneven distribution and limited number of stations lead to the underestimation of trends and rainfall variability in sparsely gauged areas; (2) because the relationship between rainfall and the auxiliary variables is unstable, an accurate model cannot be established, which decreases the accuracy of the interpolation result; (3) rainfall station data in most studies are limited to the study area, so a significant prediction error appears at its boundary, known as the “edge effect.” This effect can be mitigated by taking into account the precipitation data of stations just outside the border [15]. In this study, precipitation data from meteorological stations located within a 100 km buffer around the Loess Plateau were added in order to improve the predictions at the study area boundary.
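One way to select stations inside the study area or within a peripheral buffer is sketched below in Python. It assumes projected coordinates in kilometres and a simple polygon boundary; the square polygon and the candidate points are hypothetical placeholders rather than the actual Loess Plateau boundary or station network.

```python
import math

def point_in_polygon(p, poly):
    """Ray-casting test; poly is a list of (x, y) vertices in order."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def within_buffer(p, poly, buffer_km=100.0):
    """True if p lies inside the polygon or within buffer_km of its boundary."""
    if point_in_polygon(p, poly):
        return True
    edges = zip(poly, poly[1:] + poly[:1])
    return min(distance_to_segment(p, a, b) for a, b in edges) <= buffer_km

# Hypothetical 500 km x 500 km study area and candidate station locations.
study_area = [(0.0, 0.0), (500.0, 0.0), (500.0, 500.0), (0.0, 500.0)]
candidates = [(250.0, 250.0), (560.0, 250.0), (650.0, 250.0), (-60.0, -60.0)]

selected = [p for p in candidates if within_buffer(p, study_area)]
```

With a real boundary, a GIS library would normally perform the buffering; the sketch only shows the selection rule.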
Measuring the interpolation error is an important step in selecting an interpolation method and analyzing interpolation results. In this study, two common methods were used to evaluate the results of interpolation: one applies cross-validation to the prediction data set itself; the other uses an independent validation data set to directly calculate the mean error (ME), the root mean square error (RMSE), and other indexes. When a validation data set is used, the validation points should cover a broad range of land use types. In this study, the validation data came only from hydrological stations at the edge of a river or gully with lush vegetation coverage, so the validation data set lacks data from other land use types. Consequently, this study can show that MLRK performed slightly better than GWRK only in areas near rivers; in regions with other land use types, it is not possible to determine which of the two models is better.
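The two error indexes named above can be computed as in the following Python sketch. The sign convention (predicted minus observed) is an assumption, and the observed and predicted values are hypothetical.

```python
import math

def mean_error(observed, predicted):
    """Mean error (bias); positive values indicate overprediction."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical validation stations: observed and predicted rainfall (mm).
observed = [500.0, 450.0, 400.0]
predicted = [510.0, 440.0, 400.0]

me = mean_error(observed, predicted)
err = rmse(observed, predicted)
```

In this example the positive and negative errors cancel, so ME is zero while RMSE is not. This is why ME is usually reported together with RMSE rather than alone.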
6. Conclusions
One of the main objectives of this study was to generate a highly accurate distribution map of average annual precipitation with a 500 m spatial resolution in the Loess Plateau for the period of 1980–2010. Alternative distribution maps of precipitation were interpolated by the MLRK and GWRK methods, and all showed a high accuracy. There were large disparities, however, between the two regression Kriging procedures: the GWRK regression model explained more variance than the MLRK regression model, but the opposite was true of the Kriging step. The interpolation maps produced by MLRK and GWRK both captured many details of the spatial distribution influenced by the predictors. Although GWRK is based on the assumption of spatial non-stationarity and the map predicted by GWRK did show greater spatial variation, the final validation analysis revealed that MLRK yielded a higher model efficiency than GWRK, although the difference was small. This is in contrast to previous precipitation interpolation studies. The conclusions can be summarized as follows: (1) both MLRK and GWRK are able to incorporate multiple auxiliary environmental factors into the modelling process and obtain a highly accurate precipitation distribution map; (2) unlike in other studies of precipitation prediction, MLRK is shown to be a better method for precipitation interpolation when the underlying surface is complex. In future, greater effort should be made to consider more physical geographic factors related to or impacted by rainfall as auxiliary environmental variables, which may also be available at a higher resolution, in order to obtain more accurate precipitation distribution maps. Further studies should investigate and identify the standards and principles for selecting and validating interpolation methods.