Spatial Sampling Strategies for the Effect of Interpolation Accuracy

Spatial interpolation methods are widely used in various fields and have been studied by several scholars with one or a few specific sampling datasets that do not reflect the complexity of the spatial characteristics and lead to conclusions that cannot be widely applied. In this paper, three factors that affect the accuracy of interpolation have been considered, i.e., sampling density, sampling mode, and sampling location. We studied the inverse distance weighted (IDW), regular spline (RS), and ordinary kriging (OK) interpolation methods using 162 DEM datasets considering six sampling densities, nine terrain complexities, and three sampling modes. The experimental results show that, in selective sampling and combined sampling, the maximum absolute errors of interpolation methods rapidly increase and the estimated values are overestimated. In regular-grid sampling, the RS method has the highest interpolation accuracy, and IDW has the lowest interpolation accuracy. However, in both selective and combined sampling, the accuracy of the IDW method is significantly improved and the RS method performs worse. The OK method does not significantly change between the three sampling modes. The following conclusion can be obtained from the above analysis: the combined sampling mode is recommended for sampling, and more sampling points should be added in the ridges, OPEN ACCESS ISPRS Int. J. Geo-Inf. 2015, 4 2743 valleys, and other complex terrain. The IDW method should not be used in the regular-grid sampling mode, but it has good performance in the selective sampling mode and combined sampling mode. However, the RS method shows the opposite phenomenon. The sampling dataset should be analyzed before using the OK method, which can select suitable models based on the analysis results of the sampling dataset.


Introduction
Spatial interpolation is the procedure of deriving the characteristic values of unknown data at specified points and the characteristics of data distribution based on the known data at specific points in the same area, which plays a significant role in spatial analysis [1].
More than twenty spatial interpolation methods have been used in different fields.According to the mathematical mechanism of interpolation, these interpolation methods can be classified as non-geostatistical methods and geostatistical methods.Non-geostatistical methods include the nearest neighbor method, inverse distance weighted method, local polynomial method, and regular spline method.The kriging method is the most common geostatistical method.Each interpolation method has different factors that affect the interpolation accuracy, and all affecting factors should be considered when interpolation methods are used.For example, the variance and variograms should be analyzed before using the kriging interpolation method.
Spatial interpolation methods have been applied in various fields.In the research of temperature interpolation, Wu et al. proposed a method for setting parameters and selecting models, in which the sampling points of observation stations were used and "one best method" was selected from several interpolation methods [2].Phillips et al. selected the IDW and kriging methods for temperature simulation, and an optimal interpolation method was selected by comparing the absolute errors and root mean squared errors of IDW and kriging [3][4][5][6].Wang et al. evaluated eight interpolation methods for research on heavy metals; the nearest neighbor interpolation method had the worst performance, and the multifractal kriging interpolation method had the best performance.However, different heavy metals had significantly different spatial characteristics, which affected the accuracy of the interpolation [7].Li et al. proposed the fractal interpolation method when studying the distribution of copper.The fractal interpolation method was suitable when the sampling points were distributed unevenly and had significant correlation [8].In terrain analysis, Liu et al. analyzed the spatial variability of elevation, and the results showed that the OK method was the best interpolation method [9].Wang used the IDW, RS, TS, and kriging interpolation methods to analyze aerial survey points, and the TS interpolation method provided the best interpolation with the highest interpolation accuracy [10].In studying soil nutrients, Chen et al. used three interpolation methods (i.e., kriging, spline, and IDW) for interpolation of N, P, K and PH sampling points, and the spline interpolation method performed the worst [11].Liu et al. proposed a method that combined ensemble learning with ancillary environmental information for improved interpolation of soil properties because the ensemble learning model can describe more locally-detailed information and more accurate spatial patterns for soil potassium content than the other methods (i.e., kriging and IDW) [12].
From the above review, most scholars have studied interpolation methods by (1) comparing a few interpolation methods for a particular sampling dataset to determine the "best" interpolation method for a particular area of application, and (2) applying some assistant methods to improve one interpolation method for better accuracy.Both aspects can help obtain a good result for the specified sampling points.However, it is inappropriate to apply the conclusions to other sampling points because the sampling points used in a particular study may lack comprehensive representatives.In the case of terrain analysis, for example, a set of specified elevation sampling points can reflect only one region, which has a specific complexity, and the specified elevation sampling points differ from other sampling points with different sampling modes and sampling densities.
In this study, the influence of the terrain complexity, sampling mode, and sampling density on the accuracy of the interpolation results derived from different interpolation methods are investigated.The first section introduces the function, situation, problems, and the objective of this paper; the second section introduces the interpolation methods and the affecting factors selected in this paper; the third section gives the details of the experimental procedures; the fourth section provides the results of the experiment; the fifth section presents a discussion of the results; and the sixth chapter presents the conclusions of this study.

Spatial Interpolation Methods
Considering the variety and representativeness of interpolation methods, three interpolation methods are selected in this paper: two are classified as non-geostatistical methods (IDW and RS) and one is classified as a geostatistical method (OK).These three methods are commonly used and are representative of interpolation methods.The following provides a brief introduction to these methods.
(1) IDW: inverse distance weighting or weighted method is the simplest interpolation method and estimates the values at unsampled points using the distances to and values of nearby sampled points.IDW is based on the principle: closer things are similar.The principle assumes that two close sample points have similar attributes, and further sample points have less similar attributes.In this method, the value of a cell is the weighted average of the values of sample points nearby.A point closer to the cell in question carries a larger weight.IDW is a simple and effective interpolation method.The computation speed is relatively fast.In addition to the weighted distances, the power and search radius are also important factors affecting an IDW interpolation result [13].The estimated values can be determined by the following equation: where λi is the inverse distance weight, Z0 is the predicted value, and Zi is the measured value.
(2) RS: spline interpolation is based on the following principle: the interpolation interval is divided into small subintervals.Each of these subintervals is interpolated using the third-degree polynomial.The polynomial coefficients are chosen to satisfy certain conditions (these conditions depend on the interpolation method).General requirements are function continuity and passing through all given points.There are additional possible requirements: function linearity between nodes and continuity of higher derivatives.The main advantages of spline interpolation are its stability and calculation implicitly.Sets of linear equations, which are solved to construct splines, are well-conditioned; therefore, the polynomial coefficients are calculated precisely [14].The equation of the regular spline interpolation is as follows: where α1, α2, α3 are coefficients of the equations, N is the number of sampling points, λi is the weights, and R(rj) is the spline function used to modify the interpolation results.
(3) OK: used in statistics, originally in geostatistics.Kriging is a method of interpolation for which the interpolated values are modeled by a Gaussian process governed by prior covariances, as opposed to a piecewise-polynomial spline chosen to optimize the smoothness of the fitted values.Under suitable assumptions of the priors, kriging gives the best linear, unbiased prediction of intermediate values.
Ordinary kriging is the most widely used kriging method.OK estimates a value at a point of a region for which a variogram is known using data in the neighborhood of the estimation location.OK assumes that the expected value of the interpolation field is an unknown constant [15].The estimated values can be determined by the following equation: where Z0 is the predicted value, z(xi) is the measured value, wi(x0) is the weights reflecting the structural "proximity" of samples to the estimation location (x0), and z(x0) is the mathematical expectation of the sampling points.
The spatial characteristics of sampling points play a significant role in spatial interpolation and are determined by three factors: (1) where to sample; (2) what mode to sample (e.g., systematic or adaptive); and (3) how many points to sample (i.e., density).Samplings at different locations have different terrain situations, which represent different terrain complexities.The mode of sampling represents the way in which the sampling points are spatially distributed.For example, in the systematic (regular-grid) sampling mode, the sampling points are evenly distributed, whereas under the adaptive (selective) sampling mode, the sampling points are determined by the feature points of the terrain.The density of sampling points, i.e., the number of sampling points in a given region, has a significant effect on the interpolation accuracy.In this paper, the accuracy of the results generated by different spatial interpolation methods is studied with respect to datasets with different levels of terrain complexity, sampling density, and sampling modes.

Data Preparation and Processing
DEM data are widely used to describe surface topography.In this study, the DEM data are from the International Scientific Data Service Platform (http://www.cnic.cas.cn/)[16], and the resolution of the DEM data is 30 m × 30 m.A large amount of DEM data, covering more than 80% of China, has been collected and to reflect the comprehensive complexities of the terrain, a group of regions with different complexities, which are represented by fractal dimensions from 2.0 to 2.8, are chosen in this paper.We obtain the sampling points under different sampling modes and sampling densities, which have different spatial characteristics.The overall procedure and details are further described in Section 3.2.
Fractal dimensions can be used to describe the variations of terrain in the entire study region and to indicate the level of terrain complexity, which includes the surface-volume method, the cubic covering method, and the surface-scale method.For the DEM data, surface or volume is calculated before obtaining the fractal dimension, which causes errors [17][18][19][20].To decrease the errors, the cubic covering method is used in this paper.Some terrain data with different levels of terrain complexity D are shown in Figure 1.Considering the national surveying specifications and characteristics of DEM, each DEM block is 144 km 2 (12 km × 12 km), and six sampling densities are used in this paper (i.e., 18.1 points/km 2 , 5 points/km 2 , 4.7 points/km 2 , 3.1 points/km 2 , 2.3 points/km 2 , and 1.8 points/km 2 ).
The sampling mode refers to the specific rules and methods used in determining the locations of the sampling points.Many sampling modes are adopted in interpolation research, and they can be divided into two large categories according to the distribution of sampling points: uniform sampling and non-uniform sampling.Some scholars used less popular methods, such as profile sampling and asymptotic sampling; OÈzdamar et al. applied herringbone-shaped, regular-grid, and layered random sampling modes in studying interpolation methods [21].Demirhan et al. used four modes to sample the area under study: herringbone-shaped, regular-grid, linear, and ring sampling modes [22].The sampling modes they used can be performed by computer but cannot accurately represent terrain feature points.To solve this problem, the regular-grid sampling mode, selective sampling mode, and combined sampling mode are used in this paper.The selective sampling mode focuses on terrain feature points when sampling, whereas the combined sampling mode takes the regular-grid sampling mode and selective sampling mode into consideration at the same time.

Experiment Design
To systematically study how the spatial interpolation methods are affected by terrain complexity, sampling mode, and sampling density, DEM data with a resolution of 30 m are first divided into a series of data blocks with equal area.Next, data blocks with nine types of terrain complexity (sequentially increasing from level 2.0 to 2.8) are selected, and each block is sampled in three sampling modes and at six sampling densities for each mode.Finally, these point datasets are interpolated with different interpolation methods, and accuracy analysis is conducted.This procedure is illustrated in Figure 2.

Determination of Affecting Factors
The parameters of the interpolation methods have significant influences on the interpolation results, but there is no particular "best value" for the parameters.The parameters for IDW and RS in this paper are set using default values from ArcGIS 10.0, which are set by multiple tests.For OK, through repeated tests and validations, the spherical method is selected in this paper.

Validation
Independent verification is used for the validation of the interpolators in this paper.Each DEM is 12 km × 12 km and contains 400 × 400 sampling points.These sampling points are divided into the interpolation and validation subsets, estimating the value using interpolation subset and comparing the interpolated value at every validation point with its measured value.For the six levels of sampling density, 2704, 1225, 676, 441, 324, and 256 training points are created as interpolation data sets.The remaining, 157,269, 158,775, 159,559, 159,676, and 159,744 sampling points are used as validation datasets.

Assessment Indices
Many indices have been developed to assess the spatial interpolation accuracy.Each index alone cannot fully reflect the overall characteristics of the errors, so multiple indices are used in the analysis.In this paper, we use the three most common indices, i.e., maximum absolute errors (MAX), bias of errors (BIAS), and root mean squared of errors (RMSE), as measures of the interpolation accuracy.These three indices are determined by the following equations, where e1, e2, …, en are the errors between validation points and interpolation, abs(e1), abs(e2), …, abs(en) are the absolute values of e1, e2, …, en, and n is the number of validation points.

Results
The accuracy of the interpolation results of the three methods is analyzed with respect to four characteristics: the distribution of errors, MAX, BIAS, and RMSE.The effects of the sampling mode, sampling density, and terrain complexity on interpolation results are also comprehensively analyzed and discussed, and the results are as follows.

Regular-Grid Sampling
In regular-grid sampling, the distributions of errors of the three interpolation methods have the following characteristics (Figure 3): The major locations of the error distribution are similar, in the mountain ridges, valleys, and peaks.With an increase of sampling density, the number of plaques of error-distributed area increases abruptly, and the size of the plaques gradually decreases.For the OK method, the distribution of errors follows the directions of the mountain ridges and valleys.At higher densities of sampling points, the error-distributed areas are mostly broken, and at lower densities of sampling points, the error-distributed areas are mostly continuous, with the least number of small error plaques.The size of the error-distributed areas is much larger than that of the other two methods.

Selective Sampling
In selective sampling, the distributions of errors are different from those of regular-grid sampling, and the following characteristics are found (Figure 4): The major locations of the error distribution are similar in the mountain ridges, valleys, and peaks.With an increase of sampling density, the number of plaques of error-distributed area increases abruptly, and the size of the plaques gradually decreases.In regions with sparsely distributed sampling points, large blocks of error-distributed area appear, and for the OK method, the distribution of errors follows the direction of mountain ridges and valleys.At high densities of sampling points, many small plaques of error-distributed area are found, and at low densities of sampling points, the least number of small plaques of error-distributed area are found.The error-distributed areas are much larger than those of the other two methods.In the case of combined sampling, the following characteristics are found (Figure 5): The major locations of the error distribution are similar in the mountain ridges, valleys, and peaks.For the OK method, the distribution of errors follows the direction of the mountain ridges and valleys.At high densities of sampling points, many small plaques of error-distributed area are found, and at low densities of sampling points, the least number of small plaques of error-distributed area are found.The error-distributed area is much larger than those of the other two methods.

Comprehensive analysis
In all sampling modes, the major locations of error distribution are in the mountain ridges, valleys, and peaks.In the selective sampling mode, large blocks of error-distributed area occur when the sampling points are sparse.With the increase of sampling density, the number of plaques of error-distributed area increases abruptly, and the size of the plaques gradually decreases.
The OK method has the maximum area of error distribution, with the most broken area at high sampling densities and the most continuous area and the least number of small error plaques at low sampling densities.

Regular-Grid Sampling
The experimental results show that the MAXes of the three methods have the following patterns in regular-grid sampling (Table 1): With the increase of terrain complexity, the MAXes increase.With the increase of sampling density, MAXes decrease.With the decrease of sampling density, MAXes somewhat increase.The IDW method has the largest MAXes among the three methods, and the RS method has the smallest MAXes among the three methods.

Selective Sampling
In selective sampling, the experimental results show the following patterns (Table 2):  With the increase of terrain complexity, the MAXes increase significantly.With the increase of sampling density, the MAXes somewhat decrease.With the decrease of sampling density, the MAXes increase, and a greater increase is found at lower sampling density.The OK method has the minimum MAXes among the three methods, and for the other two methods, the RS method has smaller MAXes at higher sampling density and the IDW method has smaller MAXes at lower sampling density.

Combined Sampling
In combined sampling, the experimental results show the following patterns (Table 3): With the increase of terrain complexity, the MAXes also increase.With the increase of sampling density, the MAXes somewhat decrease.With the decrease of sampling density, a greater increase is found at lower sampling density.The OK method has the smallest MAXes among the three methods, and for the other two methods, the RS method has smaller MAXes at higher sampling density and the IDW method has smaller MAXes at lower sampling density.

Comprehensive Analysis
Statistical analysis of the MAXes for all sampling modes, nine sampling densities, and six levels of terrain complexity found the following results.
In all sampling modes, with the increase of terrain complexity, the MAXes also increase.With the increase of sampling density, the MAXes decrease relatively and vice versa.In regular-grid sampling, for all sampling densities and levels of terrain complexity, the RS method has the minimum MAXes among the three methods.In selective sampling and combined sampling, the OK method has the minimum MAXes among the three methods.

Regular-Grid Sampling
For regular-grid sampling, the BIASes of the three methods are shown in Table 4, which shows the following characteristics: With the increase of terrain complexity, the BIASes of all methods also increase.With the decrease of the sampling density, the BIASes of all methods increase, and vice versa.BIASes are not noticeably overestimated or underestimated.The RS method has the minimum BIASes among the three methods, and the IDW method has the maximum BIASes among the three methods.For selective sampling, the BIASes of the three methods are shown in Table 5, which shows the following patterns.
With the increase of terrain complexity, the BIASes of all methods also increase, but with a lower level of increase.With the decrease of the sampling density, the BIASes of all methods increase, and vice versa.Some BIASes are over-estimated.The IDW method has the maximum BIASes among the three methods, and the RS method has the minimum BIASes among the three methods.

Combined Sampling
For combined sampling, the BIASes of the three methods are shown in Table 6, which shows the following patterns.
With the increase of terrain complexity, the BIASes also increase, but the level of increase is lower than that of the other modes.With the decrease of the sampling density, BIASes slightly increase, and vice versa.Some BIASes are over-estimated.The IDW method has the maximum BIASes among the three methods, and the RS method has the minimum BIASes among the three methods.

Comprehensive Analysis
The statistical analysis of the errors for all sampling modes, nine sampling densities, and six levels of terrain complexity shows that the RS method has the minimum and the IDW method has the maximum BIASes in all cases.

Regular-grid Sampling
For regular-grid sampling, the RMSEs of the three methods are shown in Figure 6 and Table 7, which have the following patterns.
With the increase of terrain complexity, the RMSEs steeply increase.With the increase of sampling density, the RMSEs steadily decrease.The IDW method has the maximum RMSEs among the three methods, and the RS method has the minimum RMSEs among the three methods.

Selective Sampling
For selective sampling, the RMSEs of the three methods are shown in Figure 7 and Table 8, which have the following patterns: With the increase of terrain complexity, the RMSEs steeply increase.With the increase of sampling density, the RMSEs decrease, and vice versa.For lower levels of terrain complexity, the OK method has the minimum RMSEs at high sampling density, and the IDW method has the minimum RMSEs at low sampling density.For higher levels of terrain complexity, the RS method has the minimum RMSEs at high sampling density, and the IDW method has the minimum RMSEs at low sampling density.  .The RMSEs of the 3 methods in selective sampling.Sampling densities of (A) 18.1 points/km 2 ; (B) 8.5 points/km 2 ; (C) 4.7 points/km 2 ; (D) 3.1 points/km 2 ; (E) 2.3 points/km 2 ; (F) 1.8 points/km 2 .

Combined Sampling
For combined sampling, the RMSEs of the three methods shown in Figure 8 and Table 9 have the following patterns: With the increase of terrain complexity, the RMSEs increase more and more steeply.With the increase of sampling density, the RMSEs decrease, and vice versa.At lower levels of terrain complexity, the OK method has the minimum RMSEs among the three methods, and at higher levels of terrain complexity, the RS method has the minimum RMSEs among the three methods.

Comprehensive Analysis
The statistical analysis of the RMSEs for all sampling modes, nine sampling densities, and six levels of terrain complexity shows the following (Tables 7-9).
The RS method has the minimum error among the three methods in the following three cases: (i) regular-grid sampling for all sampling densities and all levels of terrain complexity, (ii) combined sampling for a high level of terrain complexity and all sampling densities, and (iii) selective sampling for a high sampling density and a high level of terrain complexity.The IDW method has the minimum error in selective sampling with low sampling density and low level of terrain complexity, and the OK method has the minimum error among the three methods in the following two cases: (i) combined sampling with all sampling densities and at a low level of terrain complexity, and (ii) selective sampling with a high sampling density and at a low level of terrain complexity.

Error Variation Trend Analysis
The analysis results of all error indices suggest that the RMSEs satisfy the least squares theory, so this error index is used to represent the accuracy of the interpolation results when analyzing the trend of error variation.show the following trends of error variation for the three methods.
(1) With the increase of terrain complexity and decrease of sampling density, the accuracy gradually decreases; (2) For regular-grid sampling, the IDW method has the lowest accuracy, and the OK method has the highest accuracy, approximating the accuracy of the RS method; (3) For selective sampling, the complicated spatial characteristics of sampling points lead to unstable sampling quality and oscillations of the accuracies of all methods; and (4) The accuracy of combined sampling is between the other two sampling modes.The IDW method has the lowest accuracy in combined sampling but is better than its accuracy in regular-grid sampling.

Discussion
To explain the reasons for the above phenomena, two small regions, A and B in Figure 12, are further examined, using a few sampling points in the regular-grid sampling mode.In combined sampling, the locations of these regular-grid sampling points are modified according to the principles and requirements of the combined sampling mode.In study region A, the sampling points in the mountain valley zone are changed; in study region B, the sampling points in the mountain ridge zone are changed.The details of the modification are shown in Figure 13.For the two sampling modes, interpolation is conducted on two different sets of sampling points, and the variation of errors at the sampling points (the absolute error before modification subtracted by the absolute error after modification) is shown in Figures 14 and 15.The variation of the errors for the three methods has the following characteristics: After modification, the errors near the original sampling points are reduced, but the errors near the modified position sampling points are increased.For the IDW method, the area with reduced errors is larger than the area with increased errors, and the reduction is greater than the increase.For the RS method, the area with reduced errors is smaller than the area with increased errors, and the reduction is smaller than the increase.The results suggest that when the uniform distribution of sampling points is changed to non-uniform, the accuracy of the IDW method is improved, but the accuracy of the RS method is decreased and the accuracy of the OK method has no significant change.
This phenomenon is analyzed for each method as follows: For IDW, after point A in Figure 13A is moved to the mountain valley and is assumed to become point B, the accuracy of interpolation is increased by 20 m.The distances between point B and other points are either decreased or increased, assuming that the set of points with increased distance to B is (1) In the three sampling modes, the RS method has good control of the MAXes, and the other two methods have poor performance; (2) The RS method has the smallest BIASes, and the IDW method has the largest BIASes for all sampling modes, sampling densities, and levels of terrain complexity.Moreover, all the BIASes are positive values, which illustrates that the three methods have the same problem of over-estimation; appropriate correction should be made for this problem in practical applications, e.g., using a negative-value-offset in the interpolation; (3) With the increase in sampling density and decrease in terrain complexity, the interpolation accuracy of three methods improves significantly; and (4) With the change of sampling mode from regular-grid sampling to selective sampling, the IDW method has a significant improvement in interpolation accuracy, but the accuracy of the RS method decreases.With the change of the sampling mode from regular-grid sampling to combined sampling, the IDW method and RS method have the same performance.With the switch of the sampling mode from selective sampling to combined sampling, both the IDW method and the RS method have significant improvements in interpolation accuracy.The interpolation accuracy of the OK method does not change significantly when the sampling mode changes.The performance of the three methods illustrates that selective sampling points have better applicability for IDW than RS.However, a combination between selective sampling and regular-gird sampling is appropriate in practice and can significantly improve the interpolation accuracy.There is no significant dependence of OK method on the sampling mode; a deep spatial analysis for the sampling points should be performed for the use of the OK method.

Figure 2 .
Figure 2. Flow diagram of the experiment.

Figure 3 .
Figure 3.The distributions of errors in regular-grid sampling.(A) Error distributions of IDW; (B) Error distributions of IDW; (C) Error distributions of IDW.

Figure 4 .
Figure 4.The distributions of errors in selective sampling.(A) Error distributions of IDW; (B) Error distributions of IDW; (C) Error distributions of IDW.

Figure 5 .
Figure 5.The distributions of errors in combined sampling.(A) Error distributions of IDW; (B) Error distributions of IDW; (C) Error distributions of IDW.

Figure 9 .Figure 10 .
Figure 9.The RMSEs of the 3 methods in regular-grid sampling.(A) The trend surface of RMSEs for IDW; (B) The trend surface of RMSEs for RS; (C) The trend surface of RMSEs for OK.

Figure 11 .
Figure 11.The RMSEs of the 3 methods in combined sampling.(A) The trend surface of RMSEs for IDW; (B) The trend surface of RMSEs for RS; (C) The trend surface of RMSEs for OK.

Figure 12 .Figure 13 .
Figure 12.The locations of the sampling points before modification.(A) Sampling points before modification of area A; (B) Sampling points before modification of area B.

Figure 14 .Figure 15 .
Figure 14.The variations of the errors at the sampling points of area A. (A) Error variations at the sampling points of IDW in area A; (B) Error variations at the sampling points of OK in area A; (C) Error variations at the sampling points of RS in area A.

Table 1 .
The MAXs under the regular-grid sampling mode (Unit: m).

Table 2 .
The MAXs under selective sampling mode (Unit: m).

Table 3 .
The MAXs under combined sampling mode (Unit: m).

Table 11 .
The variation tendency of the interpolation accuracy.TC means Terrain Complexity; SD means Sampling Density; ↗ means increase; ↘ means decrease; R->S means a switch from regular-gird sampling to selective sampling, and R->C and S->C have similar meanings.