Deciphering Soil Spatial Variability through Geostatistics and Interpolation Techniques

: Detailed knowledge of soil properties is fundamentally important for optimizing agriculture practices and management. Meanwhile, the spatial distribution of soil physicochemical properties is considered a fundamental input of any sustainable agricultural planning. In the present study, ordinary kriging, regression kriging and IDW were chosen for deciphering soil spatial variability and mapping soil properties in a reclaimed area of the Behera Governorate of Egypt where soil arose from two different types, one sandstone and the other limestone. Geostatistics were used to show the interrelationships and conditions of soil properties (available phosphorus, potassium and nitrogen, EC, pH, Sp, ESP, CEC, OC, SAR, and CaCO 3 ). The results of mapping spatial soil variability by Geostatistics could be used for precision agriculture applications. Based on the soil test results, nutrient management recommendations should be applied regarding variable rates of fertilizers. The performance of the maps was evaluated using Mean square error (MSE). Inverse distance weight (IDW) showed higher efﬁciency than Kriging as a prediction method for mapping the studied soil properties in the study area. The results of the present study suggest that the application of the selected ﬁt model worldwide in any relevant study of soil properties of different geological sources is feasible.


Introduction
Detailed knowledge of soil properties is of fundamental importance for optimizing agriculture practices and management. Meanwhile, the spatial distribution of soil physicochemical properties is considered a fundamental input of any sustainable agricultural planning [1,2]. Therefore, it could help in saving effort, time, and cost for any cultivation development process. Furthermore, collecting accurate and continuous spatial data is important for justified decision making. However, the availability of data is not only difficult but also an expensive process. Thus, geostatistics play an important role in representing soil analyses spatially and highlighting variations between different parts in a study area. Many of the previous studies preferred some geostatistics methods (Kriging) over others [1][2][3][4] and many of the corresponding studies preferred vice versa [5,6], which necessitates studying the suitability of the method used under the actual conditions of the study area.
The most regularly utilized interpolation techniques [3,4] based on geostatistical methods are kriging and co-kriging [5][6][7], inverse distance weighting (IDW), and linear regression model (LR) [8][9][10]. In addition, the attention has been recently paid to techniques combining two or more different approaches [11,12]. Kriging; regression kriging [13,14] and simple kriging [15] are the most commonly used techniques for a regression model. A previous study [13] compared two kriging methods, ordinary and regression, and stated that regression kriging was more accurate for interpolating soil properties in contrasting landscapes. According to [16,17] ordinary kriging and IDW were the geostatistical techniques most frequently used to predict soil properties. The linear mixed model formulation allows for a closer integration of soil knowledge into geostatistical prediction, and the development of models which, by allowing the use of covariance structures that are not necessarily stationary, are more pedologically plausible [7]. Hence, soil properties are complex variables with many processes, not all of which are well-understood. For this reason, it is unlikely that the covariance models used in soil geostatistics can be as tightly linked to process understanding. Therefore, the further development of a framework is needed in which soil-forming processes are linked to likely statistical distributions through appropriate mathematical operators [8,9]. Inverse Distance Weighting (IDW), Global Polynomial Interpolation (GPI), Local Polynomial Interpolation (LPI), Radial Basis Functions (RBF), Kriging, Cokriging, and regression methods have been found to produce similar results with ordinary kriging when were applied to different combination of data sets to create soil moisture maps [10].
The investigated area consisted of two types of soil, one of which descends from sandstone and the other from limestone, and these differences require various agricultural management techniques, hence the importance of choosing the right geostatistical method to display the soil properties. Accordingly, it is necessary to choose the correct method for land service operations, agricultural methods, and the appropriate irrigation method.
To avoid intensive labor consumption during soil sampling, optimizing sample density and location is crucial and therefore using suitable interpolation techniques is highly recommended [18]. In this research, ordinary kriging and IDW were chosen to be applied as they are considered as the most reliable interpolation methods [19][20][21]. Also, regression kriging has also been proven to be very effective [13,[22][23][24][25][26][27], therefore it was also chosen to be applied in this study. The assessment of each method was performed depending on goodness-of-prediction statistics (G) in terms of root-mean-square-error (RMSE) and mean-absolute-percentage-error (MAPE).

Study Area
For this study, soil samples were collected from two sites (30 • 32 10 N, 30 • 11 30 E and 30 • 31 50 N, 30 • 12 10 E) in the newly reclaimed area of south part of Behera Governorate as shown in Figure 1. The profiles were dug before the establishment of any crop in the reclaimed area. The area topography is semi flat with total area of 1 km 2 .  Table 1 indicates that the soil temperature regime is thermic and soil moisture regime is torric. The soil texture is mostly homogeneous consisting of sandy loam to loamy sand and in some subsoils contains a chill layer. The deep subsurface layers of most soil profiles contain gypsum crystals and a few lime deposits. Most of the layers are made up of sand-based material with loose construction. Bulk density ranges from 1.1 to 1.2. The sub surface layer of 60-120 cm of soil material is very cohesive shale with a high percentage of salts and gypsum. 20.4 °C and average annual rainfall 102 mm of the last 30 years. The data from Table 1 indicates that the soil temperature regime is thermic and soil moisture regime is torric. The soil texture is mostly homogeneous consisting of sandy loam to loamy sand and in some subsoils contains a chill layer. The deep subsurface layers of most soil profiles contain gypsum crystals and a few lime deposits. Most of the layers are made up of sandbased material with loose construction. Bulk density ranges from 1.1 to 1.2. The sub surface layer of 60-120 cm of soil material is very cohesive shale with a high percentage of salts and gypsum.

Experiment Design
Thirty-six soil profiles and 72 surface soil samples 0-30 cm in size were used in this study. The distances among neighboring points were designed in this experiment ranging from 50 m to 150 m. The design of slope distances was distributed uniformly and met the Geostatistical interpolation methods requirements. The soil samples depth was based on the difference of layers along with the profile depth. The interpolations were performed for the surface layers of all profiles, while the interpretation of soils was described based on the weighted average of all the soil layers.

Experiment Design
Thirty-six soil profiles and 72 surface soil samples 0-30 cm in size were used in this study. The distances among neighboring points were designed in this experiment ranging from 50 m to 150 m. The design of slope distances was distributed uniformly and met the Geostatistical interpolation methods requirements. The soil samples depth was based on the difference of layers along with the profile depth. The interpolations were performed for the surface layers of all profiles, while the interpretation of soils was described based on the weighted average of all the soil layers.

Interpolation Methods
The following equation was used for IDW and kriging interpolation methods as detailed in [28]: where the Z (xi) data value of locations which were used to generate the variable Z value at x 0 the unsampled location, Z (xi) value is assigned by the weight wi, and n is the number of the used closest neighboring data points for estimation.
di is the distance between the estimated point and the observed point.
xi and xi + h are sampling locations separated by a distance h, and Z(xi) and Z(xi + h) are the observed values of variable Z at the corresponding locations.
The least squares method which was used to estimate the linear regression is the following equation s detailed in [29][30][31][32][33][34]: The disturbed and undisturbed soil samples were collected to determine the different soil properties. The undisturbed soil samples were collected by metal soil cores of 2.5 and 5.0 cm thickness to determine soil available moisture range and bulk density, respectively, as well as using soil tubes to measure hydraulic conductivity. The disturbed soil samples were air-dried, ground gently, and sieved through 2 mm sieve to obtain the fine earth. Then, the physical and chemical properties were determined as follows.

Physical Soil Properties
The particle size distribution was determined using the pipette method, using sodium hexametaphosphate as a dispersing agent [35]. Soil bulk density was determined using metal cores for the undisturbed soil samples. Meanwhile, for the shale platy samples, the paraffin wax method was applied [36]. The measuring cylinder was used for the single grained sandy soils [37]. The hydraulic conductivity coefficient was determined using undisturbed soil cores, using Darcy's law [35]. Field capacity was measured as soil moisture content % at either 0.10 atm for the relatively coarse texture soils or 0.33 for the fine texture ones as well as welting point at 15 atm, using the Pressure Membrane Apparatus Method [37]. The available water range was calculated as the difference in moisture content % between field capacity and wilting [37].

Chemical Soil Properties
The determination of chemical soil properties was carried out using the techniques described by [38] as follows. The pH was measured in the saturated soil-water paste using Beckman pH meter. Calcium plus Magnesium were determined by the Versenate method using ammonium purpurate as an indicator, and magnesium was then calculated by the difference between Calcium plus Magnesium and Calcium. Sodium and potassium were photometrically determined using Flame-photometer. Carbonates and bicarbonates were determined by titration with 0.01 N sulfuric acid using phenolphthalein as an indicator for carbonates and methyl orange for bicarbonates. Chloride was determined using Mohr's method. Sulphate was calculated by subtracting the total soluble anions from the total soluble cations [36,39].

Results and Discussion
In the selected research area, soil properties were fairly homogeneous. Tables 2 and 3 and Figure 3 summarize the descriptive statistics of soil properties. The predicted continuous spatial surface of selected properties was performed using the five powers of IDW and Ordinary Kriging and Regression Kriging methods as shown in Figures 4-7.  Different geostatistical methods of co-kriging and the Inverse Distance Method (IDW) (powers are 1, 2, 3, 4 and 5) were used to estimate different soil parameters. The used interpolation methods were assessed and compared based on the calculations of coefficient of determination, root mean squared error, and mean absolute error. The best performance among the studied interpolation techniques was achieved by IDW power 2. These results recommended that the use of this method is efficient in the specific study  The spatial predictions of CaCO3 content, SP, pH, and EC in the study area are shown in Figures 3-7. For all soil variables, the maximum standard errors of the predictions were less than 7% of the maximum data values in logarithmic scale, indicating that the resulting spatial predictions obtained by inverse distance weight overcome co-kriging techniques in this case and both can be considered reliable.  The surface elevation in the area varied from 5 to 72 masl (Figure 1). Topography in the area was undulating with slope gradient range from 0 to 1% and 1 to 3%. The soil slope length was less than 150m with good drainage conditions. Generally, the surface soil structure was sandy with a loose structure and a small percentage of fine lime with a very low ratio of fine roots and no stones in the surface. Meanwhile, the subsurface layers of the soil profile consisted of sandy loam soil, with a disintegrated structure of brown iron oxide, a low concentration of lime, and few patches of limonite. Soil pH ranged from 7.1 to 8.9 in the surface layer while it ranged from 7.7 to 8.8 in the subsurface layers, which revealed the neutrality conditions of the soil. The electrical conductivity of surface soils ranged from 0.23 to 32.7 ds m −1 , while in the subsurface layers it fluctuated from 0.41 to 6 ds m −1 . Lime was slightly occurring in the soils where it ranged from 2 to 11.4 in the surface layer, and from 0.50 to 8.5 in the subsurface layers. CEC was mostly low ranging from 18 to 22 except for the profiles that were rich in clay where it reached 54. SAR ranged between 3.4 and 5.2% while ESP ranged between 2.7 and 4.9. The study area had low OC content, in most cases less than 0.5. significant correlations between elevation-based variables and soil physical properties were previously reported [40].     The semivariograms were calculated to identify the spatial structure of soil properties and the best among the fit models was identified for the description of these spatial structures. For the surface soil parameters, the best one was the Gaussian model while the exponential model was for SAR. For the subsurface parameters, Gaussian, exponential, and spherical models were fitted to all. Based on the range of influence, the sill (Co + C), and the nugget effect (Co) for each of the parameters, the degree of autocorrelation was related to the spatial dependencies (Nugget/Sill ratio) between the sampling points. The semivariogram of surface soil and subsurface soil presented in Figure 8.
For the surface and subsurface depths, a strong spatial dependence was found for the nitrogen and OC while the rest soil parameters were found to be moderately spatially dependent. A significant variation in the SOC contents between different types of karst landforms, especially in the topsoil layers was also reported in [48]. Soil pH and clay were found to be essential factors influencing the SOC spatial distributions in the two soil profiles [49].
The lowest values of nugget effect were estimated for pH and K, indicating the low random variance of variables in the study area. These results are in conjunction with the findings of [42,50,51]. The descriptive statistics of soil properties is summarized in Tables 2 and 3 and Figure 3. The results suggest that all variables were normally distributed. For all the observed variables, the coefficient of variation was very different in both depths. It was observed that pH had the lowest coefficient of variation. This result could be attributed to the uniform soil conditions in the area. This resulted from the subtle changes in slope and its direction. This could be the main factor affecting pH. Slope positions showed significant effects on most of the soil physico-chemical properties in [41].
Different geostatistical methods of co-kriging and the Inverse Distance Method (IDW) (powers are 1, 2, 3, 4 and 5) were used to estimate different soil parameters. The used interpolation methods were assessed and compared based on the calculations of coefficient of determination, root mean squared error, and mean absolute error. The best performance among the studied interpolation techniques was achieved by IDW power 2. These results recommended that the use of this method is efficient in the specific study area, however the best estimation was not always achieved through this method. Consequently, it is recommended to take into account all the parameters involved, such as stratification, topography, and climate, for the accurate prediction of soil properties in terrestrial ecosystems.
The spatial predictions of CaCO 3 content, SP, pH, and EC in the study area are shown in Figures 3-7. For all soil variables, the maximum standard errors of the predictions were less than 7% of the maximum data values in logarithmic scale, indicating that the resulting spatial predictions obtained by inverse distance weight overcome co-kriging techniques in this case and both can be considered reliable.
The semivariograms were calculated to identify the spatial structure of soil properties and the best among the fit models was identified for the description of these spatial structures. For the surface soil parameters, the best one was the Gaussian model while the exponential model was for SAR. For the subsurface parameters, Gaussian, exponential, and spherical models were fitted to all. Based on the range of influence, the sill (Co + C), and the nugget effect (Co) for each of the parameters, the degree of autocorrelation was related to the spatial dependencies (Nugget/Sill ratio) between the sampling points. The semivariogram of surface soil and subsurface soil presented in Figure 8.  Figure 8 shows that some variables showed no spatial dependence structure, characterized by the pure nugget effect model (PNE). When the variable under study is spatially independent, the nugget effect is equal to the sill. In a similar study in vineyards, the experimental semivariograms were best fitted to theoretical models without nugget effect only on 2 occasions (Mg and Na) and by a nugget effect plus a structure in the remaining datasets [52]. The PNE could occur due to measurement and sampling errors or undetected micro-variations when taking into account sample spacing larger than that necessary to detect spatial dependence. Variables with moderate spatial dependence may be due to soil homogeneity. In this sense, range values are important measures for planning and experimental evaluations, as range can assist in sampling procedure definition. In this respect, the range of the scaled semivariogram was used to estimate the minimum number of samples to characterize soil spatial variability of soil chemical properties. The low land environments are more heterogeneous in chemical properties, so they require a higher number of samples to analyze variability with a sampling density of eight points ha-1 and spacing of 50m. This aspect may be connected to intensive use of the soil surface layers, which are more heterogeneous. In contrast, the upper land area had lower variability for soil chemical properties. Consequently, the sampling density required is lower than in the other environments with 150 m. Referring to the previous studies [42][43][44][45][46][47] the classification of spatial dependent variables used, was referred as strongly spatially dependent if the ratio was <25, moderately if between 25 and 75%, and weak if it was >75%.
For the surface and subsurface depths, a strong spatial dependence was found for the nitrogen and OC while the rest soil parameters were found to be moderately spatially dependent. A significant variation in the SOC contents between different types of karst landforms, especially in the topsoil layers was also reported in [48]. Soil pH and clay were found to be essential factors influencing the SOC spatial distributions in the two soil profiles [49].
The lowest values of nugget effect were estimated for pH and K, indicating the low random variance of variables in the study area. These results are in conjunction with the findings of [42,50,51]. Figure 8 shows that some variables showed no spatial dependence structure, characterized by the pure nugget effect model (PNE). When the variable under study is spatially independent, the nugget effect is equal to the sill. In a similar study in vineyards, the experimental semivariograms were best fitted to theoretical models without nugget effect only on 2 occasions (Mg and Na) and by a nugget effect plus a structure in the remaining datasets [52]. The PNE could occur due to measurement and sampling errors or undetected micro-variations when taking into account sample spacing larger than that necessary to detect spatial dependence. Variables with moderate spatial dependence may be due to soil homogeneity. In this sense, range values are important measures for planning and experimental evaluations, as range can assist in sampling procedure definition. In this respect, the range of the scaled semivariogram was used to estimate the minimum number of samples to characterize soil spatial variability of soil chemical properties. The low land environments are more heterogeneous in chemical properties, so they require a higher number of samples to analyze variability with a sampling density of eight points ha-1 and spacing of 50m. This aspect may be connected to intensive use of the soil surface layers, which are more heterogeneous. In contrast, the upper land area had lower variability for soil chemical properties. Consequently, the sampling density required is lower than in the other environments with 150 m.

Conclusions
One of the first important aspects for ensuring soil sustainably in agriculture is improving the knowledge about soil properties by selecting the fit production operation of soil mapping. The soil properties were described using classical statistics after the normalization of data, while their spatial variability was illustrated by using interpolation techniques in a GIS environment. The results showed that the soil properties with high variability in terms of coefficient of variation were available phosphorus and potassium (C.V ≥ 35%); nitrogen, OC, CEC, ESP, and SAR were all moderately variable (C.V = 34-15%), while pH, Sp, nitrogen, CEC, OC, had low variability (C.V ≤ 15%). These variations in the soil chemical properties are mostly related to the parent material on which the soil is formed. Soil properties (CaCO 3 content, SP, pH, and EC), especially soil salinity can change abruptly due to local management. In the present case study, the second order IDW was shown to be adequate in predicting the spatial distribution of the selected soil characteristics. The low land environments are more heterogeneous in chemical properties and so they require a higher number of samples to analyze variability with a sampling density of eight points ha −1 and spacing of 50m. This aspect may be connected to the intensive use of the soil surface layers, which are more heterogeneous. In contrast, the upper land area had lower variability for soil chemical properties; Consequently, the sampling density required is lower than in the other environments with 150 m. A prediction accuracy of 94%, 89%, and 96% for pH, SOM and clay content, respectively, through the deployment of remote sensing data was reported [53], in conjunction with the present study, imply that efficient methods are available for precise soil mapping in the service of farmers, stakeholders, and scientists.