This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
Producing high-quality interpolation maps of heavy metals is important for assessing the risk of environmental pollution. In this paper, spatial correlation information obtained from Moran’s I analysis was used to supplement traditional geostatistics. From the Moran’s I analysis, four characteristic distances were obtained and used as the active lag distance to calculate the semivariance. Validation of the optimality of semivariance demonstrated that using the two distances where the Moran’s I and the standardized Moran’s I,
Heavy metal pollution in agricultural soils is becoming an urgent problem worldwide because of increasingly intensive anthropogenic activities, such as the discharge of wastes from metal processing plants, the burning of fossil fuels, and pesticide use. Excessive accumulation of heavy metals in agricultural soils can also become a source of pollution for surface and ground waters, living organisms, sediments, and oceans. Thus, mapping the spatial distribution of heavy metals in soils is critical for assessing the risk of potential environmental pollution and for establishing protocols for pollution remediation. This is particularly true for China, where three decades of intense economic development have left soils at high risk of heavy metal pollution.
Geostatistics, providing a technique of semivariance to quantify the spatial patterns of soil parameters, is being increasingly adopted for spatial pattern analysis of heavy metals [
Spatial autocorrelation analysis is another alternative method that has been widely used to explore the spatial pattern of variables in many fields [
Although spatial autocorrelation analysis cannot be used for estimation of unsampled areas, we recognized it could provide useful information for spatial variable mapping, if combined with a kriging method, for the production of high quality distribution maps. Therefore, taking heavy metals in Beijing agricultural soils as a case study, the primary objectives of this research were: (1) to compare spatial autocorrelation analysis and geostatistics for identifying the spatial pattern of heavy metals; and (2) to use the Moran’s I analysis results as the
To investigate the pollution status of heavy metals in Beijing agricultural areas, a large-scale soil sampling project was conducted after the crop harvest in the autumn of 2006. Based on the agricultural land distribution and land use type maps of Beijing, a non-uniform stratified sampling scheme was adopted to ensure that the samples were representative. A total of 1,018 samples were collected in three steps. First, 231 soil samples were collected uniformly across the entire study area to form the low sampling density (C). Second, another 360 soil samples were added in areas with more agricultural soils to create the medium sampling density (M). Third, 427 further soil samples were collected, building on the two previous samplings and the distribution of agricultural soils, to create the high sampling density (F).
The distribution of soil samples at three levels.
For each sample, five surface soil (0–20 cm) sites were sampled within a 10 × 10 m square area and then mixed. A Global Positioning System was used to precisely locate each sampling position (latitude and longitude), and a total of 1 kg of mixed soil per sample was collected. All soil samples were collected using a stainless steel spade and a scoop made from bamboo and then stored in polyethylene bags. The soil samples were air-dried, crushed in an agate mortar, and then passed through a 100-mesh nylon sieve. The concentrations of eight heavy metals (Cr, Ni, Cu, Zn, As, Cd, Pb, and Hg) were analyzed in the soil samples following the Chinese Environmental Quality Standard for Soils (GB 15618-1995). After digesting the samples with a mixture of HCl, HNO_{3} and HClO_{4}, the Cr, Ni, Cu, and Zn concentrations were analyzed by flame atomic absorption spectrophotometry, Pb and Cd were analyzed by graphite furnace atomic absorption spectrophotometry, and the As concentration was determined by potassium borohydride-silver nitrate spectrophotometry. In addition, the Hg concentration was analyzed by cold atomic absorption spectrophotometry after the samples were digested with a mixture of H_{2}SO_{4}, HNO_{3} and KMnO_{4}. During processing, all samples were handled carefully to avoid input or loss of trace elements during preparation and analysis.
Spatial autocorrelation is an assessment of the correlation of a variable in reference to spatial location of the variable, which is a match between location similarity and attribute similarity [
where
The selection of neighbors is formally specified in the weights matrix
where
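As a minimal sketch of how a global Moran’s I can be computed with a distance-band weights matrix, the Python snippet below uses the standard textbook form of the statistic (given in the docstring). The binary distance-band weighting, the function names and the toy transect are illustrative assumptions; this is not the implementation of the software used in the study.

```python
import math


def distance_band_weights(coords, d_min, d_max):
    """Binary weights: w[i][j] = 1 when d_min < distance(i, j) <= d_max."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and d_min < math.dist(coords[i], coords[j]) <= d_max:
                w[i][j] = 1.0
    return w


def morans_i(values, w):
    """I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # S0: sum of all weights
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Toy transect (hypothetical data): four sites spaced 1 km apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
weights = distance_band_weights(coords, 0.0, 1.0)
trend = morans_i([1.0, 2.0, 3.0, 4.0], weights)          # smooth gradient
alternating = morans_i([1.0, -1.0, 1.0, -1.0], weights)  # neighbours always differ
```

In this toy case the smooth gradient gives a positive value and the alternating series gives a negative one. Repeating the calculation for successive distance bands and plotting the result against distance yields a raw spatial correlogram.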
The spatial correlogram is a graph where the Moran’s I is plotted in ordinate, against distances among localities (in abscissa). According to Legendre and Fortin, the spatial correlogram can be standardized into a standardized correlogram, in which the ordinate is the standardized Moran’s I,
Local Moran’s I is a local test statistic for spatial autocorrelation, which is used to identify the locations of spatial clusters and spatial outliers. It is computed as follows:
The notations in Equation (3) are as described for Equation (1), but the corresponding values are from the local neighboring region. For more details of Moran’s I principles and methods, see the references [
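The local statistic can be sketched in the same spirit. The snippet below assumes the commonly used Anselin form, I_i = (z_i / m2) * (row-standardized spatial lag of z), and labels each site by the signs of its own deviation and of its neighbours' mean deviation. The weights matrix and the data are hypothetical, and no significance test is performed, so the "no significance" class reported in this paper is not reproduced.

```python
def local_morans_i(values, w):
    """Return (local I, cluster/outlier type) for every site."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # second moment of the deviations
    results = []
    for i in range(n):
        row_sum = sum(w[i])
        # Row-standardized spatial lag: mean deviation of the neighbours.
        lag = sum(w[i][j] * dev[j] for j in range(n)) / row_sum if row_sum else 0.0
        local_i = dev[i] * lag / m2
        if dev[i] > 0 and lag > 0:
            kind = "high-high"
        elif dev[i] < 0 and lag < 0:
            kind = "low-low"
        elif dev[i] > 0 and lag < 0:
            kind = "high-low"
        elif dev[i] < 0 and lag > 0:
            kind = "low-high"
        else:
            kind = "undefined"
        results.append((local_i, kind))
    return results


# Hypothetical chain of five sites; each site neighbours the adjacent ones.
w = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
local = local_morans_i([10.0, 9.0, 8.0, 1.0, 2.0], w)
kinds = [kind for _, kind in local]
```

High-high and low-low sites correspond to spatial clusters, while high-low and low-high sites correspond to spatial outliers.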
Geostatistics uses the technique of semivariance to measure the spatial variability of a regionalized variable, and provides input parameters for the spatial interpolation of kriging [
where
A variogram plot can be acquired by calculating variogram at different lags. Data pairs were grouped into lag “bins” and Equation (4) was used to calculate the variogram for that bin. The mean lag of all the pairs in a particular bin was used as the representative lag for that bin.
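The binning procedure described above can be sketched as follows, assuming the classical estimator in which the semivariance of a bin is half the mean squared difference of its data pairs. The function name, the parameter names (`lag_width`, `active_lag`) and the toy data are illustrative assumptions.

```python
import math


def empirical_semivariogram(coords, values, lag_width, active_lag):
    """Return (mean lag, semivariance, pair count) for each non-empty bin.

    Pairs farther apart than active_lag are ignored; the representative lag
    of a bin is the mean separation distance of the pairs that fall in it.
    """
    n_bins = math.ceil(active_lag / lag_width)
    sum_sq = [0.0] * n_bins
    sum_dist = [0.0] * n_bins
    count = [0] * n_bins
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            d = math.dist(coords[i], coords[j])
            if d > active_lag:
                continue
            b = min(int(d // lag_width), n_bins - 1)
            sum_sq[b] += (values[i] - values[j]) ** 2
            sum_dist[b] += d
            count[b] += 1
    return [
        (sum_dist[b] / count[b], sum_sq[b] / (2 * count[b]), count[b])
        for b in range(n_bins)
        if count[b] > 0
    ]


# Hypothetical transect: four sites 1 km apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
bins = empirical_semivariogram(coords, values, lag_width=1.0, active_lag=2.5)
```

Here the pair separated by 3 km lies beyond the active lag distance and is excluded, which is how the choice of active lag distance changes the experimental semivariance that the model is fitted to.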
The variogram plot is fitted with a theoretical model, such as spherical, exponential, Gaussian, linear and power models. In this study, the exponential model was selected.
The exponential function is:
where
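For illustration, one common parameterization of the exponential model is sketched below: the semivariance rises from the nugget towards the sill as the lag grows. Whether the range reported by a given software package is this range parameter or an effective range is not stated here, so the way the Cr parameters from the semivariogram table are plugged in is an assumption.

```python
import math


def exponential_model(h, nugget, sill, range_param):
    """gamma(h) = C0 + C * (1 - exp(-h / a)) for h > 0, and 0 at h = 0.

    nugget is C0, sill is C0 + C, and range_param is a (same unit as h).
    """
    if h == 0:
        return 0.0
    partial_sill = sill - nugget
    return nugget + partial_sill * (1.0 - math.exp(-h / range_param))


# Cr parameters from the semivariogram table; treating A0 as "a" is assumed.
cr = {"nugget": 0.0251, "sill": 0.0733, "range_param": 59.55}
near = exponential_model(1.0, **cr)    # close to the nugget
far = exponential_model(1.0e6, **cr)   # effectively at the sill
```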
Because of the complexity of spatial data, spatial variability usually needs to be described using two or more theoretical semivariance models. This is the so-called nested model, which is described by the following equation [
where
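A nested model can be sketched as a nugget plus a sum of basic structures. The example below nests two exponential structures, one short-range and one long-range; all parameter values are hypothetical and chosen only to show the shape of the function.

```python
import math


def nested_exponential(h, nugget, structures):
    """Nugget plus a sum of exponential structures.

    structures is a list of (partial_sill, range_param) pairs.
    """
    if h == 0:
        return 0.0
    return nugget + sum(c * (1.0 - math.exp(-h / a)) for c, a in structures)


# Hypothetical parameters: a short-range and a long-range structure (km).
structures = [(0.03, 6.0), (0.02, 57.0)]
short_lag = nested_exponential(5.0, 0.02, structures)
total_sill = nested_exponential(1.0e6, 0.02, structures)
```

At short lags the short-range structure dominates the rise of the curve, while the long-range structure controls how slowly the total sill is approached.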
The fitted model provides information about the spatial structure as well as input parameters for kriging interpolation. Kriging is a linear interpolation technique that provides a linear unbiased estimate for spatial variables, which can be depicted as follows:
where
and the estimation errors (or kriging variances) need to be minimized.
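To make the kriging step concrete, the sketch below solves the ordinary kriging system for a single target location: the weights are constrained to sum to one through a Lagrange multiplier, and the estimate is the weighted sum of the observations. The variogram parameters, the helper names and the three sample points are hypothetical, and a small Gaussian elimination routine is used instead of a numerical library.

```python
import math


def exp_variogram(h, nugget, partial_sill, range_param):
    """Exponential semivariance; zero at zero lag."""
    return 0.0 if h == 0 else nugget + partial_sill * (1.0 - math.exp(-h / range_param))


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(coords, values, target, gamma):
    """Return (estimate, kriging variance, weights) at the target location."""
    n = len(values)
    # Semivariances between samples, bordered by the unbiasedness constraint.
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(c, target)) for c in coords] + [1.0]
    solution = solve(a, b)
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights


# Hypothetical samples and variogram parameters.
coords = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
values = [1.0, 3.0, 5.0]
gamma = lambda h: exp_variogram(h, 0.0, 1.0, 2.0)
at_sample = ordinary_kriging(coords, values, (0.0, 0.0), gamma)
at_centre = ordinary_kriging(coords, values, (1.0, 1.0), gamma)
```

Kriging at a sampled location returns the sampled value, and the two samples placed symmetrically about the second target receive equal weights.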
With the wide and increasing application of spatial interpolation methods, there is growing concern about their accuracy and precision. In this study, the accuracy of spatial interpolation was evaluated through a cross-validation approach. Commonly used error measures include the mean error (ME), mean absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE). Willmott suggests that MAE and RMSE are among the “best” overall measures of model performance [
MAE is calculated as:
RMSE can be calculated as:
Because MAE does not reveal the magnitude of error that might occur at any point, MSE will be calculated [
where
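The error measures named above follow their usual definitions and can be computed from paired measured and predicted values, for example from cross-validation. The function name and the three-sample data set below are illustrative assumptions.

```python
import math


def error_measures(measured, predicted):
    """Return ME, MAE, MSE and RMSE for paired observations."""
    errors = [p - m for m, p in zip(measured, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "ME": sum(errors) / n,                   # signed bias
        "MAE": sum(abs(e) for e in errors) / n,  # average error magnitude
        "MSE": mse,                              # penalizes large errors
        "RMSE": math.sqrt(mse),                  # same unit as the data
    }


# Hypothetical values: errors are 1, 0 and 2.
scores = error_measures([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
```

Because the errors are squared before averaging, MSE and RMSE respond more strongly than MAE to a few large misestimates.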
The soil sample data were stored in a spatial database created with ArcView 3.2. The spatial autocorrelation analysis was conducted using GeoDa 0.9.5-i. The experimental semivariance models were constructed using GS+ 5.3, while kriging was performed using the Geostatistical Analyst extension of ArcGIS 8.3.
As in conventional statistics, a normal distribution for the variable under study is desirable in linear geostatistics. Even though normality may not be strictly required, serious violations of normality, such as high skewness and outliers, can impair the variogram structure and the kriging results. It is often observed that environmental variables are lognormal [
The normality tests of the eight heavy metals for the 1,018 samples were performed as described by Huo
In general, the higher the absolute value of Moran’s I, the stronger the spatial autocorrelation, and the larger the absolute value of the standardized Moran’s I, the more significant the spatial structure.
Raw spatial correlograms of heavy metals (
Standardized spatial correlograms of heavy metals (
The advantage of the standardized Moran’s I is that it can compare the significant spatial patterns of different variables or of the same variable with different calculating parameters. At the global level,
Compared with the local Moran’s I statistical analyses in
Spatial autocorrelation characteristics of the four heavy metals at global and local levels based on the distance where the Moran’s I reached maximum.
| Heavy metal | Statistic | No significance (local) | High-high (local) | Low-low (local) | Low-high (local) | High-low (local) | Global |
|---|---|---|---|---|---|---|---|
| Cr | Moran’s I | 0.0872 | 0.8498 | 0.7035 | −0.1428 | −0.2755 | 0.4801 |
| | Standardized Moran’s I | 0.7040 | 6.7898 | 5.6228 | −1.1319 | −2.1906 | 3.8396 |
| | Percent of spatial types | 56.09 | 14.34 | 22.2 | 3.05 | 4.32 | |
| Ni | Moran’s I | 0.1661 | 0.9994 | 0.7527 | −0.1724 | −0.4386 | 0.3173 |
| | Standardized Moran’s I | 1.1318 | 6.7756 | 5.1044 | −1.1608 | −2.9635 | 2.1558 |
| | Percent of spatial types | 69.94 | 7.07 | 12.48 | 7.96 | 2.55 | |
| Zn | Moran’s I | 0.1165 | 0.7315 | 0.8123 | −0.1152 | −0.782 | 0.2924 |
| | Standardized Moran’s I | 0.7871 | 4.9050 | 5.4464 | −0.7650 | −5.2300 | 1.9648 |
| | Percent of spatial types | 66.7 | 8.74 | 13.46 | 7.56 | 3.54 | |
| Hg | Moran’s I | 0.0742 | 0.9958 | 0.7823 | −0.228 | −0.3971 | 0.2725 |
| | Standardized Moran’s I | 0.5223 | 6.9279 | 5.4441 | −1.5775 | −2.7529 | 1.9009 |
| | Percent of spatial types | 67.78 | 9.63 | 11.3 | 8.35 | 2.95 | |
Spatial autocorrelation characteristics of the four heavy metals at global and local levels based on the distance where the standardized Moran’s I reached maximum.
| Heavy metal | Statistic | No significance (local) | High-high (local) | Low-low (local) | Low-high (local) | High-low (local) | Global |
|---|---|---|---|---|---|---|---|
| Cr | Moran’s I | 0.0296 | 0.3906 | 0.3731 | −0.2065 | −0.1656 | 0.3333 |
| | Standardized Moran’s I | 0.3951 | 5.0561 | 4.8293 | −2.6525 | −2.1245 | 4.3158 |
| | Percent of spatial types | 28.09 | 19.45 | 32.81 | 6.19 | 13.46 | |
| Ni | Moran’s I | 0.0351 | 0.4643 | 0.4287 | −0.1674 | −0.2148 | 0.2444 |
| | Standardized Moran’s I | 0.3685 | 4.7486 | 4.3857 | −1.6979 | −2.1822 | 2.5046 |
| | Percent of spatial types | 55.30 | 14.83 | 20.73 | 5.01 | 4.13 | |
| Zn | Moran’s I | 0.0859 | 0.4629 | 0.5584 | −0.2667 | −0.4308 | 0.2367 |
| | Standardized Moran’s I | 0.7422 | 3.9634 | 4.7794 | −2.2698 | −3.6722 | 2.0308 |
| | Percent of spatial types | 56.58 | 13.16 | 18.27 | 5.80 | 6.19 | |
| Hg | Moran’s I | 0.0064 | 0.5812 | 0.3923 | −0.2215 | −0.2433 | 0.2054 |
| | Standardized Moran’s I | 0.0780 | 6.1029 | 4.1226 | −2.3114 | −2.5401 | 2.1637 |
| | Percent of spatial types | 49.90 | 16.01 | 20.53 | 5.30 | 8.25 | |
The disagreements in the spatial autocorrelation characteristics in
For the four heavy metals, their standardized spatial correlograms had more than one distinct waveform (
For the four metals, the amplitudes of spatial clusters were larger than for spatial outliers, indicating that positive spatial autocorrelation dominated at the global level. The same characteristics of the raw and standardized spatial correlograms were the distances where the 0 value first appeared, which were 57 km, 75 km, 57 km, and 55 km for Cr, Ni, Zn, and Hg, respectively (
The ranges (spatial correlation distances) of Cr, Zn, and Hg are around 60 km, which indicated the spatial distribution of these three heavy metals may be similar, and the source may be the same. The spatial variability of Hg was significant, which indicated that the concentrations of Hg in soils were mainly affected by random factors (human activities).
The spatial correlation distances of Cr, Ni, Zn, and Hg from geostatistics were 59.55 km, 94.50 km, 65.79 km and 65.10 km, respectively (
Semivariogram models for Cr, Ni, Zn, and Hg and their parameters (range, km).
| Heavy metal | Model | Nugget (C_{0}) | Sill (C_{0} + C) | Range (A_{0}) | Nugget/sill, C_{0}/(C_{0} + C) (%) | R^{2} | RSS |
|---|---|---|---|---|---|---|---|
| Cr | Exponential | 0.0251 | 0.0733 | 59.55 | 34.2 | 0.980 | 1.21 × 10^{−5} |
| Ni | Exponential | 0.0596 | 0.1423 | 94.50 | 41.9 | 0.972 | 3.52 × 10^{−5} |
| Zn | Exponential | 0.0377 | 0.0801 | 65.79 | 47.1 | 0.930 | 3.80 × 10^{−5} |
| Hg | Exponential | 0.5010 | 1.0250 | 65.10 | 48.9 | 0.969 | 5.20 × 10^{−3} |
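As a quick arithmetic check, the nugget/sill percentages can be re-derived from the nugget and sill values listed for each metal. The snippet below is a minimal sketch; the dictionary layout and function name are illustrative assumptions.

```python
# (nugget C0, sill C0 + C) as listed in the semivariogram parameter table.
params = {
    "Cr": (0.0251, 0.0733),
    "Ni": (0.0596, 0.1423),
    "Zn": (0.0377, 0.0801),
    "Hg": (0.5010, 1.0250),
}


def nugget_to_sill_percent(nugget, sill):
    """Share of the total sill that is attributed to the nugget, in percent."""
    return 100.0 * nugget / sill


ratios = {metal: round(nugget_to_sill_percent(c0, sill), 1)
          for metal, (c0, sill) in params.items()}
# ratios -> {'Cr': 34.2, 'Ni': 41.9, 'Zn': 47.1, 'Hg': 48.9}
```

The computed values match the tabulated percentages, with Hg showing the largest nugget share among the four metals.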
According to the spatial correlograms, four representative distances were selected:
The four characteristic distances of Cr, Ni, Zn, and Hg (km).
| Heavy metal | Distance | Distance | Distance | Distance |
|---|---|---|---|---|
| Cr | 6 | 16 | 57 | 76 |
| Ni | 4 | 10 | 75 | |
| Zn | 4 | 7 | 57 | 78 |
| Hg | 4 | 11 | 55 | 91 |
A variogram plot can be acquired by calculating variograms at different lags. For irregular sampling, the active lag distance is often represented by a distance band. Generally, the distance band is adjusted repeatedly to obtain a closer match between the theoretical model and the experimental semivariance. In this study, to find a suitable active lag distance quickly and effectively, we used the distance parameters extracted from the Moran’s I analysis as an auxiliary tool. Therefore, the four characteristic distances were tested as the active lag distance to fit the semivariance and produce spatial interpolation, and these were labeled by model
Take Cr as a case. Semivariogram
The scatter plots and fitted model based on the traditional geostatistics model (
The spatial interpolation maps of the four heavy metals were conducted using the ordinary kriging method based on model
Evaluation indices of the interpolation maps of heavy metals.
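| Model | Evaluation index | Cr | Ni | Zn | Hg |
|---|---|---|---|---|---|
| Model | MAE | 6.37 | 1.45 | 8.66 | 0.0935 |
| | RMSE | 10.56 | 3.20 | 12.56 | 0.2170 |
| | MSE | 111.57 | 10.27 | 157.66 | 0.0471 |
| Model | MAE | 6.47 | 3.55 | 9.92 | 0.1195 |
| | RMSE | 10.81 | 7.29 | 14.34 | 0.2683 |
| | MSE | 116.76 | 53.19 | 205.53 | 0.0720 |
| Model | MAE | 7.75 | 5.44 | 12.16 | 0.1147 |
| | RMSE | 12.73 | 9.91 | 17.42 | 0.2540 |
| | MSE | 161.97 | 98.20 | 303.61 | 0.0645 |
| Model | MAE | 7.66 | | 12.53 | 0.1351 |
| | RMSE | 12.59 | | 17.88 | 0.2817 |
| | MSE | 158.51 | | 319.55 | 0.0794 |
| Model | MAE | 7.06 | 4.83 | 10.86 | 0.1160 |
| | RMSE | 11.69 | 9.16 | 15.66 | 0.2557 |
| | MSE | 136.73 | 83.86 | 245.16 | 0.0654 |
| Model | MAE | 5.31 | 1.17 | 7.67 | 0.0918 |
| | RMSE | 8.90 | 2.59 | 11.09 | 0.2291 |
| | MSE | 79.19 | 6.73 | 123.03 | 0.0525 |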
Statistical summary of measured and predicted heavy metal concentrations (mg·kg^{−1}).
| Heavy metal | Source | Mean | Minimum | Maximum | Range | Standard deviation | CV (%) |
|---|---|---|---|---|---|---|---|
| Cr | Measured value | 60.75 | 31.60 | 300.00 | 268.40 | 20.49 | 33.73 |
| | Model | 60.57 | 39.45 | 156.59 | 117.14 | 14.28 | 23.57 |
| | Model | 60.56 | 39.93 | 142.62 | 102.69 | 13.90 | 22.95 |
| | Model | 60.50 | 40.69 | 136.41 | 95.72 | 13.50 | 22.32 |
| | Model | 60.82 | 38.19 | 162.12 | 123.94 | 15.18 | 24.95 |
| Ni | Measured value | 28.49 | 8.87 | 203.38 | 194.51 | 11.25 | 39.49 |
| | Model | 28.42 | 10.16 | 139.08 | 128.91 | 8.64 | 30.42 |
| | Model | 28.39 | 13.45 | 59.34 | 45.89 | 5.88 | 20.72 |
| | Model | 28.44 | 15.04 | 44.72 | 29.68 | 4.86 | 17.10 |
| | Model | 28.55 | 10.10 | 147.14 | 137.04 | 9.15 | 32.07 |
| Zn | Measured value | 76.27 | 28.50 | 221.62 | 193.12 | 21.03 | 27.57 |
| | Model | 76.26 | 45.44 | 144.82 | 99.38 | 12.31 | 16.14 |
| | Model | 76.20 | 47.76 | 128.93 | 81.17 | 11.18 | 14.67 |
| | Model | 76.14 | 49.99 | 117.25 | 67.27 | 10.37 | 13.62 |
| | Model | 76.55 | 43.75 | 159.64 | 115.88 | 13.43 | 17.55 |
| Hg | Measured value | 0.2175 | 0.0005 | 4.2900 | 4.2895 | 0.3210 | 147.59 |
| | Model | 0.2113 | 0.0219 | 1.6256 | 1.6037 | 0.1602 | 75.80 |
| | Model | 0.2035 | 0.0509 | 0.8837 | 0.8328 | 0.1177 | 57.87 |
| | Model | 0.2072 | 0.0485 | 1.1382 | 1.0897 | 0.1327 | 64.04 |
| | Model | 0.2129 | 0.0314 | 1.1547 | 1.1233 | 0.1510 | 70.90 |
The above results showed that the characteristic distances provided by Moran’s I analysis are feasible for improving the spatial estimation accuracy, although a mathematical proof of the methodology was not explored here. In addition, only ordinary kriging, the most basic and common spatial interpolation method, was used; other kriging models, such as indicator kriging (which makes no assumption of normality), could be combined with Moran’s I analysis and examined in future work.
In order to understand the impact of the spatial interpolation on the pollution status of heavy metals, a single-factor method was used to assess the pollution status, based on the critical value of the “Chinese Environmental Quality Standard for Soils” (GB 15618-1995). The pollution status was classified, according to a single-factor pollution index, as unpolluted or polluted [
The sample agreements in pollution status between ground measure and interpolation (%).
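| Model | Status | Cr, polluted | Cr, unpolluted | Ni, polluted | Ni, unpolluted | Zn, polluted | Zn, unpolluted | Hg, polluted | Hg, unpolluted |
|---|---|---|---|---|---|---|---|---|---|
| Model | Polluted | 0.10 | | 2.36 | | | | 3.05 | 0.49 |
| | Unpolluted | 0.59 | 99.31 | 1.57 | 96.07 | 0.10 | 99.90 | 3.24 | 93.22 |
| Model | Polluted | | | 1.28 | 0.29 | | | 1.87 | 0.39 |
| | Unpolluted | 0.69 | 99.31 | 2.65 | 95.78 | 0.10 | 99.90 | 4.42 | 93.32 |
| Model | Polluted | | | 0.49 | 0.29 | | | 1.96 | 0.39 |
| | Unpolluted | 0.69 | 99.31 | 3.44 | 95.78 | 0.10 | 99.90 | 4.32 | 93.32 |
| Model | Polluted | 0.10 | | 3.05 | | | | 3.24 | 0.59 |
| | Unpolluted | 0.59 | 99.31 | 0.88 | 96.07 | 0.10 | 99.90 | 3.05 | 93.12 |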
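An agreement table of this kind can be sketched as follows. Each sample is classified from its single-factor index, taken here as P = C / S (concentration divided by the standard's critical value), and the measured and interpolated classes are then cross-tabulated as percentages. The cut-off P > 1, the critical value and the concentrations below are hypothetical and serve only to illustrate the calculation.

```python
def pollution_status(concentration, critical_value):
    """Single-factor index P = C / S; P > 1 is treated as polluted (assumed)."""
    return "polluted" if concentration / critical_value > 1.0 else "unpolluted"


def agreement_table(measured, predicted, critical_value):
    """Percent of samples in each (interpolated status, measured status) cell."""
    counts = {}
    for m, p in zip(measured, predicted):
        key = (pollution_status(p, critical_value),
               pollution_status(m, critical_value))
        counts[key] = counts.get(key, 0) + 1
    return {key: 100.0 * c / len(measured) for key, c in counts.items()}


# Hypothetical concentrations (mg/kg) and critical value.
measured = [0.2, 0.6, 1.2, 0.3]
predicted = [0.25, 0.45, 0.8, 0.35]
table = agreement_table(measured, predicted, critical_value=0.5)
agreement = (table.get(("polluted", "polluted"), 0.0)
             + table.get(("unpolluted", "unpolluted"), 0.0))
```

In this toy case one measured polluted sample is smoothed below the critical value by the prediction, which lowers the overall agreement to 75%.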
For Zn, 0.1% of measured samples were in pollution status, and these became unpolluted after spatial interpolation using all models, although the nested model of
For Hg, 6.29% of measured samples were in pollution status. Compared with the model
The results of pollution status assessment of samples showed that, compared with the traditional geostatistics model
In general, heavy metal polluted soil samples, as a small probability event, would be underestimated by interpolation, which is exemplified by Zn and Cr in this study. If the distance where the Moran’s I reached maximum (
As mentioned previously, spatial autocorrelation analysis has been adopted for the optimality of semivariance by simply deleting the spatial outliers. However, in pollution studies, this may cause severe overcorrection when the spatial outliers are in fact reasonable. If the globally dominant spatial patterns are the focus, then the standardized spatial correlogram can provide the spatial correlation distance for the optimality of semivariance. In contrast, if abnormal situations or details are the key, as in the evaluation of soil heavy metal pollution, then the raw spatial correlogram can provide the information needed for the optimality of semivariance. Moreover, a nested model that fuses both the details and the dominant spatial patterns can provide an even better prediction.
In the current study, for Hg, there was still a large gap in assessment accuracy between the nested model interpolation and the measured values. A greater improvement in assessment accuracy may be achieved if zonal geostatistics are interpolated according to the spatial distribution of the local Moran’s I spatial pattern types. In addition, as this study primarily focused on spatial autocorrelation analysis for the optimality of semivariance, ordinary kriging may not have been the optimal choice in terms of estimation accuracy.
Distribution maps of heavy metals based on the nested model of
Cr concentrations in the study areas were greater than the background value (29.8 mg·kg^{−1}), and areas where Cr concentrations were three times the background value were observed in northeast Beijing (
Both geostatistics and spatial autocorrelation analysis can evaluate the spatial patterns of heavy metals, but each method has advantages and disadvantages. Geostatistics provides the semivariance technique to quantify the spatial patterns of soil parameters, but variogram fitting is influenced by subjective factors, which in turn affect the kriging estimation. Moran’s I analysis, on the other hand, can only provide spatial autocorrelation distances of a variable, which have the same meaning as the range calculated from the variogram. In this paper we therefore used this information to help calculate the semivariance in geostatistics and to produce spatial interpolations, with the aim of improving the accuracy of traditional geostatistics. This is the method that combines geostatistics with Moran’s I analysis.
According to the spatial correlograms from the Moran’s I analysis, four characteristic distances were obtained and used as the active lag distance to calculate the semivariance. The results showed that the fitting accuracy of semivariance based on the distances where the Moran’s I and the standardized Moran’s I,
The research was supported by the National Natural Science Foundation (41130526) and Beijing Municipal Bureau of Finance programs support. The authors are grateful to the Agricultural Environmental Monitoring Station of Beijing for their soil sampling and analysis, as well as the editor and three anonymous referees for their comments on earlier versions of the manuscript.