1. Introduction
The accumulation of heavy metals in soil has a significant impact on soil environmental health [1]. Because heavy metals, especially Cd, Hg, As, Pb, and Cr, are difficult to biodegrade and prone to bioaccumulation in living organisms, they pose a long-term threat to the ecological environment and human health [2]. The heterogeneity and complexity of soils introduce significant uncertainty when heavy metal content is extrapolated spatially from points to areas [3,4]. The formation and evolution of heavy metal accumulation are extremely intricate, and its spatial characteristics are not easily discernible [5]. Moreover, soil sampling results represent only the soil environment at the sampling point itself, which can differ significantly from the overall soil condition in the region [6]. As with any attempt to characterize a larger population from a limited set of samples, this leads to survivorship bias and makes regional conditions difficult to determine [7]. The result can be significant errors in any attempt to reflect soil environmental conditions accurately and objectively.
To address this issue, the common approach is spatial estimation analysis based on geostatistics [8]. Studying the spatial distribution characteristics and variation patterns of heavy metals in soil provides a more scientifically sound basis for mapping soil pollutant content and assessing its risk [9,10]. Lin et al. used a variety of heavy metal content maps to construct spatial structural models, establishing the computational distances for two spherical models and employing factor analysis based on variable-scale coefficient matrices to discern the multivariable spatial variations of soil heavy metals [11]. Liao compared the accuracy of six interpolation models under different conditions and analyzed the conditions under which each method is applicable [12]. Many scholars have conducted comparative studies or joint analyses of these methods to improve their accuracy. Hou [13] summarized soil sampling assessment methods at different depths and densities, analyzing the influencing and contributing factors in the application of each method. These methods effectively improve the precision of research on the spatial distribution of soil heavy metals. However, in certain unique regions, the content of soil metals does not follow common patterns and is influenced by particular factors.
Muhammad pointed out that the mining and metallurgical industry can have a decisive impact on heavy metal pollution in the soil of certain regions [14]. Zhen’s findings identify the geological context and weathering processes as the principal sources of soil heavy metals in certain locales [15]. Hu conducted an in-depth analysis of regional mining areas affected by soil heavy metal accumulation and found that the sources of metals are influenced by a combination of geological activities, altitude, vegetation, and soil pH [16]. Overall, external disturbances are the source of variations in soil metal content. Human activities mainly include mining [2], industrial production [17], and agricultural non-point source pollution [18], while natural factors mainly include geological types, climatic factors influencing soil formation [19], and surface material migration caused by disasters and extreme weather [20]. These factors do not occur in isolation, and over long time periods it is difficult to collect all regional data comprehensively.
These limitations can introduce errors into the estimation of mercury content through multivariate statistical analysis, and the specific sources and patterns of change cannot be determined through short-term measurements. To address this issue, Yang integrated soil sampling points from multiple periods, which increased the number of soil sampling points in the region and improved the accuracy of the simulation [17]. Yong collected multi-layer soil profile points and used a 3D spatial interpolation method to analyze the three-dimensional spatial distribution of soil heavy metals, which allows a more accurate simulation of where soil heavy metals are located [18]. Rahmati conducted a comparative analysis using six machine learning methods on a soil data set from a single region to determine the impact of soil moisture content on agricultural soil environments [21]. These methods have increased the accuracy of spatial simulation of soil heavy metals in special areas, but a general method that improves accuracy across a wider range of special areas is still lacking. In light of existing research, how to increase the accuracy of spatial simulation of soil heavy metals in special areas through more sampling points and composite analysis methods remains a gap in high-precision spatial simulation of soil heavy metals.
The core of this paper is to explore a method for analyzing regional soil heavy metal content more precisely through the combined use of various sampling locations. This supplements the study of the spatial distribution patterns of soil heavy metals based on geostatistics and provides a framework for the predictive modeling of these metals within particular regions. This article examines the case of a karst city situated in a mountainous region of southern China. By collating data from soil samples collected at various times and depths within the city, it examines the temporal and spatial fluctuations in five metals. Taking into account the unique regional abundance of metals, the study uses the Mean of Surface with Nonhomogeneity (MSN) to divide the research area into smaller units and analyzes the extent to which these sub-units improve the precision of statistical analyses. Such an analysis provides a new approach for the spatial prediction of soil metal content within these distinctive regions.
2. Materials and Methods
2.1. Study Area and Soil Sample Collection
The research area is situated in the southwest of China, encompassing an overall area of 18,617 km². It features karst topography and has a population of nearly 4 million. Apart from the central urban area located on flat river terraces, the remaining approximately 2 million people live in dispersed rural areas in the mountainous hills.
This article compiles historical data from the study area, including soil sample data from two periods, 1990 and 2010 (a total of 11 sampling points); soil samples were collected again at the same points in 2020. The sampling in this study was carried out under the Special Investigation on Soil Environmental Background Values Project, the historical samples were obtained from the local environmental protection agency, and their sampling and testing methods are consistent with those of the project. A total of 2588 soil samples were collected, including 580 soil samples from 290 surface and subsoil sampling points and 1897 profile soil samples from 150 profile sampling points. Additionally, 31 bedrock samples were collected, giving 2619 samples in total, including those from the previous 11 sampling points (Figure 1).
Surface and subsoil sampling points were sampled in two layers: the A horizon (topsoil) and the C horizon (parent material layer). To ensure sample representativeness, a mixed sample-collection approach was employed on-site: within a 30 m radius of each sampling location, similar sample points were collected and combined to form composite samples. The sampling depth for the A horizon was generally 0 to 30 cm, while for the C horizon it ranged from 70 to 120 cm. Profile samples were collected to depths of 1 to 2 m, at 10 cm intervals within the 0–1 m depth range and 20 cm intervals within the 1–2 m depth range, giving 10 to 15 samples per sampling point. Bedrock samples were collected for profiles that reached the bedrock section. All samples were subjected to laboratory analysis using the same testing methods: Cr and Pb were determined by X-ray fluorescence spectrometry (XRF), Cd by inductively coupled plasma mass spectrometry (ICP-MS), and As and Hg by atomic fluorescence spectrometry (AFS).
2.2. Spatial Interpolation Analysis Based on Geostatistics
Geostatistics is a statistical method used to analyze spatial data and stochastic processes, primarily applied in fields such as Geographic Information Systems (GIS), environmental science, mining, and agriculture. Geostatistics has also developed various methods, including robust geostatistics [22], nonparametric geostatistics [13], multivariate geostatistics [23], random simulation [24], and research that combines these with modern mathematical methods [25].
Robust geostatistics is an idealized model that requires the data set to conform to a normal distribution. This is difficult to achieve for soil heavy metal samples, because the concentrations at most sampling points tend to be very low, so the frequency distributions are often right-skewed, with the peak at low concentrations and a long tail toward high values. The remaining four methods are more practical. Nonparametric geostatistics processes the raw data while ignoring certain heterogeneities to analyze the overall situation. Multivariate geostatistics incorporates the analysis of additional variables, such as terrain [26] and vegetation characteristics [27]. Random simulation and research that combines these with modern mathematical methods ignore the process and instead work backward from the results to deduce the potential sources and migration processes of soil heavy metal content.
Given the large extent of the research area, this study focuses on determining which interpolation method yields the smallest error. During the selection of interpolation methods, the choice of method was found to affect the spatial interpolation more than parameter adjustments did. Therefore, the parameters of each method were kept the same for every metal. When choosing the interpolation method, the closer the mean error (ME) is to 0 and the smaller the root mean square error (RMSE), the better the model. The semivariogram (a statistical tool that describes the correlation between spatial data points, computed here with GS+ 9.0) was calculated from the content at all points, and the interpolation methods were then compared on that basis. The specific operations and steps are as follows (Figure 2):
Step 1: Randomly select 20% of the points within the research area as validation points, with the remaining 80% of points used as sample points for interpolation. Only 20% of the data points were used for validation to ensure that more than 30 points are included in the model calculations, in order to meet the minimum statistical requirements.
Step 2: For each of the five metals, run Kriging, IDW (Inverse Distance Weighting model), Spline, and Natural Neighbor models separately. Compare the interpolation results with the validation points, calculate the difference, and obtain the average difference. Finally, select the most suitable interpolation model for the five metals.
Step 3: Repeat steps 1 and 2 50 times. For each metal, record which model gives the lowest error in each repetition, and choose the model that does so most often (the mode) as the final interpolation model. Given the runtime of the models, 50 iterations were used; by then the optimal interpolation method for each metal is essentially determined and unlikely to change.
The above procedures were implemented using ModelBuilder and Python in ArcGIS Pro 3.0.
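The repeated hold-out comparison in Steps 1 to 3 can be illustrated with a minimal sketch. It is not the ModelBuilder/ArcGIS Pro workflow used in the study: it assumes only NumPy, uses a hand-written IDW and a nearest-neighbour predictor as stand-ins for the four models, and all function names (`idw`, `nearest`, `compare_models`) and the synthetic data are hypothetical.

```python
import numpy as np
from collections import Counter


def idw(xy_train, z_train, xy_test, power=2.0):
    """Inverse distance weighting prediction at each test location."""
    d = np.linalg.norm(xy_test[:, None, :] - xy_train[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero at coincident points
    w = 1.0 / d ** power
    return (w * z_train).sum(axis=1) / w.sum(axis=1)


def nearest(xy_train, z_train, xy_test):
    """Nearest-neighbour prediction (a simple stand-in, not Natural Neighbor)."""
    d = np.linalg.norm(xy_test[:, None, :] - xy_train[None, :, :], axis=2)
    return z_train[d.argmin(axis=1)]


def compare_models(xy, z, models, n_repeats=50, test_frac=0.2, seed=0):
    """Repeat a random 80/20 split; count how often each model has the lowest RMSE."""
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_frac * len(z))))
    wins = Counter()
    errors = {name: [] for name in models}
    for _ in range(n_repeats):
        idx = rng.permutation(len(z))
        test, train = idx[:n_test], idx[n_test:]
        rmse = {}
        for name, predict in models.items():
            resid = predict(xy[train], z[train], xy[test]) - z[test]
            me = resid.mean()                        # mean error (ME)
            rmse[name] = np.sqrt((resid ** 2).mean())  # root mean square error
            errors[name].append((me, rmse[name]))
        wins[min(rmse, key=rmse.get)] += 1
    best = wins.most_common(1)[0][0]  # the mode of the per-repeat winners
    return best, wins, errors


# Synthetic data for illustration only (not the study's measurements).
rng = np.random.default_rng(1)
xy = rng.uniform(0, 100, size=(200, 2))
z = np.sin(xy[:, 0] / 15.0) + 0.1 * rng.normal(size=200)
models = {"IDW": idw, "Nearest": nearest}
best, wins, errors = compare_models(xy, z, models)
print(best, dict(wins))
```

Here the winner of each repetition is decided by RMSE alone; a fuller version could also require ME to be close to 0, as the selection criterion in this section describes.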
2.3. Analysis of Cumulative Risks
This method compares the pollutant content in surface soil with that in deeper soil at the same sampling point, or compares the pollutant content in the surface soil of a region with historical pollutant levels in the surface soil of the same region, to assess the accumulation of pollutants in surface soil [28]. The single-factor enrichment coefficient method is employed, with the calculation formula being:
$$A_i = \frac{C_i}{B_i}$$
where $A_i$ represents the single-factor enrichment coefficient of pollutant $i$ in the soil, $C_i$ denotes the total measured value of pollutant $i$ in the surface soil, with units consistent with $B_i$, and $B_i$ indicates the total measured value of pollutant $i$ in the deeper soil layer [29]. Based on the magnitude of $A_i$, a single pollutant accumulation assessment is conducted for soil survey sites (Table 1).
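As a minimal sketch of the enrichment calculation described above, the coefficient can be computed as the ratio of surface to deep content and then binned into accumulation classes. The class boundaries below are hypothetical placeholders, since the grading in Table 1 is not reproduced here; the function names and sample values are likewise illustrative assumptions.

```python
import numpy as np


def enrichment_coefficient(surface, deep):
    """Single-factor enrichment coefficient: surface content / deep content."""
    surface = np.asarray(surface, dtype=float)
    deep = np.asarray(deep, dtype=float)
    if np.any(deep <= 0):
        raise ValueError("reference (deep-layer) contents must be positive")
    return surface / deep


def classify_accumulation(a, bounds, labels):
    """Assign each coefficient to a class; bounds ascending, len(labels) == len(bounds) + 1."""
    if len(labels) != len(bounds) + 1:
        raise ValueError("need exactly one more label than bounds")
    return [labels[i] for i in np.digitize(a, bounds)]


# Hypothetical class boundaries (NOT the values of Table 1).
BOUNDS = [1.5, 3.0, 6.0]
LABELS = ["none", "slight", "moderate", "severe"]

# Illustrative contents in mg/kg (not measured data).
a = enrichment_coefficient([2.0, 0.3], [0.5, 0.3])
classes = classify_accumulation(a, BOUNDS, LABELS)
print(a, classes)
```

Passing the boundaries as parameters keeps the sketch usable once the actual grading thresholds are substituted.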
2.4. Mean of Surface with Nonhomogeneity (MSN Model)
The principle of the MSN method is to estimate the mean and variance of a specific attribute by combining the advantages of spatial stratification and kriging variance [30,31]. It assumes that a non-uniform surface can be transformed into several smaller uniform surfaces through stratification, which is possible in most cases [32]. The MSN estimate has been proven to be the best linear unbiased estimator of the true value [33]. In practice, monitoring networks, such as meteorological networks, often cover very large areas, making it difficult to ensure that the distribution of the monitored element is homogeneous across the entire region. Spatial sampling optimization methods based on MSN theory make global estimation and statistical inference for complex regions more effective and reliable.
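The stratification idea behind MSN can be illustrated with the classical area-weighted stratified estimator. This is a deliberate simplification, not the MSN estimator itself: MSN additionally uses kriging-type weights that account for spatial correlation within strata, which the sketch below ignores. The function name and the sample values are hypothetical.

```python
import numpy as np


def stratified_mean(values, strata, areas):
    """Area-weighted stratified mean and the variance of that estimate.

    values : sample values
    strata : stratum label of each sample
    areas  : dict mapping stratum label -> stratum area (any consistent unit)
    """
    values = np.asarray(values, dtype=float)
    strata = np.asarray(strata)
    total_area = sum(areas.values())
    mean, var = 0.0, 0.0
    for label, area in areas.items():
        v = values[strata == label]
        if v.size < 2:
            raise ValueError(f"stratum {label!r} needs at least two samples")
        w = area / total_area                      # stratum weight
        mean += w * v.mean()
        var += w ** 2 * v.var(ddof=1) / v.size     # assumes independent samples
    return mean, var


# Illustrative values only: two strata with very different levels.
est_mean, est_var = stratified_mean(
    values=[1.0, 3.0, 10.0, 14.0],
    strata=["a", "a", "b", "b"],
    areas={"a": 1.0, "b": 3.0},
)
print(est_mean, est_var)
```

When the strata are internally homogeneous, the within-stratum variances are small, so the variance of the overall estimate drops relative to treating the whole surface as one unit.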
This study utilizes the MSN principle and GIS overlay to divide units, where the interpolation results of surface contents for each metal are considered as layers of factors. Within each layer, differentiation into several smaller uniform surfaces is conducted based on color bands. The boundaries of the units follow the boundaries of townships. After unit division, a comparative analysis based on the variance of content at each unit point is performed. The main steps are as follows:
Step 1: Classification is conducted based on spatial interpolation results of surface content for five metals, serving as the basis for unit differentiation according to factors of variation. Adjustments were made in conjunction with administrative boundaries to ensure that a single risk could be addressed within one administrative region. This study utilized township boundaries, which can enhance the precision of environmental management.
Step 2: In accordance with the interpolation method described in Section 2.2, GB15618-2018 (Soil Environmental Quality: Risk Control Standard for Soil Contamination of Agricultural Land, which stipulates the risk screening values and control values for soil pollution in agricultural land in China) was selected as the basis for classification. Shades of green indicate values below the metal screening threshold, shades of yellow represent values between the screening threshold and the regulatory threshold, and shades of red signify values exceeding the regulatory threshold.
Step 3: Different color schemes are regarded as a mean of surface with nonhomogeneity. Consequently, the division of units is primarily guided by the boundaries of the color schemes, with adjustments made according to the boundaries of townships. The interpolation results for Cd are considered as the primary factor due to its highest pollution risk. Adjustments are made to the boundaries and units sequentially considering the Hg, As, Pb, and Cr interpolation results. The fine-tuning of boundaries is more of a qualitative process, which can be informed by the researcher’s experience and understanding of regional environmental management. This approach facilitates the better application of the research findings.
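The color classification in Step 2 and the township-based grouping in Step 3 can be sketched as follows. This is only an illustration under stated assumptions: the two thresholds are hypothetical placeholders rather than values quoted from GB15618-2018, the township identifiers are invented, and assigning each township its most frequent color class is one simple rule, whereas the study describes the boundary adjustment as partly qualitative.

```python
import numpy as np
from collections import Counter


def colour_class(values, screening, control):
    """Green below the screening value, red above the control value, yellow between."""
    values = np.asarray(values, dtype=float)
    return np.where(values > control, "red",
                    np.where(values >= screening, "yellow", "green"))


def dominant_class_by_unit(classes, unit_ids):
    """Most frequent colour class among the interpolated cells of each unit."""
    classes = np.asarray(classes)
    unit_ids = np.asarray(unit_ids)
    result = {}
    for unit in np.unique(unit_ids):
        counts = Counter(classes[unit_ids == unit].tolist())
        result[str(unit)] = counts.most_common(1)[0][0]
    return result


# Placeholder thresholds in mg/kg (assumptions, not taken from the standard).
SCREENING, CONTROL = 0.3, 1.5

# Illustrative interpolated Cd values and hypothetical township IDs.
cd = [0.1, 0.2, 0.5, 2.0, 3.0, 0.4]
township = ["T1", "T1", "T1", "T2", "T2", "T2"]

classes = colour_class(cd, SCREENING, CONTROL)
units = dominant_class_by_unit(classes, township)
print(classes.tolist(), units)
```

Repeating the same classification for Hg, As, Pb, and Cr and overlaying the results would give the layered input that the unit division draws on.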
3. Results
3.1. Temporal Changes in Content at the Same Sampling Points
This paper analyzes the characteristics of soil metal content changes in the study area over a period of 30 years (Figure 3). The data for 1990, 2010, and 2020 (the first three sets of bar charts) come from the same 11 sampling points, while the 2020a data represent all data collected in this study (580 sampling points). Comparing the mean contents and 95% confidence intervals of the metals over these 30 years shows that the mean Cd content has increased significantly: by 378% at the same sampling points, compared with 326% for the overall level. At the same time, the standard deviation has also increased markedly, so the confidence interval has widened by more than the mean has grown. This indicates a significant increase in the variability of the data from the same sampling points. GIS analysis showed that four sampling points closer to human activities and industrial production had a significant increase in soil Cd content.
The Hg content remains relatively stable: the 2020a value is 29% lower than the 1990 value and 39% higher than the 2020 value, indicating overall stability in soil Hg content across the entire area but a significant increase in local variability. As, Pb, and Cr show a trend of initial increase followed by a decrease at the same sampling points. The As content remained relatively stable in the first 20 years; given the significant differences between the earlier sampling and testing errors and the 2020 data, the source of the initial change is difficult to determine. The 2020 and 2020a data show decreases in As of 24% and 31%, respectively, and spatial analysis based on GIS does not reveal any significant changes. However, in the 2020a data, the variance and standard deviation of As are consistent with those of 1990 and 2010, indicating that the differences may be due to spatial variability rather than a general trend. Pb and Cr show consistent changes, with increases of 72% and 52%, respectively, from 1990 to 2010, and decreases of 9% and 12%, respectively, from 2010 to 2020; the overall situation in 2020a was essentially consistent with that of 2010. Wang [34] also observed similar changes in the Yangtze River.
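The percentage changes and 95% confidence intervals compared in this section follow from two short calculations, sketched below with the Python standard library. The function names and sample values are illustrative assumptions, and the critical value is passed in explicitly because the appropriate choice depends on sample size: 1.96 is the large-sample normal value, while for 11 points the two-sided 95% Student t value with 10 degrees of freedom (about 2.228) is more appropriate.

```python
import math
import statistics


def percent_change(old_mean, new_mean):
    """Relative change of the mean, in percent of the earlier value."""
    return (new_mean - old_mean) / old_mean * 100.0


def mean_ci(values, critical=1.96):
    """Mean and confidence interval: mean +/- critical * s / sqrt(n)."""
    n = len(values)
    m = statistics.fmean(values)
    half_width = critical * statistics.stdev(values) / math.sqrt(n)
    return m, m - half_width, m + half_width


# Illustrative numbers only (not the study's measurements).
change = percent_change(2.0, 3.0)
m, low, high = mean_ci([1.0, 2.0, 3.0, 4.0, 5.0])
print(change, m, low, high)
```

Because the half-width scales with the standard deviation, a data set whose spread grows faster than its mean shows exactly the pattern described for Cd: the interval widens by more than the mean increases.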
3.2. Content at Profile Sampling Points
This article uses different color schemes to illustrate the varying risks of metal content, with reference to China’s “Soil Environmental Quality Risk Control Standard for Soil Pollution on Agricultural Land GB15618-2018”. The green zone refers to areas with very low metal content, which has no significant impact on ecosystems or flora and fauna, and where the environmental risk of metals in the soil is relatively low. The yellow zone is an area where the metal content may affect vegetation production and human health. The red zone has relatively high metal content, posing higher environmental risks to the ecosystem and, to some extent, affecting the regions where major crops are produced. The different color schemes highlight the differences between areas more clearly and provide basic data for the superposition of multiple metals in the MSN model. The depths shown for each metal may vary, depending on the soil layer depth at the sampling points and the characteristics of the corresponding variations in metal content; only the observed changes in content are presented.
3.2.1. Cadmium Content
The optimal spatial interpolation method for Cd based on geostatistics is Kriging, with a model accuracy of up to 67%. According to the content of Cd in soil samples from different profiles, there are significant spatial distribution differences in the content of Cd at different depths. The surface soil in the central and southwestern parts of the study area is mainly red and yellow, while in the northern and southeastern parts, it is a mixture of yellow and green. According to the interpolation principle of the Kriging model, the outliers in the region will be ignored to some extent, resulting in a certain degree of spatial clustering. The areas where different color series coexist indicate that the Cd content was distributed around the standard value, suggesting that the influence of Cd at a smaller scale was higher than the comprehensive impact at a larger scale.
The spatial distribution pattern of Cd content in layers 2–12 was basically consistent with that in the surface layer, but the metal content was much lower than in the surface soil. In the northern and southeastern parts, it decreases to the green range, while in the central and southwestern parts, it decreases to the yellow and orange ranges, from 0.2 to 0.8 mg/kg (Figure 4).
In order to compare the differences between layers, the cumulative degree was compared between the first layer (0–20 cm, median 1.5 mg/kg) and the eighth layer (140–160 cm, median 0.17 mg/kg). It was found that the proportions of slight accumulation, moderate accumulation, and severe accumulation were 33.4%, 8.0%, and 1.1%, respectively. The proportion of points with no accumulation was 57.6%, indicating that the Cd content in deep soil was generally lower than that in the surface layer. The points with high surface and low deep Cd content are relatively few, which was due to the low proportion of Cd in the soil parent material. The Cd in the surface layer was increased by external sources, which was consistent with the increase in Cd content over 30 years. In regions with strong human activities, the Cd content in soil had increased significantly.
3.2.2. Mercury Content
The optimal spatial interpolation method for Hg based on geostatistics is IDW (Inverse Distance Weighting model), with a model accuracy of up to 72%, which was generally associated with low Hg content. The Hg content in the research area was generally below 0.3 mg/kg, with most of the Hg content falling within the range of no environmental risk, except for a few individual points. The northern part of the area had a content ranging from 0.2 to 0.3 mg/kg, while the southeastern part was below 0.1 mg/kg, and the rest of the area was mostly between 0.1 and 0.2 mg/kg (
Figure 5).
By calculating the accumulation of various soil layers, it can be seen that the proportion of sites with no obvious accumulation was 92.3%, while sites with mild accumulation account for 3%. This indicates that the Hg content between the surface and deep soil layers was basically at the same level, with only some sites showing mild accumulation. This to some extent suggests that the Hg is derived from pedogenesis, and the increase in Hg content caused by human activities has little impact on the overall area of the study area.
3.2.3. Arsenic Content
The optimal spatial interpolation method for As based on geostatistics is Kriging, and the accuracy of the model can reach up to 66%. As exhibits completely different spatial characteristics from other metals, showing obvious surface spatial aggregation in the southwest corner of the study area and significant aggregation at depths of 80–180 cm (median 46 mg/kg) in the northeastern part of the study area, indicating a continuous high content presence (
Figure 6).
Combining the analysis of multiple time points shows that the average As content decreased by 24% over 30 years (pentagram points in the figure). Two points are located in the red and yellow areas. The As accumulation analysis shows that non-accumulated points account for 95.0%, mild accumulation for 4.3%, and moderate accumulation for 0.7%. This indicates that the As content in each soil layer was relatively consistent, but there are still some areas where the surface content is high and the deep content is low. For example, in the northwest corner of the study area, the content was basically normal at 240 cm, while the high-value area slightly east of the center exists only in the 80–180 cm soil layer. In some areas, As content tends to increase with depth; these areas are mainly concentrated in the green areas in the central and southern parts of the study area, and the pattern is mainly influenced by geological factors.
3.2.4. Lead Content
The optimal spatial interpolation method for Pb based on geostatistics is Kriging, with a model accuracy of up to 70%. The Pb content in the entire study area was generally below 100 mg/kg (green range), with only some areas in the northeast exceeding 240 mg/kg. Apart from the blue circle, the Pb content poses little environmental risk, but there are points within the blue circle with Pb content exceeding 100 mg/kg, and the overall Pb content within the circle was generally high (
Figure 7).
Compared to the surface sampling points, the Pb content at various profile points has relatively lower and more consistent environmental risks, especially in the areas within the blue circle turning green. When comparing with the spatial plane, only the Pb content in the middle eastern and southwestern areas was relatively higher. Calculations on accumulation show that there was no significant accumulation at 93.4% of the points, slight accumulation at 5.9%, moderate accumulation at 0.5%, and heavy accumulation at 0.2%. Points with higher Pb content in deeper layers are mainly concentrated in the northeastern, middle eastern, and southwestern regions, while the Pb content between the surface and deep soil layers was generally at the same level.
3.2.5. Chromium Content
The optimal spatial interpolation method for Cr based on geostatistics is Kriging, with a model accuracy of up to 69%. The spatial distribution characteristics of Cr are quite similar to those of Pb, with higher concentrations in the surface soil mainly concentrated in the central region (median 210 mg/kg) of the northwest area, shifted about 50 km westward relative to Pb (Figure 8).
The coefficient of variation of Cr content in the surface soil was 3.93, with mild accumulation at 3% of points and no accumulation at 97%. Spatial interpolation of soil profiles from layers 2 to 12 reveals anomalies at some locations. At the 60–80 cm depth (fourth layer, median 24 mg/kg), there are points with Cr content exceeding 300 mg/kg in the blue circle area, which is similar to the characteristics of As. In some areas, the deep Cr content exceeds that of the surface points, indicating enrichment of Cr in the deep soil layers due to certain factors in the soil-formation process. The surface soil Cr in these areas originates from the deep layers.
3.3. Units Division Based on MSN
This study employs the MSN model principle and GIS overlay to divide the units. The interpolated surface content of each metal is treated as a layer of factors. Within each layer, several small, uniform surfaces are differentiated based on the color band division described in Section 2.4. The boundaries of the units follow township boundaries. After unit division, a comparative analysis is conducted based on the variance of the content at the points in each unit. The main steps are as follows: (1) classify the spatial interpolation results of the surface content of the five metals as the basis for unit differentiation, focusing on three main color series: green, yellow, and red; (2) for Cd, use the spatial interpolation results to divide the units based on township boundaries; (3) on the basis of the Cd division, consider the spatial interpolation results of Hg to divide the units along township boundaries; (4) successively consider the adjustments of As, Pb, and Cr to the boundaries and units. The specific division is shown in Figure 9.
Through the temporal statistical analysis of horizontal-vertical soil metal content, it was evident that the overall statistical and fitting analysis within the research area was moderate. The R² values of the statistical interpolation fitting models for each metal range only between 0.149 and 0.545. Combining the spatial profile analysis of each metal, it was observed that Cd content was higher in the central and southwestern parts, As content was higher in the northwest and central eastern parts, Pb content was higher in the northeast, and Cr content was higher in the northwest and southwest. Therefore, after multiple layers of overlay, the research area can be roughly divided into several regions: northwest, northeast, central eastern, central, southwest, and southeast.
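The comparison of variability within units against variability over the whole study area can be sketched with the coefficient of variation (sample standard deviation divided by the mean). The function names, unit labels, and values below are hypothetical and chosen only to show the calculation; they are not results from the study.

```python
import numpy as np


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (as a ratio, not percent)."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()


def cv_by_unit(values, unit_ids):
    """Coefficient of variation for the whole area and for each unit."""
    values = np.asarray(values, dtype=float)
    unit_ids = np.asarray(unit_ids)
    overall = coefficient_of_variation(values)
    per_unit = {
        str(unit): coefficient_of_variation(values[unit_ids == unit])
        for unit in np.unique(unit_ids)
    }
    return overall, per_unit


# Illustrative contents (mg/kg) in two hypothetical units with different levels.
content = [1.0, 1.2, 0.8, 10.0, 11.0, 9.0]
unit = ["northwest", "northwest", "northwest",
        "southwest", "southwest", "southwest"]

overall_cv, unit_cv = cv_by_unit(content, unit)
print(overall_cv, unit_cv)
```

In this toy case each unit is internally uniform while the two units differ strongly in level, so the per-unit coefficients are much smaller than the overall one, which is the effect the unit division is intended to produce.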
5. Conclusions
The current research provides a new comprehensive risk assessment method for the spatial simulation of heavy metal content in areas where soil heavy metal levels are highly heterogeneous. It also divides the study area into units based on the statistical differences in soil heavy metals, thereby improving the methods for simulating soil metal content and conducting risk assessments in the region. Over the past 30 years, the average Cd content in the study area has significantly increased, while the Hg content has remained relatively stable. As, Pb, and Cr have shown a trend of first increasing and then decreasing at the same sampling points. Verification through soil profiles indicates that the Cd content was consistent between the deep and surface layers in the Northwest Mountainous Area, Southwest Karst Area, and Central Karst Area, displaying significant evolutionary characteristics. The As content in the Northwest Mountainous Area exhibits the same characteristics, while the other metals do not follow these patterns. Analysis of rock samples from the deeper soil points shows the same patterns, indicating that geological factors are the primary source of the elevated soil metal content in these regions. The coefficient of variation in each sub-region based on MSN partitioning was significantly lower than that of the entire study area, which is related to the differences in the dominant factors of soil heavy metal content in each region. The research results can provide a new method for precise risk assessment of soil heavy metal content in areas with abnormal metal content, such as karst soils.