Article

Unraveling the Origins and Drivers of Potentially Toxic Elements (PTEs): A Sequential Framework Integrating Receptor Model and Machine Learning

1
Key Laboratory of Coupling Process and Effect of Natural Resources Elements, Beijing 100055, China
2
Key Laboratory of Gold Mineralization Processes and Resource Utilization, MNR, Key Laboratory of Metallogenic Geological Process and Resource Utilization of Shandong Provincial, Shandong Institute of Geological Sciences, Jinan 250013, China
3
Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
*
Authors to whom correspondence should be addressed.
Toxics 2026, 14(6), 525; https://doi.org/10.3390/toxics14060525
Submission received: 15 May 2026 / Revised: 13 June 2026 / Accepted: 15 June 2026 / Published: 17 June 2026
(This article belongs to the Section Metals and Radioactive Substances)

Abstract

Source apportionment and the elucidation of driving mechanisms are essential for targeted soil pollution management. This study investigated surface soils across six towns in southern Shimen County, northwestern Hunan Province, where 662 samples were collected to determine the concentrations of As, Cd, Cr, Cu, Ni, Pb, and Zn. Multivariate statistics and the APCS-MLR receptor model were integrated to quantify pollution sources, while three machine learning models (RF, XGBoost, and LightGBM) were applied to identify key drivers of the spatial enrichment of Cd. Results showed that Cd was significantly enriched, with a mean concentration of 0.43 mg/kg (3.41 times the provincial background value). The mean concentrations of As, Cr, Cu, Ni, Pb, and Zn were 11.97 mg/kg, 81.01 mg/kg, 24.15 mg/kg, 49.25 mg/kg, 29.56 mg/kg, and 76.77 mg/kg, respectively, and these PTEs showed no or only slight enrichment relative to background. Significant inter-element correlations indicated common sources. Three primary sources were quantified: natural parent material (43.83%), mining activities (30.99%), and a mixed source of coal mining and agricultural inputs (7.84%); the remaining 17.34% was attributed to unidentified sources. Natural sources dominated the geogenic enrichment of Cd, Cu, Ni, Pb, and Zn; mining activities governed the accumulation of As, Cr, Cu, and Pb; and the mixed source of coal mining and agricultural practices contributed substantially to Cd enrichment. Machine learning identified PM10, topography, strata, and soil type as the dominant drivers, with a combined feature importance of 70.05%. Natural and anthropogenic factors accounted for 44.23% and 55.77% of the total feature importance, respectively, revealing coupled natural–anthropogenic controls. This study establishes an integrated framework linking source apportionment and driver identification, providing scientific insights for the control of potentially toxic elements (PTEs) in analogous mining–agricultural regions.

Graphical Abstract

1. Introduction

With the rapid development of the economy and the continuous advancement of industrialization and agricultural intensification, the issue of soil PTE pollution has become increasingly severe, emerging as a global ecological and environmental concern [1,2,3]. Soil PTEs are characterized by their persistence, tendency to accumulate, and strong concealment, and they can enter the human body through pathways such as the food chain, posing significant risks to human health [4,5]. Hunan Province, known as China’s “hometown of non-ferrous metals,” experiences intensive human activities such as industrial and mining operations and agricultural production, making it a key region for PTE accumulation and pollution control [6,7]. Against this background, accurately identifying the sources of regional soil PTE pollution and elucidating their driving mechanisms holds important theoretical value for fundamental research on soil PTE pollution and provides practical guidance for developing targeted pollution control and remediation strategies.
Source apportionment and causal identification of PTE pollution serve as important prerequisites for regional environmental risk management. The sources of PTEs in soil are complex, influenced by both natural factors and anthropogenic activities [8]. Multivariate statistical analysis, with its strengths in dimensionality reduction, information condensation, and clustering, enables the extraction of latent pollution patterns and key variation features from multidimensional environmental datasets. By mitigating analytical biases introduced by multicollinearity, it is a well-established method for deconstructing pollution characteristics and identifying source factors [9,10]. Receptor models do not rely on detailed emission source inventories but instead use measured environmental receptor data to identify pollution source types and quantitatively apportion source contributions. They have been widely applied to the source apportionment of PTEs in soil, sediments, and atmospheric particulates [11,12]. Among these, the APCS-MLR model integrates the technical advantages of principal component analysis and multiple linear regression. By using absolute principal component scores to effectively correct the issue of negative factor scores, it can both qualitatively identify the types of potential pollution sources in the study area and quantitatively estimate the contribution shares of various sources. The model has clear physical meaning and strong interpretability, making it suitable for source tracing and quantitative contribution analysis of regional PTE pollution [13,14].
The spatial enrichment and differentiation of PTEs in soil are governed by the interplay of natural background, geographical conditions, and anthropogenic activities, forming complex nonlinear relationships with environmental factors and soil physicochemical properties [15,16]. While receptor models can quantify potential pollution sources, they often fail to capture underlying mechanisms and objectively assess the relative contributions of specific drivers. Machine learning, with its robust nonlinear fitting capability and nonparametric flexibility, has emerged as a powerful tool for disentangling these complex relationships and has become the prevailing method for identifying dominant factors controlling PTE dynamics [17,18].
Accordingly, this study focuses on PTEs in southern Shimen County, Hunan Province. It is hypothesized that PTE distribution is jointly controlled by multiple pollution sources and environmental factors. Multivariate statistical analysis and the APCS-MLR receptor model were first employed to qualitatively identify and quantitatively apportion potential pollution sources. Furthermore, we aim to clarify the key factors dominating spatial variations of PTEs, so three integrated tree models—Random Forest, XGBoost, and LightGBM—were adopted to systematically analyze the nonlinear driving effects and dominant contribution characteristics of environmental factors on the spatial distribution of PTEs in soil. The integrated framework coupling pollution source apportionment and driving mechanism analysis can provide a reference research paradigm for tracing PTE sources and exploring accumulation mechanisms in complex natural–anthropogenic affected areas. Meanwhile, it offers scientific data support and a theoretical basis for targeted prevention and control, ecological risk management, and comprehensive remediation of regional soil pollution.

2. Materials and Methods

2.1. Study Area, Sampling, and Chemical Analysis

The study area covers six townships in southern Shimen County, Hunan Province, China. It falls within the transitional monsoon climate zone between the mid-subtropical and typical subtropical regions, with a mean annual temperature of 18.4 °C and an average annual precipitation of 1390.3 mm. Field sampling was conducted in 2015, yielding a total of 662 surficial samples (0–20 cm) (Figure 1). The geographic coordinates of each site were recorded using a portable GPS device. All collected soil samples were sealed in plastic bags and promptly transported to the laboratory for subsequent pretreatment and analysis. Prior to testing, the soil samples were air-dried at ambient temperature, ground into fine particles, and sieved through a 200-mesh nylon sieve with a pore size of 0.149 mm. Approximately 0.2 g of soil was weighed and digested with HNO3–H2O2 mixture following USEPA Method 3050B [19]. All reagents were guaranteed reagent (GR) grade. The resultant digest was cooled, filtered, and adjusted to 50 mL with ultrapure water before being analyzed by instruments. Arsenic (As) was determined using atomic fluorescence spectrometry (Beijing Kechuang Haiguang Instrument Co., Ltd., Beijing, China). Cadmium (Cd), chromium (Cr), copper (Cu), nickel (Ni), lead (Pb), and zinc (Zn) were measured using inductively coupled plasma mass spectrometry (ELAN DRC-e ICP-MS, PerkinElmer, Waltham, MA, USA). National standard reference materials (GSS-1, GSS-4) were used for analytical quality control throughout the experiment. In addition, blank samples and duplicate samples (one duplicate per 20 samples) were routinely arranged. The standard reference materials and field samples adopted the same pretreatment and determination procedures. The recoveries of the certified reference materials GSS-1 and GSS-4 were in the range of 90–110%, meeting the requirements for analytical quality control.

2.2. Influencing Factors

PTEs in soil are affected by the combined effects of natural and anthropogenic factors [20]. In this study, 10 potential driving factors influencing PTEs in soil were selected for analysis. Natural factors include strata, soil type, topography, and slope; anthropogenic factors include land use type, GDP, population density, atmospheric deposition, distance to roads, and distance to rivers. Soil types were classified according to the Genetic Soil Classification of China (GSCC), the official system used for the Second National Soil Survey of China [21]. The data sources and spatial distribution of these driving factors are shown in Table S1 and Figure S1.
Within the limits of data availability, this study compiled the main driving factors affecting PTEs in soil as comprehensively as possible; all of them have been reported to significantly influence PTE concentrations. Differences in the weathering of strata and lithology determine the initial background concentrations of PTEs in soil and form the fundamental natural basis for their spatial distribution. Differences in physicochemical properties among soil types directly affect the migration and transformation of PTEs and alter their enrichment and differentiation characteristics. Topography, landforms, and slope regulate surface material migration and slope confluence processes, thereby reshaping the spatial distribution patterns of PTEs. Land use type reflects differences in anthropogenic disturbance and agricultural input intensity, and directly affects the exogenous accumulation of PTEs. GDP and population density represent the intensity of regional economic development and human activity disturbance, respectively; higher values indicate more pronounced superimposed impacts of anthropogenic pollution. Atmospheric dust deposition is an important exogenous input pathway for PTEs, and continuous deposition aggravates PTE enrichment in soil. Traffic pollutant emissions and river irrigation inputs also play critical driving roles in the accumulation and spatial differentiation of PTEs.

2.3. Methods

In this study, multivariate statistical analysis was used to identify potential sources of PTEs, and the APCS-MLR model was applied to quantify the contribution rate of each source. Three widely used machine learning algorithms were then adopted to determine the key driving factors of Cd in the study area. The methods are described in detail below.

2.3.1. Multivariate Statistical Analysis

Correlation analysis and principal component analysis (PCA) are two of the most classical and widely used methods in multivariate statistical analysis. They can effectively explore the intrinsic relationships among elements and identify pollution source characteristics, and have therefore been extensively applied to the source apportionment of PTEs in soil [22,23]. Correlation analysis can quantitatively characterize the correlation degree between PTEs. Generally, a larger absolute value of the correlation coefficient indicates a more significant linear correlation, implying similar geochemical behaviors and a higher probability of being controlled by the same geological genesis or anthropogenic activities, which provides a reliable basis for preliminary discrimination of the homology of PTEs [24]. PCA realizes dimensionality reduction and information compression of multidimensional data. On the premise of retaining the maximum original information, it transforms numerous highly correlated PTEs into a small number of independent comprehensive principal components. According to the loading characteristics of each element on the principal components, potential pollution sources such as natural parent material, mining activities, agricultural fertilization and pesticide application can be objectively distinguished. Hence, PCA is an important tool for the qualitative source identification of PTEs in soil [24].
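As a minimal illustration of the correlation step, the sketch below computes a Pearson correlation coefficient between two elements. The choice of the Pearson coefficient, the function name `pearson_r`, and the concentration values are assumptions made for illustration; the text does not state which coefficient was used.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical concentrations (mg/kg) at five sites, for illustration only.
cu = [20.0, 22.0, 25.0, 27.0, 31.0]
zn = [70.0, 74.0, 80.0, 84.0, 92.0]   # exactly linear in cu, so r is 1
r_cu_zn = pearson_r(cu, zn)
```

A coefficient near 1 or -1 indicates a strong linear relationship; whether it reflects a shared source still has to be judged against geological and land use evidence.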

2.3.2. Absolute Principal Component Score-Multiple Linear Regression (APCS-MLR)

APCS-MLR is a receptor model combining principal component analysis (PCA) with multiple linear regression. Its basic assumption is that the pollutant concentration at the receptor equals the sum of contributions from various pollution sources [25]. The calculation procedure is as follows: potential pollution sources affecting PTEs are first identified using PCA. The extracted principal components, representing potential sources, are then linearly regressed against PTE concentrations. The regression coefficients are then used to calculate the contribution rate of each pollution source to PTE enrichment [26]. This method does not require prior knowledge of the number and types of pollution sources; potential sources can be inferred merely based on pollutant information at the receptor site. With the advantages of simple operation and relatively objective and reliable results, APCS-MLR has been widely adopted in relevant studies [27,28]. The basic algorithm of APCS-MLR is expressed as follows:
$$Z_{ik} = \frac{X_{ik} - C_i}{\sigma_i}$$
$$Z_{ik} = \sum_{j=1}^{p} W_{ij} P_{jk}$$
$$(APCS)_{jk} = P_{jk} - P_{j0}$$
$$X_{ik} = A_{i0} + \sum_{j=1}^{p} A_{ij}\,(APCS)_{jk}$$
In the above equations, $X_{ik}$ is the concentration of element i at sampling site k; $C_i$ is the mean concentration of element i; $\sigma_i$ is its standard deviation; $Z_{ik}$ is the standardized concentration; p is the number of extracted factors, indexed by j; $W_{ij}$ is the factor loading of element i on factor j; $P_{jk}$ is the score of factor j at site k; $P_{j0}$ is the factor score at the zero-pollution point, where the concentrations of all PTEs are zero, introduced to calculate the absolute principal component score $(APCS)_{jk}$; $A_{i0}$ is the intercept of the regression equation; and $A_{ij}$ is the linear regression coefficient of factor j for element i. The main pollution sources are identified from $W_{ij}$, and the contribution ratios of the different sources are calculated from $(APCS)_{jk}$ and $A_{ij}$.
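The final step of the procedure, turning regression coefficients into percentage contributions, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `absolute_scores` and `source_shares` are hypothetical, APCS is taken as the site score minus the zero-pollution score, the intercept is read as the unidentified source, and absolute values are used so that a negative term cannot produce a negative share (the text does not say how negative terms were handled). All numbers are invented.

```python
def absolute_scores(scores, zero_score):
    """APCS for one factor: each site's score minus the zero-pollution score."""
    return [p - zero_score for p in scores]

def source_shares(intercept, coefs, mean_apcs):
    """Percentage share of each source, plus the unidentified share, for one element.

    intercept -- regression intercept, read here as the unidentified contribution
    coefs     -- regression coefficients, one per factor
    mean_apcs -- mean APCS of each factor over all sites
    Absolute values are an assumption (see lead-in), not a documented step.
    """
    terms = [abs(a * s) for a, s in zip(coefs, mean_apcs)]
    total = sum(terms) + abs(intercept)
    shares = [100.0 * t / total for t in terms]
    unidentified = 100.0 * abs(intercept) / total
    return shares, unidentified

# Hypothetical numbers for a single element and two factors.
apcs_1 = absolute_scores([0.5, 1.0, 1.5], zero_score=-1.0)
mean_1 = sum(apcs_1) / len(apcs_1)
shares, unidentified = source_shares(
    intercept=2.0, coefs=[1.5, 1.0], mean_apcs=[mean_1, 5.0]
)
```

With these invented inputs the two factors and the intercept contribute in the ratio 3 : 5 : 2, so the shares sum to 100% by construction.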

2.3.3. Machine Learning Methods

Three machine learning models, namely Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were adopted to analyze the driving factors of soil cadmium. All three models exhibit excellent nonlinear fitting capability, can effectively capture the complex relationships between PTEs and multisource influencing factors, and can output feature importance, which makes them suitable for identifying key driving factors.
(1) RF
Random Forest (RF) is an ensemble decision tree model based on the bagging strategy. It builds multiple independent decision trees through bootstrap sampling and random feature selection, and outputs the regression result by averaging the predictions of all trees. The model is resistant to overfitting, stable, and directly provides an importance score for each factor. It has become a common method for data-driven analysis in environmental research [29].
(2) XGBoost
XGBoost is a gradient boosting algorithm within the boosting framework. It improves prediction accuracy by fitting residuals iteratively, one tree after another, and introduces regularization terms to control model complexity. With high prediction accuracy and strong robustness, it can capture the nonlinear relationships between influencing factors and PTE concentrations, and is well suited to modeling high-dimensional environmental variables [30].
(3) LightGBM
LightGBM is an efficiency-optimized variant of the traditional gradient boosting decision tree. It adopts Gradient-based One-Side Sampling and Exclusive Feature Bundling to improve computational efficiency, and uses a leaf-wise growth strategy to enhance fitting performance. The model features fast computation, low memory consumption, and good adaptability to large datasets. It improves modeling efficiency while maintaining prediction accuracy, and is suitable for analyzing the driving mechanisms of multi-factor interactions [31].
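The bagging idea behind RF (bootstrap resampling followed by averaging) can be illustrated with a deliberately small sketch. It is not a Random Forest: single-split regression stumps stand in for full trees, there is only one predictor so random feature selection is omitted, and the function names, the number of trees, the seed, and the data are all hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Best single-threshold split on one feature, minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:            # every value except the maximum
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                           # all x identical: predict the mean
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    t, ml, mr = stump
    return ml if x <= t else mr

def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def predict_bagged(trees, x):
    """Average the predictions of all stumps."""
    return sum(predict_stump(s, x) for s in trees) / len(trees)

# Hypothetical step-shaped data: low response for small x, high for large x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
trees = fit_bagged(xs, ys)
low, high = predict_bagged(trees, 1.5), predict_bagged(trees, 7.5)
```

Boosting methods such as XGBoost and LightGBM differ in that trees are fitted sequentially to the residuals of the current ensemble rather than independently to resampled data.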

2.3.4. Model Evaluation

In the machine learning-based analysis, all samples were randomly split into a training set (70%) and a test set (30%). The training set was used for model training, while the test set was used to independently evaluate the generalization and prediction performance of the model. To avoid overfitting and improve model stability, 10-fold cross-validation was performed only within the training set. The training set was evenly divided into ten subsets; in each iteration, nine subsets were used as the sub-training set and one as the validation set, and the procedure was repeated ten times to complete model optimization. Three statistical indicators, mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination ($R^2$), were employed to quantify model prediction accuracy. In general, smaller values of MAE and RMSE and an $R^2$ value closer to 1 indicate better model performance [32]. The evaluation metrics are expressed by the following formulas, where $y_p$ denotes the predicted value, $y_o$ denotes the observed (true) value, $\bar{y}_p$ denotes the mean of the predicted values, and N denotes the total number of data points.
$$MAE = \frac{\sum_{i=1}^{N} \left| y_p - y_o \right|}{N}$$
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_p - y_o)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_o - y_p)^2}{\sum_{i=1}^{N} (y_o - \bar{y}_p)^2}$$
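The three metrics can be computed with a few lines of standard-library Python. This sketch makes one assumption that should be noted: for the coefficient of determination it uses the conventional definition, in which the denominator is taken about the mean of the observed values. The function names and the sample values are hypothetical.

```python
import math

def mae(y_obs, y_pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(y_obs, y_pred)) / len(y_obs)

def rmse(y_obs, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))

def r2(y_obs, y_pred):
    """Coefficient of determination, conventional form (mean of observations)."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
scores = (mae(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
```

MAE and RMSE carry the units of the response variable, whereas the coefficient of determination is dimensionless and can be negative when a model performs worse than predicting the mean.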

3. Results and Discussion

3.1. Basic Characteristics of PTE Concentrations

3.1.1. Descriptive Statistical Analysis

As shown in Table 1, compared with the soil background values of Hunan Province, only Cd, Cr, and Ni exhibited enrichment in the study area, while the remaining PTEs showed no significant accumulation trend. The average contents of As, Cu, and Zn were 11.97 mg/kg, 24.15 mg/kg, and 76.77 mg/kg, respectively, all lower than the corresponding soil background values of Hunan Province. The average Pb content was 29.56 mg/kg, essentially equal to the regional background value of 29.70 mg/kg. The average concentrations of Cr and Ni were 81.01 mg/kg and 49.25 mg/kg, which are 1.13 and 1.54 times the background values, respectively, indicating slight accumulation. The average Cd content reached 0.43 mg/kg, 3.41 times the soil background value of Hunan Province. In addition, the average concentration of Cd was higher than the corresponding risk screening value of China (0.4 mg/kg), implying a potential environmental risk. Cd was identified as the most strongly enriched PTE in the study area.
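The comparisons above reduce to simple ratios. The sketch below reproduces two of them using only figures quoted in the text (the Pb mean and background, and the Cd mean and screening value); the function name `enrichment_ratio` is a hypothetical helper.

```python
def enrichment_ratio(mean_conc, background):
    """Mean concentration divided by the background value (dimensionless)."""
    if background <= 0:
        raise ValueError("background must be positive")
    return mean_conc / background

# Pb: mean and Hunan background value as reported in the text (mg/kg).
pb_ratio = enrichment_ratio(29.56, 29.70)

# Cd: reported mean (mg/kg) compared with the 0.4 mg/kg risk screening value.
cd_exceeds_screening = 0.43 > 0.4
```

A ratio close to 1, as for Pb, indicates no enrichment relative to background, while the Cd mean lies just above the screening value.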

3.1.2. Spatial Distribution Patterns of PTEs in Soil

Inverse distance weighting (IDW) spatial interpolation was applied to systematically analyze the spatial distribution patterns of seven PTEs in the study area (Figure 2). The results indicated that all PTEs exhibited significant spatial differentiation characteristics: High arsenic (As) concentrations were mainly concentrated in Xinguan Town in the northwestern part of the study area, with scattered local high-value patches also observed in Mengquan Town and Jiashan Town. Overall, As levels were generally low across the entire study area, and the extent of areas exceeding the regional soil background value was limited. Cadmium (Cd) showed the most prominent spatial enrichment characteristics. The core high-concentration zone was located in the central part of the study area, concentrated in the northern and western parts of Jiashan Town. Meanwhile, certain Cd high-value areas also developed in some local zones in the northern part of the study area, presenting an obvious spatial agglomeration effect. Chromium (Cr), copper (Cu), nickel (Ni), lead (Pb), and zinc (Zn) showed strong spatial distribution similarity. Their high-concentration zones were mainly contiguous in the northern and central parts of the study area, with notable spatial continuity; meanwhile, some scattered high-value patches were also observed in local areas of the southern part.
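For readers unfamiliar with IDW, a minimal sketch of the estimator is given below. The distance exponent of 2 is a common default and is an assumption here, since the interpolation settings are not reported; the function name `idw`, the coordinates, and the Cd values are hypothetical.

```python
import math

def idw(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples -- list of (xi, yi, value) tuples
    power   -- distance exponent (assumed; not stated in the text)
    """
    num = den = 0.0
    for xi, yi, value in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return value              # exactly on a sampling site
        w = 1.0 / d ** power          # nearer sites receive larger weights
        num += w * value
        den += w
    return num / den

# Hypothetical Cd values (mg/kg) at three sites on a local grid (km).
sites = [(0.0, 0.0, 0.2), (2.0, 0.0, 0.6), (0.0, 2.0, 1.0)]
cd_mid = idw(sites, 1.0, 0.0)
```

Because the estimate is a weighted average, interpolated values always lie between the minimum and maximum of the sampled values, which is worth keeping in mind when reading high-value patches on the maps.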

3.2. Results of Multivariate Statistical Analysis

3.2.1. Correlation Analysis (CA)

The correlation matrix helps explore relationships among variables by revealing the overall coherence of the dataset (Figure 3). Significant correlations were observed among most PTEs, indicating potential common pollution sources [35]. Cd had no significant correlation with As, but was moderately correlated with other PTEs, with correlation coefficients ranging from 0.15 to 0.35. In contrast, As was correlated with all PTEs except Cd, with correlation coefficients higher than 0.4, suggesting similar source contributions among these elements. Cr, Cu, Ni, Pb, and Zn exhibited strong intercorrelations, implying highly homologous pollution sources, with their correlation coefficients generally above 0.6. Notably, the correlation coefficient between Cu and Zn reached 0.88, revealing an extremely high similarity in their pollution sources.

3.2.2. Principal Component Analysis (PCA)

PCA was applied to elucidate the origins of PTEs by reducing the original dataset to several dominant factors [10]. The suitability of the data was assessed using the Kaiser–Meyer–Olkin (KMO) measure and Bartlett’s test. The KMO value was 0.843 (>0.5) and Bartlett’s test was significant (p < 0.05), indicating that the data were suitable for PCA.
PCA of the seven PTEs extracted three principal components, which together explained 80.675% of the total variance (35.554%, 25.574%, and 19.546% for the first, second, and third components, respectively). As shown in Table 2, the first principal component had the highest loadings on Cu, Ni, and Zn; the second had relatively high loadings on As, Cr, and Pb; and the third had its maximum loading on Cd, with secondary loadings on Cr and Ni. Cluster analysis (Figure S2) yielded three clusters, (1) Cr–Cu–Ni–Zn, (2) As–Pb, and (3) Cd, which was generally consistent with the PCA results.

3.3. Source Apportionment of PTEs in Soil

Quantitative source apportionment of PTE pollution sources in the study area was conducted based on the APCS-MLR model (Figure 4). Source 1 contributed the largest proportion to Cu, Ni, and Zn, with contribution rates all exceeding 50%; particularly, its contribution rate to Zn reached as high as 69%. In addition, Source 1 also showed moderate contributions to Cd and Pb, with contribution proportions of 26.68% and 38.22%, respectively. In combination with the average concentrations of PTEs in the study area, the contents of most PTEs were close to the background values of Hunan Province, with no obvious accumulation. The natural background levels of PTEs in soil are primarily inherited from parent materials of soil formation, and elements released during rock weathering enter the soil, serving as an important potential source of PTEs in soil [36]. In this study, Source 1 presented a certain contribution to almost all PTEs, and the concentrations of most PTEs were close to the local background values. Therefore, Source 1 was identified as the natural source, with an average contribution rate of 43.83%.
Source 2 exhibited the highest contribution proportion to As at 73.72%, and also made considerable contributions to Cr (49.05%), Cu (31.41%), and Pb (52.87%). According to the spatial distribution characteristics of As, its high-value areas were mainly concentrated in Xinguan Town in the northern part of the study area. Based on the mineral distribution map of the study area (Figure 5), mineral resources are concentrated in the northern part of Xinguan Town, with limestone, iron ore, and stone coal as the dominant types. According to this spatial coupling relationship, it can be inferred that the high-value zones of PTEs in this area are likely closely associated with local mining activities. Previous studies have indicated that the study area is affected by various mining activities, including stone coal mining, hematite mining, and limestone mining [37]. Stone coal mining can elevate heavy metal concentrations in surrounding soils. A previous investigation on soils around stone coal mines in the lower reaches of the Zijiang River, Hunan Province, revealed that the contents of As, Cd, Cr, Cu, Ni, Pb, Zn, and Hg in the surrounding soils all exceeded the soil background values of Hunan Province. Areas with high PTE concentrations were mainly distributed in the central part of the study area, close to the concentrated distribution zone of stone coal mines [38]. After hematite mining, potentially toxic elements in mine soils may pose severe ecological risks to the surrounding areas. For instance, a study focusing on a typical abandoned open-pit iron mine along the Yangtze River found that combined pollution of potentially toxic elements posed significant carcinogenic and non-carcinogenic health risks to children [39]. Meanwhile, limestone is rich in carbonate minerals and tends to cause the enrichment of PTEs [40]. In summary, Source 2 was identified as mining activities, with an average contribution rate of 30.99%.
Source 3 contributed predominantly to Cd with a contribution rate of 35.65%, while its contributions to other PTEs were relatively low. Shimen County is one of the major citrus-producing areas in Hunan Province and even nationwide. The overlay analysis of Cd spatial distribution and land use types showed that several high-value areas of Cd were distributed in orchard zones. These regions feature intensive agricultural activities, which exert a remarkable influence on the accumulation of PTEs and serve as an important anthropogenic source of PTE pollution [41,42]. Pesticides and chemical fertilizers are inevitably applied in orchards. Numerous studies have confirmed that the application of agrochemicals and fertilizers is a critical pathway for PTE accumulation in soil [43,44]. In particular, phosphate fertilizers generally contain high concentrations of Cd because cadmium naturally exists in phosphate rock raw materials; long-term application can result in severe Cd accumulation in soil [45]. Furthermore, Cd hotspots were highly consistent with the distribution of coal mines (Figure 5), indicating that coal mining activities contribute significantly to Cd accumulation. Previous studies have also confirmed that coal mining and processing can promote the release and enrichment of Cd [46]. Accordingly, Source 3 was identified as a mixed source dominated by coal mining and agricultural activities, with an average contribution rate of 7.84% to PTE accumulation in the study area.
In addition, the intercept in the APCS-MLR model represents unidentified pollution sources [11,47]. This indicates that certain pollution sources remain unrecognized in the model. Such unidentified sources also exert considerable influence on PTEs, including As, Cd, Cr, Ni, and Zn, and play an important role in the accumulation of these elements, with an average contribution rate of 17.34%. Further research combining more analytical methods is required to achieve more precise apportionment of these unknown pollution sources in subsequent studies.

3.4. Identifying Driving Factors of Cadmium Using Machine Learning

The APCS-MLR results clarified source contributions, but the spatial distribution of Cd was further shaped by multi-factor interactions. Machine learning was therefore employed to unravel the dominant drivers of Cd enrichment. Three tree-based models, namely Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were adopted to model Cd concentrations in the study area. Model performance is compared in Table 3. All three models showed moderate overall prediction performance. Among them, the RF model performed best, with a coefficient of determination ($R^2$) of 0.28 on the validation set and MAE and RMSE values of 0.17 and 0.25, respectively. Feature importance derived from a single training–test split is susceptible to sample randomness and may be biased. Therefore, instead of relying on a single data split, this study applied 10-fold cross-validation to calculate the importance of each driving factor repeatedly and averaged the results. The ranking of factor importance was highly consistent across all rounds of cross-validation, indicating that the identified key driving factors were robust to random sample partitioning.
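The data partitioning and the averaging of importance across folds can be sketched without any modeling library. The sketch below shows only the index bookkeeping: a 70/30 split of 662 samples, ten folds within the training set, and a helper that averages per-fold importance values. The function names, the random seed, and the way folds are assigned are assumptions; the model fitting that would produce the per-fold importance values is not shown.

```python
import random

def train_test_split(n, train_frac=0.7, seed=42):
    """Shuffle sample indices and cut them into training and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]

def k_folds(indices, k=10):
    """Yield (sub_train, validation) index lists; fold sizes differ by at most one."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        sub = [j for m, f in enumerate(folds) if m != i for j in f]
        yield sub, val

def mean_importance(per_fold):
    """Average per-fold importance dictionaries factor by factor."""
    return {name: sum(f[name] for f in per_fold) / len(per_fold)
            for name in per_fold[0]}

train, test = train_test_split(662)
fold_sizes = [len(val) for _, val in k_folds(train)]

# Hypothetical importance values from two folds, for illustration only.
averaged = mean_importance([{"PM10": 0.3, "DEM": 0.1}, {"PM10": 0.5, "DEM": 0.3}])
```

Keeping the test indices out of `k_folds` mirrors the statement that cross-validation was performed only within the training set.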
The bar chart (Figure 6) presents the %IncMSE value of each factor derived from the Random Forest model, which reflects its contribution to model prediction accuracy [48]. The pie chart (Figure 6) illustrates the normalized relative importance of each factor. Among the ten influencing factors included in the model, PM10 (27.34%), DEM (17.22%), strata (13.24%), and soil type (12.25%) were the dominant factors affecting Cd content, with a cumulative relative importance of 70.05%. This conclusion is generally consistent with the findings of previous studies based on the geographical detector model [49]. To further clarify the relative contributions of natural and anthropogenic regimes, the driving factors were divided into natural and anthropogenic groups, and their relative importance values were summed. Collectively, natural factors (strata, soil type, DEM, and slope) accounted for 44.23% of the total relative importance, whereas anthropogenic factors (PM10, land use, GDP, population density, river distance, and road distance) made up the remaining 55.77%. This contribution structure was highly consistent with the APCS-MLR source apportionment results, confirming that Cd enrichment in the study area was co-dominated by natural factors and anthropogenic activities.
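The normalization and grouping arithmetic can be checked with a short sketch. The four dominant shares are the values reported above; the slope share is not reported directly and is derived here from the natural-group total, so it should be read as an implied value. The function names and the raw importance values in the second example are hypothetical.

```python
def normalize(importance):
    """Rescale raw importance values so that they sum to 100%."""
    total = sum(importance.values())
    return {k: 100.0 * v / total for k, v in importance.items()}

def group_share(shares, members):
    """Sum the shares of the factors belonging to one group."""
    return sum(shares[m] for m in members)

# Reported relative importance (%) of the four dominant factors.
top_four = {"PM10": 27.34, "DEM": 17.22, "Strata": 13.24, "Soil type": 12.25}
top_four_total = round(sum(top_four.values()), 2)

# Slope share implied by the reported natural-group total of 44.23%.
slope_implied = round(
    44.23 - group_share(top_four, ["DEM", "Strata", "Soil type"]), 2
)

# Hypothetical raw importance values, for illustration only.
raw = {"PM10": 30.0, "DEM": 20.0, "Slope": 10.0, "GDP": 40.0}
shares = normalize(raw)
natural = group_share(shares, ["DEM", "Slope"])
```

Under these figures the implied slope share is small, so the natural group is carried almost entirely by DEM, strata, and soil type.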
Atmospheric deposition is an important pathway for the exogenous input of PTEs into soil and is closely associated with intensive human activities such as mining and smelting emissions and traffic exhaust [50]. These activities, together with fossil fuel combustion, release large quantities of PTE-laden atmospheric particles. Because of their large specific surface area, these particles readily adsorb and accumulate various PTEs; they are then dispersed by wind and eventually deposited on the land surface, including through washout by precipitation [51,52].
Topography and landforms are key natural factors driving the spatial differentiation of PTEs in soil, mainly because they regulate the migration of PTEs and reshape their geographical distribution [7]. Driven by hydrodynamic conditions, PTEs tend to migrate with surface runoff and accumulate in low-lying areas, producing a pattern of depletion at higher elevations and accumulation in lower terrain [53]. Topography also indirectly affects the input and accumulation of PTEs through several coupled effects. For example, atmospheric dust from industrial sources is easily blocked by terrain, which significantly reduces deposition flux at high altitudes or in enclosed terrain [51].
Lithological composition and geochemical background vary among the strata in the study area. Background PTE contents in soil are mainly inherited from the parent material, and PTE concentrations differ significantly among parent materials [54]. During weathering and pedogenesis, differences in the mineral composition and elemental concentrations of lithostratigraphic units are transferred directly to the overlying soils, producing distinct initial contents and enrichment characteristics of PTEs in soils derived from different strata [55]. Stratigraphic lithology is therefore a core natural factor controlling the spatial distribution of PTEs in soil, and lithological differences are an important cause of the spatial heterogeneity of PTE concentrations in the study area. In this study, the Permian strata had the highest mean Cd concentration (1.17 mg/kg), followed by the Triassic strata (0.66 mg/kg) (Table S2, Figure S3). This agrees with earlier reports of relatively high Cd geochemical background values in Permian and Triassic rocks [56], and in particular with the inherently high Cd background of Permian carbonate rocks. Under intense chemical weathering, the massive formation of Fe–Mn oxides and the shift of dissolved organic matter toward microbially derived components act synergistically to drive substantial Cd enrichment in soils [57].
Soil types differ in physicochemical properties, including texture, pH, organic matter content, clay content, and cation exchange capacity (CEC). These properties collectively control the adsorption, desorption, migration, and transformation of PTEs in soil [58]. pH strongly influences Cd migration [59]. Organic matter modulates Cd adsorption by altering soil pH, and CEC also plays an important role in this process [60,61]. Accordingly, the accumulation and spatial differentiation of PTEs vary markedly among soil types. In this study, limestone soil had the highest mean Cd concentration (1.17 mg/kg), while paddy soil had a mean Cd content of 0.43 mg/kg (Table S3, Figure S4). Limestone is a carbonate rock. The pronounced Cd enrichment in soils derived from carbonate rock weathering is largely related to the intensive leaching of calcium ions during weathering, which leaves a generally high geochemical background of Cd in carbonate rock areas [62]. In addition, most soils in southern China are acidic, and the pH of paddy soils exerts a significant influence on Cd enrichment [63].
Land use type, GDP, and population density also exert important impacts on the accumulation of soil Cd in the study area. Agricultural inputs and the intensity of anthropogenic disturbance vary under different land use patterns [64]. In this study, the average Cd concentration in orchard land was 0.47 mg/kg, while both paddy field and dry land had an average Cd content of 0.43 mg/kg; all were significantly higher than that of forest land at 0.28 mg/kg (Table S4, Figure S5). Orchard land, paddy field and dry land are more intensely affected by agricultural activities such as tillage, fertilization, and pesticide application, leading to a higher degree of soil Cd enrichment. GDP and population density can effectively characterize the intensity of regional human activities. Generally, higher GDP and greater population density correspond to stronger industrialization, agricultural intensification, and traffic intensity, which in turn increase the exogenous input and accumulation risk of PTEs [65,66].

3.5. Limitations and Future Perspectives

This study first traced the potential sources of PTE pollution in the study area using multivariate statistical analysis and the APCS-MLR receptor model, and then applied machine learning models to examine the nonlinear drivers of the spatial differentiation of Cd. Although the source composition and dominant drivers of PTE pollution have been broadly clarified, several limitations remain owing to constraints on data availability and research scale. First, the coefficient of determination (R²) of the machine learning models was relatively low, ranging from 0.21 to 0.28. PTEs are jointly affected by geological background, mining activities, and agricultural activities and show strong spatial heterogeneity, so their spatial enrichment is co-regulated by many natural and anthropogenic factors. Restricted by field monitoring conditions and data accessibility, this study could not cover all potential influencing indicators [20]. In addition, the spatial extent of the study is relatively small, which made it difficult to obtain fine-grained data on anthropogenic activities. Some factors associated with agricultural production could not be included because of data gaps, which to some extent affected model fit and the accuracy with which the driving mechanisms can be interpreted [17,32]. It should be noted that the core objective of this study was not to build a high-precision prediction model but to identify the relative importance of driving factors; the stable factor ranking obtained under cross-validation indicates that the models are reliable for that purpose. Second, the spatial resolutions of the multi-source data were not well matched. The environmental and socioeconomic data used in this study, including atmospheric deposition, population density, and GDP, were extracted by clipping national-scale datasets. Although these data can characterize the general spatial differentiation of the relevant indicators in the study area, they are coarse for small-scale regional research. This resolution mismatch interferes to some extent with the stability of the source apportionment results and the accuracy of the machine learning models [49]. Third, soil physicochemical properties were not included in the driving factor analysis, even though they are key factors governing Cd migration and adsorption. These data were not available, so their influence remains to be explored in future research.
To address these limitations, future research will strengthen the data foundation for source apportionment and driver identification of PTEs. More comprehensive and finer-resolution datasets of natural and anthropogenic driving factors will be collected where possible, and the spatial resolution of gridded and statistical data will be unified so that multi-source data are matched consistently. Soil physicochemical data will also be included to clarify the mechanisms of Cd migration, adsorption, and enrichment in soil. An improved data foundation can provide more solid support for the APCS-MLR receptor model and improve the accuracy of source identification and contribution quantification. It can also improve the fit of the machine learning models and the interpretation of nonlinear driving mechanisms. Together, these improvements will make source tracing and driver analysis of PTEs more reliable and provide a sounder theoretical basis and data support for the targeted prevention, control, and comprehensive management of regional PTE pollution.

4. Conclusions

This study investigated surface soils in southern Shimen County, coupling multivariate statistics with the APCS-MLR receptor model for quantitative source apportionment of PTEs and then applying machine learning to reveal the spatial enrichment patterns and dominant drivers of soil Cd. Concentration statistics indicated that Cd contamination was the most prominent, with a mean content 3.41 times the soil background value of Hunan Province, while the other PTEs remained close to regional background levels with no significant anthropogenic enrichment. The APCS-MLR model identified three pollution sources. Natural sources were the primary contributor and influenced most PTEs, with an average contribution of 43.83%; mining activities dominated the enrichment of As, Cr, and Pb, contributing 30.99% on average; and the mixed source of coal mining and agricultural activities was the key anthropogenic driver of Cd accumulation, contributing 35.65% to Cd specifically and 7.84% overall. Unresolved mixed sources accounted for 17.34%. The driving factor ranking showed that atmospheric deposition (PM10), topography, strata, and soil type were the core factors governing the spatial differentiation of Cd, collectively accounting for 70.05% of relative importance. Among the selected drivers, natural factors contributed 44.23% and anthropogenic factors 55.77%, indicating that Cd enrichment was jointly governed by natural background and exogenous anthropogenic inputs. Overall, by integrating receptor-based source apportionment and machine learning, this study clarified the potential sources of PTEs in soil and the key controls on Cd enrichment in southern Shimen County, and elucidated the combined effects of geological background, mining, and agricultural activities. The findings provide scientific support and a theoretical reference for source control, zoned management, ecological risk early warning, and territorial ecological restoration in the study area.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/toxics14060525/s1, Table S1: Data description and sources; Figure S1: Potential influence factors of PTEs; Figure S2: Cluster analysis of PTEs; Table S2: Descriptive statistics of Cd concentrations of different strata (mg/kg); Figure S3: Boxplots of Cd concentrations in different strata; Table S3: Descriptive statistics of Cd concentrations of different soil types (mg/kg); Figure S4: Boxplots of Cd concentrations of different soil types; Table S4: Descriptive statistics of Cd concentrations of different land use types (mg/kg); Figure S5: Boxplots of Cd concentrations of different land use types.

Author Contributions

J.W.: Writing—Original Draft, Visualization, Data Curation, Funding Acquisition. X.Z.: Software, Supervision. J.L. (Jiufen Liu): Methodology, Supervision. Y.Y.: Supervision, Methodology. W.Z.: Data Curation, Supervision. C.X.: Methodology. J.Z.: Data Curation. J.L. (Jiwei Liu): Data Curation. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Open Foundation of the Key Laboratory of Coupling Process and Effect of Natural Resources Elements (2024KFKT004) and Shandong Provincial Natural Science Foundation (ZR2024QD217).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Deng, W.B.; Wang, F.X.; Liu, W.J. Identification of factors controlling heavy metals/metalloid distribution in agricultural soils using multi-source data. Ecotoxicol. Environ. Saf. 2023, 253, 114689.
  2. Mai, X.R.; Tang, J.; Tang, J.X.; Zhu, X.Y.; Yang, Z.H.; Liu, X.; Zhuang, X.J.; Feng, G.; Tang, L. Research progress on the environmental risk assessment and remediation technologies of heavy metal pollution in agricultural soil. J. Environ. Sci. 2025, 149, 1–20.
  3. Rai, P.K.; Lee, S.S.; Zhang, M.; Tsang, Y.F.; Kim, K.H. Heavy metals in food crops: Health risks, fate, mechanisms, and management. Environ. Int. 2019, 125, 365–385.
  4. Goswami, V.; Deepika, S.; Diwakar, S.; Kothamasi, D. Arbuscular mycorrhizas amplify the risk of heavy metal transfer to human food chain from fly ash ameliorated agricultural soils. Environ. Pollut. 2023, 329, 121733.
  5. Zheng, S.N.; Wang, Q.; Yuan, Y.Z.; Sun, W.M. Human health risk assessment of heavy metals in soil and food crops in the Pearl River Delta urban agglomeration of China. Food Chem. 2020, 316, 126213.
  6. Li, X.Z.; Zhao, Z.Q.; Yuan, Y.; Wang, X.; Li, X.Y. Heavy metal accumulation and its spatial distribution in agricultural soils: Evidence from Hunan province, China. Rsc Adv. 2018, 8, 10665–10672.
  7. Ding, Q.; Cheng, G.; Wang, Y.; Zhuang, D.F. Effects of natural factors on the spatial distribution of heavy metals in soils surrounding mining regions. Sci. Total Environ. 2017, 578, 577–585.
  8. Qiao, P.W.; Yang, S.C.; Lei, M.; Chen, T.B.; Dong, N. Quantitative analysis of the factors influencing spatial distribution of soil heavy metals based on geographical detector. Sci. Total Environ. 2019, 664, 392–413.
  9. Isinkaralar, K.; Isinkaralar, O.; Nguyen, T.N.T.; Swislowski, P.; Rajfur, M.; Park, S.J. Ecological-health risks assessment and source apportionment of potentially toxic elements (PTEs) in surface dust near copper mine. Environ. Res. 2026, 298, 124261.
  10. Facchinelli, A.; Sacchi, E.; Mallen, L. Multivariate statistical and GIS-based approach to identify heavy metal sources in soils. Environ. Pollut. 2001, 114, 313–324.
  11. Lv, J.S. Multivariate receptor models and robust geostatistics to estimate source apportionment of heavy metals in soils. Environ. Pollut. 2019, 244, 72–83.
  12. Wang, Z.C.; Hong, N.; Chen, Y.S.; Cheng, G.H.; Liu, A.; Huang, X.W.; Tan, Q. Systematic evaluations of receptor models in source apportionment of particulate solids in road deposited sediments: A practical application for tracking heavy metal sources on urban road surfaces. J. Hazard. Mater. 2025, 485, 136912.
  13. Liu, Z.; Zhang, K.; Yang, R.S.; Yang, Z.Z.; Wang, J.X.; Zhang, A.N.; Liu, Y.J. Source apportionment and environmental risk assessment of surface water quality in the Wuding River Basin. J. Environ. Chem. Eng. 2025, 13, 117982.
  14. Bhat, M.A.; Fan, D.D.; Nisa, F.U.; Dar, T.; Kumar, A.; Sun, Q.Q.; Li, S.L.; Mir, R.R. Trace elements in the Upper Indus River Basin (UIRB) of Western Himalayas: Quantification, sources modeling, and impacts. J. Hazard. Mater. 2024, 476, 135073.
  15. Zou, Q.; Han, Z.Y.; He, L.; Cao, W.J.; Yue, X.D. Monitoring heavy metal(loid) concentrations in soils of industrially contaminated sites using machine learning models. J. Hazard. Mater. 2026, 502, 141011.
  16. Senila, M.; Levei, E.A.; Senila, L.R.; Oprea, G.M.; Roman, C.M. Mercury in soil and perennial plants in a mining-affected urban area from Northwestern Romania. J. Environ. Sci. Health Part A-Toxic/Hazard. Subst. Environ. Eng. 2012, 47, 614–621.
  17. Bi, Z.H.; Sun, J.; Xie, Y.T.; Gu, Y.L.; Zhang, H.Z.; Zheng, B.W.; Ou, R.T.; Liu, G.Y.; Li, L.; Peng, X.Y.; et al. Machine learning-driven source identification and ecological risk prediction of heavy metal pollution in cultivated soils. J. Hazard. Mater. 2024, 476, 135109.
  18. Wang, Y.K.; Zhang, Z.; Cheng, C.; Liang, C.Y.; Wang, H.J.; He, M.S.; Huang, H.C.; Wang, K. Ensemble learning-assisted quantitative identifying influencing factors of cadmium and arsenic concentration in rice grain based multiplexed data. J. Hazard. Mater. 2025, 485, 136869.
  19. SW-846 Method 3050B, Revision 2; Acid digestion of sediments, sludges, and soils. United States Environmental Protection Agency: Washington, DC, USA, 1996.
  20. Yang, J.; Wang, J.Y.; Qiao, P.W.; Zheng, Y.M.; Yang, J.X.; Chen, T.B.; Lei, M.; Wan, X.M.; Zhou, X.Y. Identifying factors that influence soil heavy metals by using categorical regression analysis: A case study in Beijing, China. Front. Environ. Sci. Eng. 2020, 14, 37.
  21. Shi, X.Z.; Yu, D.S.; Xu, S.X.; Warner, E.D.; Wang, H.J.; Sun, W.X.; Zhao, Y.C.; Gong, Z.T. Cross-reference for relating Genetic Soil Classification of China with WRB at different scales. Geoderma 2010, 155, 344–350.
  22. Li, J.L.; He, M.; Han, W.; Gu, Y.F. Analysis and assessment on heavy metal sources in the coastal soils developed from alluvial deposits using multivariate statistical methods. J. Hazard. Mater. 2009, 164, 976–981.
  23. Lu, X.W.; Wang, L.J.; Li, L.Y.; Lei, K.; Huang, L.; Kang, D. Multivariate statistical analysis of heavy metals in street dust of Baoji, NW China. J. Hazard. Mater. 2010, 173, 744–749.
  24. Dong, B.; Zhang, R.Z.; Gan, Y.D.; Cai, L.Q.; Freidenreich, A.; Wang, K.P.; Guo, T.W.; Wang, H.B. Multiple methods for the identification of heavy metal sources in cropland soils from a resource-based region. Sci. Total Environ. 2019, 651, 3127–3138.
  25. Li, R.Y.; Xu, J.; Luo, J.; Yang, P.; Hu, Y.W.; Ning, W.J. Spatial distribution characteristics, influencing factors, and source distribution of soil cadmium in Shantou City, Guangdong Province. Ecotoxicol. Environ. Saf. 2022, 244, 114064.
  26. Wu, J.T.; Margenot, A.J.; Wei, X.; Fan, M.M.; Zhang, H.; Best, J.L.; Wu, P.B.; Chen, F.R.; Gao, C. Source apportionment of soil heavy metals in fluvial islands, Anhui section of the lower Yangtze River: Comparison of APCS-MLR and PMF. J. Soils Sediments 2020, 20, 3380–3393.
  27. Zhang, H.; Cheng, S.; Li, H.; Fu, K.; Xu, Y. Groundwater pollution source identification and apportionment using PMF and PCA-APCA-MLR receptor models in a typical mixed land-use area in Southwestern China. Sci. Total Environ. 2020, 741, 140383.
  28. Li, Y.; Zhou, S.; Liu, K.; Wang, G.; Wang, J. Application of APCA-MLR receptor model for source apportionment of char and soot in sediments. Sci. Total Environ. 2020, 746, 141165.
  29. Zhou, X.Y.; Wang, X.R. Impact of industrial activities on heavy metal contamination in soils in three major urban agglomerations of China. J. Clean. Prod. 2019, 230, 1–10.
  30. Che, T.H.; Deng, B.L.; Hu, N.W.; Wu, M.X.; Yu, H.W.; Yue, J.; Wang, Q.Y. Spatiotemporal evolution and ecological risk assessment of heavy metals in agricultural black soils of Northeast China. Environ. Res. 2026, 301, 124567.
  31. Yang, G.F.; Ju, Y.; Wu, W.J.; Guo, Z.T.; Ni, W.L. Assessing influential factors of Chinese industrial aqueous cadmium emissions based on machine learning and shapley additive explanations. J. Clean. Prod. 2024, 448, 141431.
  32. Zhang, H.; Yin, S.; Chen, Y.; Shao, S.; Wu, J.; Fan, M.; Chen, F.; Gao, C. Machine learning-based source identification and spatial prediction of heavy metals in soil in a rapid urbanization area, eastern China. J. Clean. Prod. 2020, 273, 122858.
  33. CNEMC. The Background Values of Elements in Chinese Soils; China Environmental Science Press: Beijing, China, 1990. (In Chinese)
  34. GB 15618-2018; Soil environmental quality—risk control standard for soil contamination of agricultural land. Ministry of Ecology and Environment of the People’s Republic of China. State Administration for Market Regulation: Beijing, China, 2018.
  35. Wang, J.Y.; Yang, J.; Chen, T.B. Source appointment of potentially toxic elements (PTEs) at an abandoned realgar mine: Combination of multivariate statistical analysis and three common receptor models. Chemosphere 2022, 307, 135923.
  36. Wang, C.B.; Chen, Y.Z.; Xie, Y.X.; Feng, X.Y.; Wu, K.K.; Li, X.X.; Wu, P. Multi-source apportionment of soil heavy metals and spatial heterogeneity of associated risks in overlapping zones with high geological background and mining-smelting activities. Environ. Pollut. 2025, 385, 127079.
  37. Wan, Y.P. The Effects of Mineral Exploitation on Soil and Vegetation in Xinguang Town, Shimen County. Master’s Thesis, Hunan Agricultural University, Changsha, China, 2012. (In Chinese)
  38. Dai, X.Y.; Liang, J.H.; Shi, H.D.; Yan, T.Z.; He, Z.X.; Li, L.; Hu, H.L. Health risk assessment of heavy metals based on source analysis and Monte Carlo in the downstream basin of the Zishui. Environ. Res. 2024, 245, 117975.
  39. Zeng, Y.F.; Xu, Z.X.; Dong, B. Spatial Distribution, Leaching Characteristics, and Ecological and Health Risk Assessment of Potential Toxic Elements in a Typical Open-Pit Iron Mine Along the Yangzi River. Water 2024, 16, 3017.
  40. Zhang, X.Y.; Lin, F.F.; Wong, M.T.F.; Feng, X.L.; Wang, K. Identification of soil heavy metal sources from anthropogenic activities and pollution assessment of Fuyang County, China. Environ. Monit. Assess. 2009, 154, 439–449.
  41. Kamaraj, J.; Sekar, S.; Roy, P.D.; Arumugam, B.; Kumar, P.; Badimela, U.; Perumal, M. Machine learning approach for heavy metal source identification and spatial distribution in coastal sediments of Tiruchendur, Southern India. Mar. Pollut. Bull. 2026, 228, 119616.
  42. Liang, J.; Feng, C.T.; Zeng, G.M.; Gao, X.; Zhong, M.Z.; Li, X.D.; Li, X.; He, X.Y.; Fang, Y.L. Spatial distribution and source identification of heavy metals in surface soils in a typical coal mine city, Lianyuan, China. Environ. Pollut. 2017, 225, 681–690.
  43. Atafar, Z.; Mesdaghinia, A.; Nouri, J.; Homaee, M.; Yunesian, M.; Ahmadimoghaddam, M.; Mahvi, A.H. Effect of fertilizer application on soil heavy metal concentration. Environ. Monit. Assess. 2010, 160, 83–89.
  44. GimenoGarcia, E.; Andreu, V.; Boluda, R. Heavy metals incidence in the application of inorganic fertilizers and pesticides to rice farming soils. Environ. Pollut. 1996, 92, 19–25.
  45. Lugon-Moulin, N.; Ryan, L.; Donini, P.; Rossi, L. Cadmium content of phosphate fertilizers used for tobacco production. Agron. Sustain. Dev. 2006, 26, 151–155.
  46. Tang, Q.; Chang, L.R.; Wang, Q.Y.; Miao, C.H.; Zhang, Q.; Zheng, L.G.; Zhou, Z.K.; Ji, Q.Z.; Chen, L.; Zhang, H.M. Distribution and accumulation of cadmium in soil under wheat-cultivation system and human health risk assessment in coal mining area of China. Ecotoxicol. Environ. Saf. 2023, 253, 114688.
  47. Shi, W.C.; Li, T.; Feng, Y.; Su, H.; Yang, Q.L. Source apportionment and risk assessment for available occurrence forms of heavy metals in Dongdahe Wetland sediments, southwest of China. Sci. Total Environ. 2022, 815, 152837.
  48. Tan, K.; Wang, H.M.; Chen, L.H.; Du, Q.; Du, P.J.; Pan, C.C. Estimation of the spatial distribution of heavy metal in agricultural soils using airborne hyperspectral imaging and random forest. J. Hazard. Mater. 2020, 382, 120987.
  49. Wang, J.Y.; Yang, J.; Zhao, C.; Tian, X.L.; Zhao, X.F.; Zhao, W.; Xin, H.; Li, X.J. Revealing Influencing Mechanisms and Spatial Pattern of Soil Cadmium Through Geodetector and Spatial Analysis. Land 2025, 14, 1975.
  50. Huang, M.L.; Rong, X.T.; Ding, Y.S.; Gao, X.L.; Li, M.; Wang, X.Z.; Liu, H.L. Cadmium accumulation and toxic effects on wheat under foliar and soil exposure to the simulated atmospheric deposition of cadmium. Environ. Geochem. Health 2026, 48, 116.
  51. Qin, M.H.; Jin, Y.L.; Peng, T.Y.; Zhao, B.; Hou, D.Y. Heavy metal pollution in Mongolian-Manchurian grassland soil and effect of long-range dust transport by wind. Environ. Int. 2023, 177, 108019.
  52. Melaku, S.; Morris, V.; Raghavan, D.; Hosten, C. Seasonal variation of heavy metals in ambient air and precipitation at a single site in Washington, DC. Environ. Pollut. 2008, 155, 88–98.
  53. Xian, L.H.; Lu, D.H.; Yang, Y.T.; Feng, J.Y.; Fang, J.B.; Jacobs, D.F.; Wu, D.M.; Zeng, S.C. Effects of woodland slope on heavy metal migration via surface runoff, interflow, and sediments in sewage sludge application. Sci. Rep. 2024, 14, 13468.
  54. Rezapour, S.; Golmohammad, H.; Ramezanpour, H. Impact of parent rock and topography aspect on the distribution of soil trace metals in natural ecosystems. Int. J. Environ. Sci. Technol. 2014, 11, 2075–2086.
  55. Young, G.; Chen, Y.Q.; Yang, M. Concentrations, distribution, and risk assessment of heavy metals in the iron tailings of Yeshan National Mine Park in Nanjing, China. Chemosphere 2021, 271, 129546.
  56. Aizawa, S.; Akaiwa, H. Cadmium contents of Triassic and Permian limestones in central Japan. Chem. Geol. 1992, 98, 103–110.
  57. Jin, G.; Shi, Z.M.; Deng, H.; Zheng, T.L.; Zhang, Y.F.; Xie, J.X.; Shi, Z.L.; Zhu, Y.H.; Zhang, N.; Zou, C.J. Influence of DOM on Cd speciation during soil weathering in Permian strata: A case study in Xingwen County, Southern Sichuan Province, China. J. Hazard. Mater. 2026, 506, 141585.
  58. Hu, B.; Guo, P.Y.; Wu, Y.Q.; Deng, J.; Su, H.T.; Li, Y.Q.; Nan, Y.T. Study of soil physicochemical properties and heavy metals of a mangrove restoration wetland. J. Clean. Prod. 2021, 291, 125965.
  59. Wang, H.B.; Zhang, Q.; Gomez, M.A.; Jia, Y.F.; Yao, S.H.; Li, S.F. Cadmium chemical fractions in sediments: Effect of grain size, pH, organic acids, and inorganic ions. Environ. Earth Sci. 2022, 81, 478.
  60. Yuan, C.L.; Li, Q.; Sun, Z.Y.; Sun, H.W. Effects of natural organic matter on cadmium mobility in paddy soil: A review. J. Environ. Sci. 2021, 104, 204–215.
  61. Wu, Y.D.; Li, J.Y.; Teng, R.; Zeng, Z.X.; Yu, J.; Zhao, X.T.; Li, Y.W.; Huang, P.X.Y.; Deng, S.W. Effects of iron, manganese, and aluminum oxides on soil cadmium distribution coefficient: A multi-scale analysis based on explainable machine learning. J. Hazard. Mater. 2026, 507, 141702.
  62. Wen, Y.B.; Li, W.; Yang, Z.F.; Zhuo, X.X.; Guan, D.X.; Song, Y.X.; Guo, C.; Ji, J.F. Evaluation of various approaches to predict cadmium bioavailability to rice grown in soils with high geochemical background in the karst region, Southwestern China. Environ. Pollut. 2020, 258, 113645.
  63. Zhao, F.J.; Ma, Y.B.; Zhu, Y.G.; Tang, Z.; McGrath, S.P. Soil Contamination in China: Current Status and Mitigation Strategies. Environ. Sci. Technol. 2015, 49, 750–759.
  64. Li, P.; Li, X.J.; Bai, J.K.; Meng, Y.C.; Diao, X.P.; Pan, K.; Zhu, X.S.; Lin, G.H. Effects of land use on the heavy metal pollution in mangrove sediments: Study on a whole island scale in Hainan, China. Sci. Total Environ. 2022, 824, 153856.
  65. Wu, Y.; Zhou, C.Y.; Hu, W.Y.; Li, M.Y.; Ding, L. Spatial distribution and driving force analysis of soil heavy metals in the water source area of the middle route of the South-to-North Water Diversion Project. Ecol. Indic. 2024, 163, 112126.
  66. Zhang, L.R.; Chen, T.; Wang, G.Y.; Jing, R.Y.; Zhu, X.L.; Kou, B.; Zhou, S.; Zhang, S.L. Identifying the sources and accumulation trends of heavy metals in representative polluted farmland on the Loess Plateau. Catena 2025, 261, 109579.
Figure 1. Spatial distribution of sampling points.
Figure 2. Spatial distribution pattern of PTEs in soil.
Figure 3. The correlation matrix of PTEs.
Figure 4. Pollution source contribution obtained by APCS-MLR: (a) Sankey diagram of pollution source contributions; (b) pie chart of average contribution rates of pollution sources.
Figure 5. Spatial distribution of mineral resources in the study area.
Figure 6. Driving factor importance ranking and relative contribution.
Table 1. Descriptive statistics of PTE concentrations (mg/kg).
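| PTEs | Min | Max | Mean | Median | SD | CV (%) | ABVs | SPRSVs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| As | 3.46 | 49.44 | 11.97 | 11.04 | 5.13 | 42.83 | 15.7 | 30 |
| Cd | 0.04 | 13.34 | 0.43 | 0.28 | 0.80 | 187.20 | 0.126 | 0.4 |
| Cr | 18.85 | 306.25 | 81.01 | 81.45 | 22.93 | 28.30 | 71.4 | 250 |
| Cu | 5.35 | 76.32 | 24.15 | 24.00 | 7.32 | 30.29 | 27.3 | 150 |
| Ni | 19.40 | 167.84 | 49.25 | 49.05 | 12.22 | 24.80 | 31.9 | 70 |
| Pb | 8.94 | 116.09 | 29.56 | 29.77 | 7.66 | 25.93 | 29.7 | 100 |
| Zn | 16.34 | 930.60 | 76.77 | 73.88 | 39.61 | 51.60 | 94.4 | 200 |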
Abbreviations: SD, standard deviations; CV, coefficient of variation; ABVs: Average background values in Hunan province [33]; SPRSVs: Soil pollution risk screening values of China (GB15618-2018) [34].
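The comparison of mean concentrations with the background and screening values in Table 1 is simple arithmetic, sketched below. The figures are transcribed from Table 1; the names `table1` and `ratios` are hypothetical, and the two ratios are plain quotients rather than any formal pollution index used by the authors.

```python
# Mean concentration, Hunan background value (ABV), and risk screening value
# (SPRSV), all in mg/kg, transcribed from Table 1.
table1 = {
    "As": (11.97, 15.7, 30),
    "Cd": (0.43, 0.126, 0.4),
    "Cr": (81.01, 71.4, 250),
    "Cu": (24.15, 27.3, 150),
    "Ni": (49.25, 31.9, 70),
    "Pb": (29.56, 29.7, 100),
    "Zn": (76.77, 94.4, 200),
}


def ratios(table):
    """Return {element: (mean / ABV, mean / SPRSV)} without rounding."""
    return {
        element: (mean_value / abv, mean_value / sprsv)
        for element, (mean_value, abv, sprsv) in table.items()
    }


result = ratios(table1)
for element, (to_background, to_screening) in result.items():
    print(f"{element}: mean/ABV = {to_background:.2f}, mean/SPRSV = {to_screening:.2f}")
```

For Cd the first quotient reproduces the factor of 3.41 quoted in the text.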
Table 2. Rotated component matrix from PCA.
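| PTEs | PC1 | PC2 | PC3 |
| --- | --- | --- | --- |
| As | 0.116 | **0.892** | −0.015 |
| Cd | 0.210 | −0.053 | **0.913** |
| Cr | 0.281 | **0.596** | 0.528 |
| Cu | **0.806** | 0.367 | 0.200 |
| Ni | **0.758** | 0.271 | 0.426 |
| Pb | 0.579 | **0.651** | −0.044 |
| Zn | **0.890** | 0.062 | 0.176 |
| Explained variance (%) | 35.554 | 25.574 | 19.546 |
| Cumulative variance (%) | 35.554 | 61.128 | 80.673 |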
Bold text represents the highest loading value of each heavy metal for each principal component.
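One way to read Table 2 is to assign each element to the component on which it loads most strongly and to accumulate the explained variance. The sketch below does this with the transcribed loadings; it assumes that "highest loading" means largest absolute value, and the names `LOADINGS`, `EXPLAINED`, and `dominant_component` are illustrative rather than taken from the study.

```python
from itertools import accumulate

COMPONENTS = ("PC1", "PC2", "PC3")

# Rotated loadings transcribed from Table 2.
LOADINGS = {
    "As": (0.116, 0.892, -0.015),
    "Cd": (0.210, -0.053, 0.913),
    "Cr": (0.281, 0.596, 0.528),
    "Cu": (0.806, 0.367, 0.200),
    "Ni": (0.758, 0.271, 0.426),
    "Pb": (0.579, 0.651, -0.044),
    "Zn": (0.890, 0.062, 0.176),
}

# Explained variance (%) per component, transcribed from Table 2.
EXPLAINED = (35.554, 25.574, 19.546)


def dominant_component(loadings):
    """Return the component with the largest absolute loading (assumed rule)."""
    index = max(range(len(loadings)), key=lambda i: abs(loadings[i]))
    return COMPONENTS[index]


dominant = {element: dominant_component(v) for element, v in LOADINGS.items()}
cumulative = list(accumulate(EXPLAINED))

for element, component in dominant.items():
    print(element, "->", component)
print("cumulative variance (%):", [round(c, 3) for c in cumulative])
```

Under this rule Cu, Ni, and Zn fall on PC1, As, Cr, and Pb on PC2, and Cd on PC3. Cr and Pb also carry secondary loadings above 0.5 on another component, which a single-assignment rule hides.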
Table 3. Prediction accuracy among different machine learning models.
| Models | MAE | RMSE | R² |
|---|---|---|---|
| RF | 0.17 | 0.25 | 0.28 |
| XGBoost | 0.18 | 0.27 | 0.21 |
| LightGBM | 0.18 | 0.27 | 0.22 |
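For reference, the three accuracy metrics in Table 3 are conventionally defined as below. This is a minimal sketch using the standard definitions; it assumes the study follows them, and the function name `regression_metrics` and the sample values are hypothetical, not the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) using the standard textbook definitions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical observed and predicted values, for illustration only.
observed = [0.2, 0.4, 0.6]
predicted = [0.3, 0.4, 0.5]
mae, rmse, r2 = regression_metrics(observed, predicted)
print(f"MAE={mae:.3f}, RMSE={rmse:.3f}, R2={r2:.2f}")
```

Under these definitions, an R² of 0.28 would mean the model accounts for roughly 28% of the variance in the evaluated samples, so the rankings in Table 3 compare models of modest predictive power.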

Share and Cite

MDPI and ACS Style

Wang, J.; Zhao, X.; Liu, J.; Yan, Y.; Zhao, W.; Xia, C.; Zheng, J.; Liu, J. Unraveling the Origins and Drivers of Potentially Toxic Elements (PTEs): A Sequential Framework Integrating Receptor Model and Machine Learning. Toxics 2026, 14, 525. https://doi.org/10.3390/toxics14060525
