Article

Machine Learning-Based Analysis of Arsenic Migration from Soil to Highland Barley in High Geological Background Areas

1 Key Laboratory for Environmental Factors Control of Agro-Product Quality Safety, Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China
2 Research Institute for Environmental Innovation (Binhai, Tianjin), Tianjin 300450, China
3 State Environmental Protection Key Laboratory of All Material Fluxes in River Ecosystems, College of Environmental Sciences and Engineering, Peking University, Beijing 100871, China
* Author to whom correspondence should be addressed.
Sustainability 2026, 18(4), 1782; https://doi.org/10.3390/su18041782
Submission received: 16 December 2025 / Revised: 4 February 2026 / Accepted: 6 February 2026 / Published: 10 February 2026
(This article belongs to the Section Soil Conservation and Sustainability)

Abstract

To investigate the effect of high-arsenic (As) soil on the absorption of As by highland barley, 135 pairs of soil–crop samples were collected in the main producing areas of highland barley in the middle reaches of the Yarlung Zangbo River. Eight soil variables, including pH, redox potential (Eh), soil organic matter (SOM), total arsenic (T-As), total iron (T-Fe), total manganese (T-Mn), chemically extractable As (KH2PO4-As), and bioavailable As determined by diffusive gradients in thin films (DGT-As), were measured, along with As concentrations in barley grains (HB-As). Machine learning approaches were employed to construct predictive models for HB-As accumulation, and feature influence mechanisms were interpreted using SHapley Additive exPlanations (SHAP) and Partial Dependence Plot (PDP) analyses. The results showed that: (1) among models constructed using the full feature set, the random forest (RF) model exhibited the best predictive performance for HB-As, with R² values of 0.756 and 0.651 for the training and testing datasets, respectively; (2) SHAP analysis indicated that DGT-As had the greatest contribution to the model (30.5%), followed by T-As and T-Fe/Mn; and (3) significant interaction effects among soil variables jointly influenced HB-As accumulation. This study provides scientific support for agricultural product safety, soil security, and sustainable land use in plateau agroecosystems.

1. Introduction

Arsenic (As), a metalloid widely distributed in nature, has well-documented carcinogenic, teratogenic, and mutagenic effects. It is classified as a Group 1 carcinogen by the International Agency for Research on Cancer (IARC) and listed as one of the five toxic elements by the United States Environmental Protection Agency (USEPA) [1]. Arsenic and its compounds accumulate in plants and animals and are transferred through the food chain, endangering ecosystems and human health. Inhibiting the migration of As from soil to crops is therefore key to ensuring the quality and safety of agricultural products and protecting human health.
The Qinghai–Tibet Plateau is among the highest, largest, and most topographically complex regions on Earth [2]. Under the combined impacts of global climate change and human activities, the ecological risks and resource–environmental pressures on the plateau have increased continuously in recent years [3]. Yu et al. proposed that climate warming will promote permafrost thaw, which may accelerate the release of soil As [4], and human activities such as mining exploration and railway construction can cause soil As pollution [5]. The soil As background value in this area has been reported as 24.50 mg·kg−1, significantly higher than the national soil background value for China (11.20 mg·kg−1) [6]. Highland barley, the major crop in Tibet, is cultivated mainly in Tibet and adjacent alpine regions; its planting area accounts for 70% of Tibet's cultivated land and 43% of the crop planting area on the Qinghai–Tibet Plateau [7,8]. If highland barley becomes contaminated with As, local residents who depend on it as a staple food will be chronically exposed to dietary As, increasing their health risks. There is therefore an urgent need to systematically characterize As pollution in soil and highland barley on the Qinghai–Tibet Plateau. The uptake and accumulation of As in highland barley are regulated mainly by bioavailable As in soil, which is in turn affected by soil properties such as pH, soil organic matter (SOM), and mineral content [9,10,11,12]. Evaluating the effects of soil properties on As is thus necessary to reduce As toxicity and its transfer to crops.
In recent years, machine learning (ML) approaches have been widely applied to the simulation and prediction of heavy metal migration and accumulation in soil–crop systems [13]. Previous studies have demonstrated that methods such as random forest (RF), gradient boosting models (GBMs), and generalized linear models (GLMs) can effectively characterize the effects of soil factors on heavy metal uptake by crops [14,15,16,17]. However, existing studies have mainly focused on lowland agricultural regions and typical heavy metals such as Cd, while machine learning-based investigations into As accumulation in plateau soil–highland barley systems remain very limited. In terms of indicator selection, most studies rely primarily on total metal concentrations or simple chemical extraction indices, with insufficient consideration of indicators that can reflect dynamic bioavailability. Moreover, many machine learning-based studies emphasize predictive performance but lack in-depth analysis of the relative importance and interaction effects of key controlling factors. Therefore, it is necessary to develop As accumulation prediction models specifically applicable to plateau soil–highland barley systems and to further elucidate the regulatory roles of key soil factors in controlling As uptake by highland barley.
The diffusive gradients in thin films (DGT) technique, based on coupled diffusion and adsorption, characterizes the dynamic resupply of metal(loid)s to the soil solution and therefore more closely reflects their actual bioavailability in the plant rhizosphere [18]. Accordingly, based on the analysis of soil and highland barley samples, this study integrated multiple soil factors, including the DGT indicator, to construct machine learning models for predicting As accumulation in highland barley grains. SHapley Additive exPlanations (SHAP) and Partial Dependence Plots (PDP) were combined to interpret the models and to analyze the direction of influence of each soil factor on As accumulation in the grains. The objectives were: (1) to characterize the basic physicochemical properties of soil in the study area and the status and accumulation characteristics of As pollution in the soil–barley system; (2) to construct a machine learning model for predicting As accumulation in highland barley from multiple soil factors; and (3) to reveal the contribution and importance of environmental factors to As accumulation in highland barley and to explain the global and local nonlinear relationships using SHAP and PDP. This study provides a scientific basis and practical support for preventing and controlling As pollution risks in highland barley and for sustainable soil development on the Qinghai–Tibet Plateau.

2. Materials and Methods

2.1. Overview of the Study Area and Sample Collection

Figure 1 shows the study area and the distribution of sampling sites. The study area is located at the northern foot of the Himalayas on the southern bank of the middle reaches of the Yarlung Zangbo River and is one of the major highland barley planting areas in the middle reaches of this basin. Geomorphologically, the region is dominated by the valley terraces of the Yarlung Zangbo River. Its soil-forming conditions are jointly influenced by topography and climate, and the main soil types are subalpine steppe soil, subalpine shrub-steppe soil, subalpine meadow soil, alpine meadow soil, and alpine desert soil. The region has a plateau temperate semi-arid monsoon climate whose defining features are cold and low oxygen levels. The mean annual temperature ranges from 0 to 3 °C, annual precipitation is 300 to 600 mm, and the freeze–thaw period lasts up to 166 days. Together, the climate and soil types have shaped the distinctive agroecological environment of the region.
Samples were collected in August 2024 during the maturity stage of highland barley. The sampling sites were selected to cover the main highland barley growing areas in the middle reaches of the Yarlung Zangbo River. A field-based paired sampling design was adopted to ensure a direct linkage between soil properties and arsenic (As) accumulation in highland barley. At each sampling site, surface soil (0–20 cm) and the corresponding highland barley plants were collected simultaneously, with each soil–barley pair originating from the same small plot within a single field to minimize the effects of spatial heterogeneity. All highland barley plants collected were of the same locally cultivated variety, ensuring consistency of crop genotype across sampling sites and facilitating analysis of spatial variability in As accumulation. The spacing between sampling points was determined according to field size and soil homogeneity, generally ranging from 20 to 50 m, to balance representativeness and sampling efficiency.

2.2. Sample Pretreatment and Chemical Analysis

2.2.1. Sample Pretreatment

After transport to the laboratory, 200 g of each fresh soil sample was retained for the determination of soil pH and Eh. The remaining soil was air-dried naturally and cleared of impurities such as gravel and plant residues. After quartering, 10 g was passed through a 10-mesh sieve for the determination of KH2PO4-extractable As (KH2PO4-As), representing the chemically extractable As fraction. A further 220 g was passed through a 20-mesh sieve: 120 g was used to determine DGT-measurable As (DGT-As) by the diffusive gradients in thin films (DGT) technique, representing bioavailable As in soil; 50 g was used to determine soil pH and soil organic matter (SOM) content; and the remaining 50 g was further sieved through a 100-mesh sieve for the determination of total soil iron, manganese, and arsenic (T-Fe, T-Mn, and T-As). Highland barley samples were threshed, and the grains were washed with tap water to remove surface sediment, rinsed with distilled water, dried to constant weight in a constant-temperature oven at 60 °C, crushed, and passed through a 40-mesh sieve to prepare grain samples for the determination of As content (HB-As).

2.2.2. Determination of pH, Eh, SOM and Total Metal Content

Soil pH and Eh were determined using the potentiometric method. Briefly, 10.0 g of fresh soil was weighed into a 50 mL centrifuge tube, and 25 mL of CO2-free water was added. The suspension was shaken for 2 min and allowed to stand for 30 min, after which pH and Eh were measured simultaneously using a pH sensor and Eh electrodes. SOM was determined by the potassium dichromate oxidation spectrophotometric method. Specifically, 0.5 g of soil and 0.100 g of HgSO4 were placed in a 100 mL stoppered digestion tube. Then, 5 mL of 0.27 mol·L−1 K2Cr2O7 solution and 7.5 mL of concentrated H2SO4 were added sequentially, and the mixture was gently shaken to homogenize. The tube was heated at 135 °C for 30 min. After digestion, the tube was cooled in a water bath to room temperature. The digest was finally diluted to 100 mL with water, stoppered, and mixed thoroughly. The solution was left to clarify, and the supernatant was transferred to a 10 mm cuvette for absorbance measurement at 585 nm.
The concentrations of T-As, T-Fe, and T-Mn in soil, as well as As in grain, were determined after acid digestion. A 0.1000 g portion of air-dried, sieved soil was placed in a closed PTFE vessel, 8 mL HNO3, 4 mL HF, and 1 mL HClO4 were added, and the mixture was left to stand for 12 h. The samples were then digested in a graphite digestion apparatus, first at 120 °C for 30 min and then at 150 °C for a further 180 min. Finally, the acid was evaporated at 180 °C until the solution in the vessel became colorless and transparent, or yellowish and viscous. The residue was redissolved in 2% HNO3 and diluted to 50 mL. After filtration through a 0.22 μm membrane, As in the digest was quantified by ICP-MS, and Fe and Mn by ICP-OES. For quality control, 10% of the samples were randomly selected and analyzed in triplicate.

2.2.3. Measurement of Chemically Extractable and Bioavailable Arsenic in Soil

The chemically extractable As was determined using the KH2PO4 extraction method. A 4.00 g soil sample was weighed into a 150 mL Erlenmeyer flask, and 100 mL of 0.05 mol·L−1 KH2PO4 solution was added. The mixture was shaken at 25 °C and 180 rpm for 4 h and then centrifuged at 4500 rpm for 5 min. The supernatant was collected and filtered through a 0.45 μm membrane filter, and the As concentration in the filtrate was determined by atomic fluorescence spectrometry [19].
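The text does not state how the filtrate concentration was converted into a soil-based content in mg·kg−1. A minimal sketch, assuming the usual mass-balance conversion and an AFS reading in μg·L−1 with no further dilution (the `dilution_factor` parameter is hypothetical), using the 100 mL extractant volume and 4.00 g soil mass given above:

```python
def extract_to_soil_mg_per_kg(c_extract_ug_per_l: float,
                              extract_volume_ml: float = 100.0,
                              soil_mass_g: float = 4.00,
                              dilution_factor: float = 1.0) -> float:
    """Convert an As concentration in the extract (ug/L) to soil content (mg/kg).

    Assumed mass balance (not stated in the source):
    As mass (ug) = C (ug/L) * dilution * V (L); dividing by soil mass (kg)
    gives ug/kg, and dividing by 1000 gives mg/kg.
    """
    as_mass_ug = c_extract_ug_per_l * dilution_factor * extract_volume_ml / 1000.0
    soil_mass_kg = soil_mass_g / 1000.0
    return as_mass_ug / soil_mass_kg / 1000.0


# Hypothetical filtrate reading of 378.4 ug/L -> about 9.46 mg/kg in soil
print(round(extract_to_soil_mg_per_kg(378.4), 2))
```

With these volumes and masses, each 1 μg·L−1 in the filtrate corresponds to 0.025 mg·kg−1 in soil.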
The DGT device consisted of a protective shell, a filter membrane, a diffusion layer, and a binding layer. The diffusion layer was prepared using 15% acrylamide with 0.3% agarose-derived crosslinking agent. DTPA-modified layered double hydroxide (DTPA-LDH) nanomaterials were used as the adsorbent for the binding layer, following the method described in Refs. [18,20]. A nitrocellulose membrane with a pore size of 0.45 μm and a thickness of 0.15 mm was used as the filter membrane. The filter membrane, diffusion layer, and binding layer were cut into circular disks with a diameter of 2.75 cm, assembled into the DGT casing, and stored at 4 °C prior to use.
Briefly, 120 g of soil was weighed into a polypropylene container, and the soil moisture content was adjusted to 100% of the water-holding capacity (WHC) using ultrapure water. The soil was gently shaken to remove entrapped air bubbles, carefully leveled, and incubated at 25 °C for 24 h to allow moisture equilibration. After incubation, the DGT device was deployed by placing it horizontally on the soil surface with the diffusion window facing downward and in direct contact with the soil. The device was lightly pressed to ensure uniform contact without disturbing the soil structure, and no visible air gaps were present at the soil–device interface. Each treatment was conducted in triplicate. Following deployment, the setup was sealed and returned to the incubator at 25 °C for 24 h. After retrieval, the binding layer was removed and eluted with 9 mL of 0.5 mol·L−1 HNO3 for 30 min. The eluate was then shaken, and the As concentration was determined by ICP-MS. Blank DGT devices, which were not in contact with soil, were processed in parallel using the same deployment, elution, and analytical procedures to correct for potential background contamination. The DGT-extractable As concentration was calculated according to Equation (1).
$$\mathrm{DGT\text{-}As} = \frac{M\,g}{D\,A\,t} \qquad (1)$$
In Equation (1), M is the As concentration in the DGT eluate determined by ICP-MS (μg·L−1); g is the total thickness of the diffusion layer and the nitrocellulose membrane (0.089 cm); D is the diffusion coefficient of As in the diffusion layer (4.90 × 10−6 cm²·s−1); A is the area of each DGT exposure window (3.14 cm²); and t is the DGT deployment time (86,400 s).
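In the widely used DGT formulation, the numerator of the DGT equation is the mass of As accumulated in the binding layer, which is commonly obtained from the eluate concentration as M = Ce(Ve + Vgel)/fe, where Ce is the eluate concentration, Ve the eluent volume, Vgel the binding-gel volume, and fe the elution factor. The sketch below follows that common formulation as an assumption: Ve = 9 mL and g, D, A, and t are taken from the text, while the gel volume (0.16 mL) and elution factor (0.8) are hypothetical placeholders, not values reported by the authors.

```python
def accumulated_mass_ug(c_eluate_ug_per_l: float,
                        v_eluent_ml: float = 9.0,
                        v_gel_ml: float = 0.16,      # hypothetical binding-gel volume
                        elution_factor: float = 0.8  # hypothetical elution efficiency
                        ) -> float:
    """Mass of As accumulated in the binding layer: M = Ce * (Ve + Vgel) / fe."""
    # (ug/L) * mL / 1000 -> ug
    return c_eluate_ug_per_l * (v_eluent_ml + v_gel_ml) / 1000.0 / elution_factor


def c_dgt_ug_per_l(mass_ug: float,
                   g_cm: float = 0.089,       # diffusion layer + filter thickness
                   d_cm2_s: float = 4.90e-6,  # diffusion coefficient of As
                   area_cm2: float = 3.14,    # exposure window area
                   t_s: float = 86_400.0      # 24 h deployment
                   ) -> float:
    """Time-averaged DGT concentration: C_DGT = M * g / (D * A * t)."""
    # ug * cm / (cm^2/s * cm^2 * s) = ug/cm^3 = ug/mL; multiply by 1000 for ug/L
    c_ug_per_ml = mass_ug * g_cm / (d_cm2_s * area_cm2 * t_s)
    return c_ug_per_ml * 1000.0


# Hypothetical eluate concentration of 10 ug/L
mass = accumulated_mass_ug(10.0)
print(f"M = {mass:.4f} ug, C_DGT = {c_dgt_ug_per_l(mass):.2f} ug/L")
```

Under these placeholder values, an eluate reading of 10 μg·L−1 corresponds to a DGT concentration of roughly 7.7 μg·L−1; the result scales inversely with the assumed elution factor.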

2.3. Construction of HB-As Prediction Model

2.3.1. Model Selection

To establish the As accumulation prediction model for the soil–barley system, three ML algorithms were selected. (1) Traditional linear model: multiple linear regression (MLR), a parametric statistical model that establishes a linear relationship between a single dependent variable and multiple independent variables and estimates the corresponding regression coefficients [21]. (2) Kernel method: support vector regression (SVR), the regression counterpart of the support vector machine. SVR fits a function that keeps most training errors within an ε-insensitive band while keeping the model as flat as possible [22,23]. When the relationship is nonlinear, a kernel function implicitly maps the original feature space into a higher-dimensional space in which a linear function can be fitted [24]. (3) Ensemble learning model: random forest (RF), a decision tree-based ensemble that can model nonlinear relationships among multiple variables and improves prediction accuracy by reducing overfitting [25,26].
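Of the three models, MLR is simple enough to sketch without external libraries. The following is a minimal ordinary-least-squares fit via the normal equations, run on toy data rather than the study's measurements; the function names are hypothetical. In practice, SVR and RF would typically come from a library such as scikit-learn (`SVR`, `RandomForestRegressor`); the software the authors used is not stated.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bp*xp (normal equations)."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    p = len(rows[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


def predict_mlr(coefs, X):
    """Apply fitted coefficients (intercept first) to feature rows."""
    return [coefs[0] + sum(b * v for b, v in zip(coefs[1:], x)) for x in X]


# Toy data generated from y = 1 + 2*x1 - 3*x2 (not study data)
X_toy = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y_toy = [1, 3, -2, 0, -4]
coefs = fit_mlr(X_toy, y_toy)
print([round(c, 6) for c in coefs])
```

Because the fitted function is a weighted sum of the inputs, it cannot represent the threshold or interaction effects that SVR and RF can capture.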

2.3.2. Model Development

The model construction workflow is illustrated in Figure 2. The input features comprised eight measured variables (pH, Eh, SOM, T-As, T-Fe, T-Mn, KH2PO4-As, and DGT-As), and HB-As content was the target variable. The dataset was randomly split into training and test sets at a ratio of 8:2. To mitigate insufficient training and poor generalization caused by the limited number of valid samples, a long short-term memory (LSTM) network was used for sample augmentation, with the number of augmented samples set to 5. In addition, 5-fold cross-validation was employed to reduce the dependence of the results on a single random data split.
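The 8:2 split and 5-fold cross-validation can be sketched with the standard library. This assumes the split is applied to the 135 original sample pairs; the seed is a hypothetical choice, and the LSTM augmentation step is not reproduced. Where the augmented samples enter the pipeline is not specified in the text; generating them from training data only would be the usual way to keep the test set independent.

```python
import random


def train_test_indices(n, test_fraction=0.2, seed=42):
    """Shuffle sample indices and split them 8:2 into training and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is a hypothetical choice
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


train_idx, test_idx = train_test_indices(135)
print(len(train_idx), len(test_idx))  # 108 27
for fold_no, (tr, va) in enumerate(k_fold(train_idx), start=1):
    print(fold_no, len(tr), len(va))
```

Each training sample is used for validation exactly once across the five folds, while the 27 test samples are held out from cross-validation entirely.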

2.3.3. Model Evaluation

Model performance was evaluated using the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) [27,28]. MAE and RMSE describe the error between predicted and measured values; because RMSE squares the errors before averaging, it is more sensitive to large errors and reflects their dispersion. Smaller MAE and RMSE values indicate a more reliable model and better prediction accuracy. R² reflects the proportion of variance in the dependent variable explained by the model; it typically ranges from 0 to 1, with values closer to 1 indicating a better fit, and it can become negative on test data when the model performs worse than simply predicting the mean. Together, these metrics characterize the accuracy, robustness, and explanatory power of the model. They are calculated as shown in Equations (2)–(4).
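$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right| \qquad (2)$$
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \qquad (3)$$
$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2} \qquad (4)$$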
where $m$ denotes the number of samples; $y_i$ denotes the measured value; $\bar{y}$ denotes the mean of the measured values; $\hat{y}_i$ denotes the predicted value; and MSE denotes the mean squared error, calculated as shown in Equation (5).
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 \qquad (5)$$
The variables herein have the same meanings as those in Equation (2).
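The four evaluation metrics translate directly into code. A minimal sketch on toy values (not study data), with one function per metric; the last call illustrates that R² computed this way becomes negative when predictions are worse than the mean of the measured values.

```python
import math


def mae(y, y_hat):
    """Equation (2): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mse(y, y_hat):
    """Equation (5): mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Equation (3): square root of the MSE."""
    return math.sqrt(mse(y, y_hat))


def r2(y, y_hat):
    """Equation (4): 1 - residual sum of squares / total sum of squares."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(mae(y_obs, y_pred), rmse(y_obs, y_pred), r2(y_obs, y_pred))  # 0.25 0.5 0.8
print(r2(y_obs, [4.0, 3.0, 2.0, 1.0]))  # -3.0
```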

2.3.4. Model Explanation

To study the contributions of different input features to As accumulation in highland barley, SHAP was used to rank the features and quantify their weights in the best-performing machine learning model and to clarify the direction in which each input variable influences the output. Based on the Shapley value principle from cooperative game theory, SHAP decomposes the model prediction for each sample into a base value (the average prediction) plus the sum of the contributions of the individual input features [29]. Calculating SHAP values for all samples serves two purposes: the global importance of each feature is quantified, identifying the key soil factors with the greatest influence on As accumulation in highland barley; and the sign and magnitude of the SHAP values of individual samples reveal the direction and strength of each factor's contribution at each sampling site. SHAP dependence plots further reveal nonlinear relationships between feature values and their contributions [30].
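The Shapley decomposition described above can be illustrated by brute force on a toy model. This is a minimal sketch, assuming a single baseline vector stands in for "absent" features; SHAP libraries instead estimate expectations over background data and, for tree ensembles such as RF, use specialized algorithms (e.g., TreeSHAP), so this is not the authors' computation. The toy model and all values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x); features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: additive effect of feature 0, interaction of features 1 and 2
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
base = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, base)
print([round(v, 6) for v in phi])                # [2.0, 3.0, 3.0]
print(sum(phi), toy_model(x) - toy_model(base))  # contributions sum to f(x) - f(baseline)
```

The interaction term (6) is split equally between features 1 and 2, and the contributions add up to the difference between the prediction and the baseline prediction. Enumeration needs 2^(n−1) coalitions per feature, which is still tractable for the eight features used here.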
PDP analysis was then used to further interpret the model: the target feature is varied across its value range while the model predictions are averaged over the observed values of all other features, giving the average trend of the prediction as a function of that feature [31]. The resulting marginal effect curves of individual features on As accumulation in highland barley visualize how each feature influences the predictions [32].
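A partial dependence curve can be computed by overwriting one feature with each grid value in every observed row and averaging the predictions. A minimal sketch; the toy model stands in for a fitted RF, and the data and grid are hypothetical.

```python
def partial_dependence(predict, X, feature, grid):
    """Average model prediction with `feature` set to each grid value in every row of X."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            z = list(row)
            z[feature] = value  # overwrite the target feature; keep the others as observed
            total += predict(z)
        curve.append(total / len(X))
    return curve


def toy_model(z):
    # Hypothetical model with an interaction, standing in for a fitted RF
    return 2.0 * z[0] + z[0] * z[1]


X_obs = [[0.0, 1.0], [0.0, 3.0], [1.0, 2.0]]
print(partial_dependence(toy_model, X_obs, feature=0, grid=[0.0, 1.0, 2.0]))  # [0.0, 4.0, 8.0]
```

Because the other features are averaged rather than held at one value, an interaction (here z[0]·z[1]) appears in the curve only through its average; SHAP dependence plots are better suited to showing how such effects vary between samples.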

3. Results

3.1. Soil Properties and As Pollution Status

Figure 3 shows combined kernel density and box plots of soil properties, displaying the distribution and central tendency of each index. Soil pH in the study area showed a unimodal, symmetric distribution, with a median of 8.00, an interquartile range (IQR) of 0.33, and a standard deviation (SD) of 0.29. Most values fell within 7.75–8.25, indicating that the soils are predominantly weakly alkaline with very low spatial heterogeneity. This is consistent with the soil pH survey of the Qinghai–Tibet Plateau by Li et al. and further confirms the general alkalinity of soils in the region [33]. Soil Eh ranged from −30 to 20 mV, with a median of −12.45 mV, an IQR of 17.08 mV, and an SD of 12.45 mV. Negative values predominated, indicating weakly reducing conditions. SOM was widely distributed within 20–80 g·kg−1, and its kernel density curve showed no clear single peak, suggesting strong spatial heterogeneity in soil fertility across the study area.
T-As showed a bimodal distribution, with a median of 45.00 mg·kg−1, an IQR of 16.96 mg·kg−1, and an SD of 28.56 mg·kg−1; the median is about 1.84 times the regional background value for agricultural soils (24.50 mg·kg−1). Several extremely high values (>100 mg·kg−1) were also observed, indicating localized areas of severe As contamination. T-Fe and T-Mn both showed unimodal, symmetric distributions, with medians of 47.02 and 0.74 g·kg−1, IQRs of 5.64 and 0.14 g·kg−1, and SDs of 3.97 and 0.12 g·kg−1, respectively. The data were tightly clustered with no significant outliers, indicating a relatively uniform distribution of Fe and Mn across the study area.
KH2PO4-As showed a unimodal, skewed distribution, with a median of 9.46 mg·kg−1, an IQR of 10.06 mg·kg−1, and an SD of 7.18 mg·kg−1. Most values fell within 5–15 mg·kg−1, while a few samples approached 30 mg·kg−1, suggesting relatively high potential As bioavailability in localized areas. DGT-As had a median of 5.15 μg·L−1, an IQR of 4.09 μg·L−1, and an SD of 6.05 μg·L−1; most values lay within 0–10 μg·L−1, with a few high outliers (>20 μg·L−1), indicating markedly elevated direct As bioavailability in certain soils.
HB-As had a median of 0.04 mg·kg−1, an IQR of 0.08 mg·kg−1, and an SD of 0.21 mg·kg−1; the median is far below the Chinese maximum limit for As in food (0.5 mg·kg−1). However, a few high outliers (>0.5 mg·kg−1) were observed, indicating that although the risk of As contamination in highland barley is generally low at most sampling sites, localized exceedances are possible.

3.2. Correlation Analysis Between HB-As and Related Influencing Factors

The key factors affecting HB-As were examined by Pearson correlation analysis, and the results are shown in Figure 4. T-As was significantly positively correlated with HB-As (r = 0.41, p < 0.01), consistent with soil As stocks acting as a basic control on crop uptake: the larger the total As pool in soil, the greater the opportunity for highland barley roots to contact and absorb As, and hence the higher the As content accumulated in highland barley [34]. Notably, T-As was also significantly correlated with pH, Eh, and SOM: negatively with pH (r = −0.35, p < 0.01), positively with Eh (r = 0.29, p < 0.01), and negatively with SOM (r = −0.25, p < 0.01). pH regulates the adsorption–desorption equilibrium of As by altering the surface charge of soil colloids and the ionic form of As (e.g., interconversion of the As(V) species H₂AsO₄⁻ and HAsO₄²⁻); Eh affects As mobility by driving the redox transformation between As(III) and As(V); and SOM indirectly regulates the release potential of As through adsorption, complexation, or changes in soil redox conditions [35]. pH, Eh, and SOM may therefore affect HB-As indirectly through their relationships with soil As. DGT-As, which reflects the bioavailable fraction of soil As associated with uptake by highland barley [36], showed a weak positive correlation with HB-As (r = 0.11, p > 0.05). KH2PO4-As, which mainly represents exchangeable As weakly bound to soil colloids, showed a weak negative correlation with HB-As (r = −0.14, p > 0.05), possibly because highland barley has a relatively low capacity to take up this As fraction. Neither relationship was statistically significant, suggesting that a single bioavailability measure is insufficient to fully characterize As bioavailability.
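The significance statements above follow from the standard t-test for a Pearson coefficient. A minimal sketch, assuming all 135 sample pairs entered each correlation (the software and exact test used are not stated in the text):

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), compared with a t distribution on n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


n_pairs = 135
for label, r in [("T-As", 0.41), ("DGT-As", 0.11), ("KH2PO4-As", -0.14)]:
    print(label, round(t_statistic(r, n_pairs), 2))
```

With 133 degrees of freedom, the two-sided 5% critical value is about 1.98. The reported r values give t of roughly 5.18 for T-As but only 1.28 for DGT-As and −1.63 for KH2PO4-As, so only the T-As relationship is significant, in line with the p-values above.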
The correlations of T-Fe and T-Mn with HB-As were weak, which differs from previous findings that iron and manganese oxides can reduce As bioavailability through oxidation and adsorption [37]. This suggests that, given the complexity and heterogeneity of soil, simple correlation analysis cannot adequately explain As accumulation in highland barley. Machine learning approaches are therefore needed to characterize the nonlinear relationships and interactions among multiple factors and to provide a more comprehensive understanding of how soil environmental factors affect HB-As.

3.3. Performance of Machine Learning Prediction Models

MLR, SVR, and RF were used to construct HB-As prediction models from up to eight input variables. Model performance is shown in Table 1, and the fitting results for input combination S4 (all eight variables) are shown in Figure 5.
Model comparison indicates that the predictive ability of MLR was substantially weaker than that of SVR and RF. Across the four input combinations, the R² of MLR ranged from 0.254 to 0.385 in the training set and decreased to 0.168–0.309 in the test set. The training-set MAE (0.0556–0.0621) and RMSE (0.1169–0.1242) were similar to those of the test set, showing no marked performance gap. In contrast, SVR and RF performed clearly better, with good fitting stability and small differences between the training and test sets. Specifically, the R² of SVR ranged from 0.587 to 0.737 in the training set and from 0.541 to 0.684 in the test set, with MAE and RMSE ranging from 0.0216 to 0.0381 and from 0.0680 to 0.1049, respectively. RF achieved slightly higher training-set accuracy than SVR and varied less across input combinations: its R² reached 0.699–0.756 in the training set and 0.594–0.651 in the test set, and its MAE and RMSE were low, at 0.0233–0.0327 and 0.0640–0.0811, respectively. In terms of input features, model performance generally improved as more features were added. S1 included only pH, Eh, SOM, and T-As; S2 to S4 successively added T-Fe and T-Mn, then KH2PO4-As, and then DGT-As to the preceding set. The R² values of all models under S1 were generally lower than those under S2, S3, and S4. The models based on S4 showed the best overall performance, with test-set R² values of 0.683 for SVR and 0.651 for RF.
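The nested input combinations S1–S4 can be written down as a small configuration. A minimal sketch; the column-selection helper and the single data row are hypothetical placeholders, not study data.

```python
BASE = ["pH", "Eh", "SOM", "T-As"]

# Nested input combinations as described in the text
FEATURE_SETS = {
    "S1": BASE,
    "S2": BASE + ["T-Fe", "T-Mn"],
    "S3": BASE + ["T-Fe", "T-Mn", "KH2PO4-As"],
    "S4": BASE + ["T-Fe", "T-Mn", "KH2PO4-As", "DGT-As"],
}


def select_features(rows, header, set_name):
    """Keep only the columns of `rows` named in the chosen input combination."""
    cols = [header.index(name) for name in FEATURE_SETS[set_name]]
    return [[row[c] for c in cols] for row in rows]


# Hypothetical header and one illustrative row (values are placeholders)
header = ["pH", "Eh", "SOM", "T-As", "T-Fe", "T-Mn", "KH2PO4-As", "DGT-As"]
rows = [[8.0, -12.0, 45.0, 45.0, 47.0, 0.74, 9.5, 5.2]]
for name in FEATURE_SETS:
    print(name, select_features(rows, header, name)[0])
```

Because each set strictly contains the previous one, differences in performance between consecutive sets can be attributed to the newly added variables.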

4. Discussion

4.1. The Paradox of High Soil As but Low Grain Accumulation: Immobilization Mechanisms

As a key factor regulating soil physicochemical properties, soil pH directly affects the speciation, adsorption capacity, bioavailability, and ecotoxicity thresholds of heavy metal(loid)s in the soil–crop system. Generally, higher soil pH weakens the electrostatic adsorption of anionic species such as arsenate, making them more mobile in the soil solution and more available for crop uptake [38]. The speciation of As is jointly regulated by soil Eh and pH; according to the Eh–pH diagram of As, under the weakly alkaline and weakly reducing conditions of the study area, As is likely present mainly as As(V) in the form of HAsO₄²⁻, which provides a basis for assessing its behavior. As an important reactive soil component, SOM interacts with As and may play a vital role in As migration and bioavailability [39]. Soil As content in the study area ranged from 20 to 140 mg·kg−1; this marked spatial heterogeneity may be associated with intense human disturbance [40]. Iron–manganese oxides, widely distributed mineral components in soil, carry abundant surface hydroxyl groups and can promote the transformation of As from mobile to residual fractions through adsorption, complexation, and related processes. This transformation reduces the biological activity of As [41,42] and offers a possible mineralogical explanation for the As bioavailability observed in the study area. DGT-As, a key indicator of bioavailable As, indicates that the bioavailability of soil As in the study area is relatively low, in line with the low HB-As levels. Furthermore, HB-As concentrations were generally far below the 0.5 mg·kg−1 limit for As in cereal grains specified in the national standard.
Thus, despite the high soil As content in the study area, HB-As at most sites remains within the safety threshold. In summary, although soil As in the study area exceeds the standard, Fe–Mn oxides may have helped limit the transfer of As into highland barley, so the overall food safety risk to local highland barley production appears low; however, the few samples exceeding 0.5 mg·kg−1 indicate that localized risks cannot be excluded.
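The claim that As(V) occurs mainly as HAsO₄²⁻ in these weakly alkaline soils can be checked with acid–base equilibria. A minimal sketch, assuming approximate literature pKa values for arsenic acid (about 2.2, 7.0, and 11.5) and ignoring activity corrections and any reduction to As(III):

```python
def arsenate_fractions(ph, pkas=(2.2, 7.0, 11.5)):
    """Equilibrium fractions of H3AsO4, H2AsO4-, HAsO4^2-, AsO4^3- at a given pH.

    The pKa values are approximate literature values (assumed); activity
    corrections and redox transformation to As(III) are ignored.
    """
    h = 10.0 ** (-ph)
    ka = [10.0 ** (-p) for p in pkas]
    # Abundances relative to H3AsO4
    terms = [1.0,
             ka[0] / h,
             ka[0] * ka[1] / h ** 2,
             ka[0] * ka[1] * ka[2] / h ** 3]
    total = sum(terms)
    return [t / total for t in terms]


names = ["H3AsO4", "H2AsO4-", "HAsO4^2-", "AsO4^3-"]
for name, frac in zip(names, arsenate_fractions(8.0)):
    print(f"{name:9s} {frac:.4f}")
```

At the median soil pH of 8.00, about 91% of As(V) is predicted to be HAsO₄²⁻, with most of the remainder as H₂AsO₄⁻. Whether As(V) rather than As(III) dominates depends on Eh and is not addressed by this calculation.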

4.2. Superiority of Dynamic Bioavailability Assessment (DGT) over Chemical Extraction

Among the HB-As prediction models, MLR performed poorly, which may be attributed to its inherent linearity assumption, which cannot capture the complex nonlinear relationships governing As accumulation in the soil–highland barley system [30]. The migration and transformation of As and its subsequent uptake and accumulation in highland barley result from the combined effects of multiple factors such as pH, Eh, SOM, and T-As, with significant interactions among them [28]. Because MLR can only form linear combinations of the variables, it cannot represent these nonlinear interactions, which limits its explanatory power for As accumulation in highland barley and results in consistently low R² values. Nevertheless, MLR performed comparably on the training and test sets with no sign of overfitting, indicating good model stability. In contrast, SVR and RF achieved substantially better fits owing to their ability to capture nonlinear relationships. SVR maps the original features into a higher-dimensional space via kernel functions and fits a linear function there, which corresponds to a nonlinear function in the original space, thus overcoming the limitations of linear models. Meanwhile, SVR follows the structural risk minimization principle, balancing empirical risk against model complexity to ensure stable generalization [43]. As an ensemble of decision trees, RF reduces the risk of overfitting by averaging the predictions of many trees; it captures feature interactions flexibly through the recursive splitting of decision trees and reduces sensitivity to noise by randomly sampling features and training samples [44,45].
Across the input combinations, S1 included only basic physicochemical factors (pH, Eh, SOM, and T-As). Lacking features that directly characterize As bioavailability, it had limited explanatory power and the poorest performance. Adding T-Fe and T-Mn (S2) allowed the models to capture the interactions between Fe–Mn oxides and As, as well as the regulatory effect of these oxides on As bioavailability. Performance improved markedly as a result; for example, the test R2 of SVR rose from 0.541 to 0.684. S3 further incorporated KH2PO4-As, a chemically extracted bioavailability indicator that gave the models more direct predictive information, and performance remained at a similarly high level. Adding DGT-As (S4) yielded the best overall performance: the highest training and test R2 for RF (0.756 and 0.651) and the lowest test MAE and RMSE for SVR. DGT mimics the diffusion of As toward plant roots, which is closer to the actual mechanism of As uptake by highland barley. It reflects not only the concentration of bioavailable As but also the rate at which As is resupplied from the soil solid phase, making it a better indicator of As bioavailability.
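The DGT concentration discussed above is conventionally derived from Fick's first law. This sketch is for readers less familiar with the technique. The mass accumulated on the binding gel is M = Ce(Ve + Vgel)/fe. The time-averaged concentration at the device interface is C_DGT = M·Δg/(D·A·t), where Δg is the diffusion layer thickness (diffusive gel plus filter membrane), D the diffusion coefficient, A the exposure window area, and t the deployment time. The function names and all numerical values below (eluate concentration, volumes, elution factor, Δg, D, A, t) are hypothetical placeholders, not the deployment settings of this study.

```python
def accumulated_mass_ng(c_eluate_ug_per_l, v_eluent_ml, v_gel_ml, elution_factor):
    """Mass (ng) of analyte bound on the DGT binding gel, from the eluate concentration.

    M = Ce * (V_eluent + V_gel) / fe. Because 1 ug/L equals 1 ng/mL, ug/L * mL gives ng.
    """
    return c_eluate_ug_per_l * (v_eluent_ml + v_gel_ml) / elution_factor

def c_dgt_ug_per_l(mass_ng, delta_g_cm, diff_coeff_cm2_s, area_cm2, time_s):
    """Time-averaged DGT concentration, C_DGT = M * dg / (D * A * t).

    Units: ng * cm / (cm^2/s * cm^2 * s) = ng/cm^3 = ng/mL = ug/L.
    """
    return mass_ng * delta_g_cm / (diff_coeff_cm2_s * area_cm2 * time_s)

# Hypothetical deployment values, for illustration only.
mass = accumulated_mass_ng(c_eluate_ug_per_l=5.0, v_eluent_ml=1.0, v_gel_ml=0.15, elution_factor=0.8)
c_dgt = c_dgt_ug_per_l(mass, delta_g_cm=0.094, diff_coeff_cm2_s=5.0e-6, area_cm2=3.14, time_s=24 * 3600)
print(f"M = {mass:.4f} ng, C_DGT = {c_dgt:.3f} ug/L")
```

Because the accumulated mass depends on how fast the soil resupplies As to the depleted zone in front of the device, C_DGT in principle integrates both pore-water concentration and resupply kinetics, which a single chemical extraction does not.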

4.3. Deciphering Factor Influence: The Power of ML Interpretation (SHAP and PDP)

To further explore the contribution of each input feature to HB-As, the main controlling factors were identified by feature importance analysis, and the direction of each factor's effect was analyzed with SHAP. The feature rankings and weights of the RF model under each input combination are shown in Figure 6. As more features were added, the leading predictors changed: T-As dominated in S1, T-Mn and T-Fe rose to the next ranks in S2, and DGT-As became the most important predictor of HB-As in S4. The RF model achieved its most accurate HB-As predictions with the S4 combination, which adds KH2PO4-As and DGT-As to the S2 features (including T-Fe and T-Mn).
In the S1 combination, T-As had the highest mean absolute SHAP value, accounting for 65.4% of the total contribution, and had a positive effect on the model output, indicating that soil T-As is the fundamental driver of HB-As. In contrast, pH, Eh, and SOM mainly influence As bioavailability indirectly by regulating the adsorption–desorption of As [46,47]. In the S2 combination, the contributions of T-Mn and T-Fe were second only to that of T-As, and together these three factors accounted for 86% of the total contribution. This is consistent with the known role of manganese oxides as abiotic oxidants in the geochemistry of As: they convert As(III) to As(V), which reduces As toxicity and enhances its adsorption onto iron oxides [41,42,48]. Interestingly, adding KH2PO4-As (S3) did not change the main influencing factors. In the S4 combination, the contribution of DGT-As exceeded that of T-As, accounting for 30.5% of the total contribution, with a positive effect on the model output. DGT-As therefore had the most direct influence on the model predictions, consistent with its role in characterizing As bioavailability [49]. Within this dataset, DGT-measured As was a more effective predictor of As accumulation in the crop than chemically extracted As (KH2PO4-As) or total As alone.
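Percentage contributions such as 65.4% for T-As in S1 or 30.5% for DGT-As in S4 are most plausibly each feature's mean absolute SHAP value divided by the sum over all features. That normalization is an assumption, since the exact calculation is not stated here. A minimal sketch follows, using a made-up SHAP matrix with one row per sample and one column per feature. In practice the matrix would come from a SHAP explainer applied to the fitted RF. The function name `shap_contributions` is hypothetical.

```python
def shap_contributions(shap_rows, feature_names):
    """Return {feature: percent of total mean |SHAP|} for a samples x features SHAP matrix."""
    n = len(shap_rows)
    mean_abs = [sum(abs(row[j]) for row in shap_rows) / n for j in range(len(feature_names))]
    total = sum(mean_abs)
    return {name: 100.0 * m / total for name, m in zip(feature_names, mean_abs)}

# Hypothetical SHAP values for three samples and three features (not the study's data).
features = ["DGT-As", "T-As", "T-Fe"]
shap_values = [
    [0.030, -0.010, 0.010],
    [-0.020, 0.020, -0.010],
    [0.010, 0.000, 0.010],
]
contrib = shap_contributions(shap_values, features)
for name, pct in sorted(contrib.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

Taking absolute values before averaging means the percentage measures the magnitude of influence only. The sign (the "positive effect on the model output" noted above) has to be read from the SHAP summary plot itself.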
The nonlinear interactions between DGT-As and T-As, T-Fe, and T-Mn based on SHAP analysis are illustrated in Figure 7. The results indicate that although DGT-As is the dominant predictor of HB-As, its predictive strength is jointly constrained by soil As reserves and the geochemical background of Fe and Mn oxides.
As shown in Figure 7A, the SHAP interaction between DGT-As and T-As shows that the positive contribution of DGT-As to HB-As prediction is markedly larger at higher T-As levels, whereas its marginal effect is relatively limited at low T-As levels. The model thus treats soil As reserves as an important background condition for the predictive value of DGT-As. This suggests that total soil As modulates how well bioavailability indicators represent the risk of crop As uptake. As shown in Figure 7B,C, the positive contribution of DGT-As is more pronounced at low to moderate Fe and Mn levels, whereas the corresponding SHAP values tend to converge or weaken locally at higher T-Fe or T-Mn levels. Iron and manganese oxides may influence the speciation and activity of As in soils through adsorption- and redox-related processes. This would weaken the direct indicative capacity of DGT-As for plant-available As and make its contribution to the predictions clearly condition-dependent. Overall, these results provide model-based support for using multi-factor machine learning approaches to characterize nonlinear interactions in complex soil–crop systems.
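The condition-dependent contribution of DGT-As in Figure 7A can be reproduced qualitatively with a toy model. The sketch below computes exact Shapley values by enumerating every feature coalition, with features outside a coalition set to a single baseline point. This is a simplification of the expectation over background data that SHAP implementations use. The toy model f = 0.5 × DGT-As × T-As, the baseline, and the feature values are all hypothetical. Because the model is multiplicative, the Shapley value attributed to DGT-As grows as T-As increases.

```python
from itertools import combinations
from math import factorial

def exact_shapley(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline point.

    The value of a coalition S is model(z), where z takes x's values for
    features in S and the baseline's values for all other features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    """Hypothetical response: DGT-As (z[0]) acts multiplicatively with T-As (z[1])."""
    dgt_as, t_as = z
    return 0.5 * dgt_as * t_as

baseline = [1.0, 10.0]  # hypothetical reference sample
for t_as in (10.0, 20.0, 40.0):
    phi_dgt, phi_tas = exact_shapley(toy_model, [3.0, t_as], baseline)
    print(f"T-As = {t_as:>4}: SHAP(DGT-As) = {phi_dgt:.2f}, SHAP(T-As) = {phi_tas:.2f}")
```

With DGT-As held at 3.0, its Shapley value rises from 10 to 15 to 25 as T-As increases. The values for each sample always sum to f(x) − f(baseline) (the efficiency property). Exhaustive enumeration scales as 2^n, which is why tree-specific algorithms are used for real RF models.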
Together, the PDP and linear fitting results reveal the marginal effects of different soil physicochemical properties on the prediction of HB-As and the differences in their independent explanatory power (Figure 8). The contributions of the soil properties differed markedly. T-As, T-Fe, and DGT-As were the key factors driving the prediction of HB-As, whereas pH, Eh, SOM, and KH2PO4-As acted mainly indirectly or synergistically and had weak independent explanatory power. The PDP curves of T-As, T-Fe, and DGT-As showed pronounced marginal effects with clear monotonic trends: HB-As increased with T-As and DGT-As and decreased with T-Fe. The corresponding linear fitting R2 values were 0.087, 0.123, and 0.150, respectively. Although low in absolute terms, these values are far higher than those of the other four features. Within the model, therefore, these three features exert relatively stable marginal effects on HB-As across their ranges and make a comparatively strong independent contribution to the predictions. Mechanistically, T-As is the reservoir that provides the material basis for HB-As. DGT-As directly characterizes As bioavailability and is therefore directly related to HB-As [50]. Iron oxides have a strong capacity to adsorb and immobilize As and can affect its bioavailability by regulating As speciation [51]. Together, these three factors constitute the core drivers of the model's predictions, consistent with the SHAP results. In contrast, the PDP curves of soil pH, Eh, SOM, and KH2PO4-As varied relatively little, and the corresponding linear fitting R2 values were all below 0.003. These four features therefore had extremely weak marginal effects on HB-As and limited independent explanatory power.
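The PDP curves in Figure 8 follow the standard partial-dependence definition. For each grid value v of a feature, that feature is set to v in every sample, and the model's predictions are averaged. A minimal sketch is given below with a hypothetical fitted model and made-up samples. In practice, the `predict` callable would wrap the trained RF, and the function names here are illustrative only.

```python
def partial_dependence(predict, rows, feature_idx, grid):
    """Average prediction when feature `feature_idx` is fixed at each grid value."""
    curve = []
    for v in grid:
        preds = []
        for row in rows:
            z = list(row)
            z[feature_idx] = v  # overwrite the feature of interest; keep the others as observed
            preds.append(predict(z))
        curve.append(sum(preds) / len(preds))
    return curve

def toy_predict(z):
    """Hypothetical fitted model: rises with DGT-As (z[0]) and falls with T-Fe (z[1])."""
    dgt_as, t_fe = z
    return 0.02 * dgt_as - 0.001 * t_fe + 0.05

# Made-up samples: (DGT-As, T-Fe).
samples = [(1.0, 20.0), (2.0, 30.0), (4.0, 25.0), (3.0, 35.0)]
pd_dgt = partial_dependence(toy_predict, samples, feature_idx=0, grid=[0.0, 2.0, 4.0])
print([round(p, 4) for p in pd_dgt])
```

Because the other features are averaged out, a PDP shows only the average marginal effect and can hide interactions such as those between DGT-As and T-As, T-Fe, or T-Mn. This is why the single-feature PDPs are read alongside the SHAP interaction plots.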
Regarding the indirect factors, pH and Eh may regulate the speciation and activity of As by changing the surface charge and adsorption sites of iron–manganese oxides. SOM may affect the immobilization of As by competing with it for adsorption sites on these oxide surfaces. The independent contribution of KH2PO4-As was likely masked by DGT-As, which carries overlapping information on labile As, so the model did not attribute a distinct predictive contribution to KH2PO4-As.

4.4. Limitations and Future Perspectives

This study developed a predictive model for HB-As accumulation by integrating soil physicochemical properties and As bioavailability indicators. The results demonstrate the dominant roles of soil As reserves and bioavailability in crop As uptake, and key risk-controlling factors were identified through SHAP and PDP analyses, providing a scientific basis for arsenic risk assessment and sustainable agricultural management in high-altitude farmlands.
Nevertheless, this study has limitations. The machine learning models were built on samples from a single region, and spatial autocorrelation among sampling sites may have biased the evaluation of model performance. Nearby sites tend to share similar soil conditions, so if samples are split into training and test sets at random, closely related samples can fall on both sides of the split and inflate test scores. Future research should therefore conduct cross-seasonal and cross-regional external validation to systematically assess model robustness and generalizability, thereby providing more reliable scientific support for farmland environmental management, safe soil utilization, and sustainable agricultural development.
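Short of collecting new data, one common way to address the spatial autocorrelation concern raised above is spatial block cross-validation. Sites are grouped into geographic blocks, and whole blocks are held out together, so nearby (and therefore similar) samples never appear in both the training and test sets. The sketch below is a generic illustration of that idea, not a procedure used in this study. The block size, site coordinates, and function names are hypothetical.

```python
from collections import defaultdict

def block_id(lon, lat, block_deg):
    """Assign a site to a square grid cell with sides of `block_deg` degrees."""
    return (int(lon // block_deg), int(lat // block_deg))

def spatial_block_folds(coords, block_deg):
    """Yield (train_idx, test_idx) pairs, holding out one spatial block at a time."""
    blocks = defaultdict(list)
    for idx, (lon, lat) in enumerate(coords):
        blocks[block_id(lon, lat, block_deg)].append(idx)
    for held_out, test_idx in sorted(blocks.items()):
        train_idx = [i for b, members in blocks.items() if b != held_out for i in members]
        yield sorted(train_idx), test_idx

# Hypothetical site coordinates (longitude, latitude) in decimal degrees.
sites = [(90.10, 29.30), (90.15, 29.32), (91.05, 29.28), (91.08, 29.25), (92.20, 29.60)]
folds = list(spatial_block_folds(sites, block_deg=0.5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Blocks defined in degrees are a simplification, since a degree of longitude shrinks with latitude. Blocks in a projected coordinate system, sized to the range of spatial autocorrelation, would be a more careful choice.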

5. Conclusions

This study provides an integrated assessment of As behavior in the soil–highland barley system by jointly examining soil physicochemical properties, As speciation, bioavailability indicators, and crop accumulation, revealing pronounced spatial heterogeneity and localized high-As soil zones despite generally low As levels in highland barley. Among the three models evaluated (MLR, SVR, and RF), SVR and RF exhibited superior predictive performance, particularly with increasing input features, whereas MLR showed the weakest accuracy due to its limited ability to capture nonlinear interactions. By coupling machine learning models with SHAP and PDP, DGT-As, T-As, and T-Mn/Fe were identified as the dominant drivers of HB-As, demonstrating the effectiveness of data-driven methods in capturing complex nonlinear processes and supporting soil As risk assessment and sustainable agricultural management.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su18041782/s1, Figure S1: Residual Diagnostic Plots and Ablation Experiment Results of the MLR Model for Robustness Verification; Figure S2: Residual Diagnostic Plots and Ablation Experiment Results of the SVR Model for Robustness Verification; Figure S3: Residual Diagnostic Plots and Ablation Experiment Results of the RF Model for Robustness Verification; Table S1: Feature Selection and Core Parameters for the MLR Model; Table S2: Hyperparameter Settings for the SVR Model; Table S3: Hyperparameter Settings for the RF Model.

Author Contributions

Conceptualization, J.Z., C.Z., Y.H. and Y.Z.; Methodology, J.Z., Y.C. and Y.Z.; Software, C.Z.; Validation, X.L.; Formal analysis, J.Z.; Investigation, J.Z., Y.L. and Y.H.; Data curation, X.L. and Y.C.; Writing—original draft, J.Z.; editing, J.Z., Y.L., Y.H. and Y.Z.; Visualization, C.Z.; Supervision, Y.L. and Y.Z.; Project administration, X.L. and Y.C.; Funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2021YFD2000203.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank the editors for their hard work and the reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. de Conti, A.; Madia, F.; Schubauer-Berigan, M.K.; Benbrahim-Tallaa, L. Carcinogenicity of some metals evaluated by the A synopsis of the evaluations of arsenic, cadmium, cobalt, and antimony. Toxicol. Appl. Pharmacol. 2025, 504, 117506.
  2. Zhang, R.; Yang, Y.; Deji, Y.; Li, H.; Li, Y.; Nima, C.; Zhao, S.; Gong, H. Factors influencing the spatial distribution and individual variation in urinary fluoride levels in Tibet, China. Chemosphere 2023, 326, 138493.
  3. Shi, W.; Qiao, F.W.; Zhou, L. Identification of Ecological Risk Zoning on Qinghai-Tibet Plateau from the Perspective of Ecosystem Service Supply and Demand. Sustainability 2021, 13, 5366.
  4. Yu, C.R.; Sun, Y.F.; Zhong, X.Y.; Yu, Z.B.; Li, X.Y.; Yi, P.; Jin, H.J.; Luo, D.L. Arsenic in permafrost-affected rivers and lakes of Tibetan Plateau, China. Environ. Pollut. Bioavailab. 2019, 31, 226–232.
  5. Xiao, F.J.; Cui, X.M.; Zhao, Y.Z.; Fu, J.J.; Yu, T.; Bu, D.; Zhang, Q.Y. Concentration, spatial distribution, and source apportionment of heavy metals in agricultural soils from the Yarlung Zangbo River Basin, Tibetan Plateau. Environ. Earth Sci. 2023, 82, 577.
  6. Zhang, X.P.P.; Deng, W.; Yang, X.M.M. The background concentrations of 13 soil trace elements and their relationships to parent materials and vegetation in Xizang (Tibet), China. J. Asian Earth Sci. 2002, 21, 167–174.
  7. Yin, Y.Y.; Wang, J.A.; Leng, G.Y.; Zhao, J.T.; Wang, L.; Ma, W.D. Future potential distribution and expansion trends of highland barley under climate change in the Qinghai-Tibet plateau (QTP). Ecol. Indic. 2022, 136, 108702.
  8. Ma, W.D.; Jia, W.; Zhou, Y.T.; Liu, F.G.; Wang, J.A. Prediction of Suitable Future Natural Areas for Highland Barley on the Qinghai-Tibet Plateau under Representative Concentration Pathways (RCPs). Sustainability 2022, 14, 6617.
  9. Zeng, F.R.; Ali, S.; Zhang, H.T.; Ouyang, Y.B.; Qiu, B.Y.; Wu, F.B.; Zhang, G.P. The influence of pH and organic matter content in paddy soil on heavy metal availability and their uptake by rice plants. Environ. Pollut. 2011, 159, 84–91.
  10. Antoniadis, V.; Robinson, J.S.; Alloway, B.J. Effects of short-term pH fluctuations on cadmium, nickel, lead, and zinc availability to ryegrass in a sewage sludge-amended field. Chemosphere 2008, 71, 759–764.
  11. Huang, R.Q.; Gao, S.F.; Wang, W.L.; Staunton, S.; Wang, G. Soil arsenic availability and the transfer of soil arsenic to crops in suburban areas in Fujian Province, southeast China. Sci. Total Environ. 2006, 368, 531–541.
  12. Dai, Y.C.; Lv, J.L.; Liu, K.; Zhao, X.Y.; Cao, Y.F. Major controlling factors and prediction models for arsenic uptake from soil to wheat plants. Ecotoxicol. Environ. Saf. 2016, 130, 256–262.
  13. Majumder, S.; Banik, P. Insights into the comparison of machine learning models on rice grain arsenic prediction: Interplay of rice cultivation systems and soil environmental factors. Environ. Pollut. 2025, 381, 126646.
  14. Hu, B.F.; Xue, J.; Zhou, Y.; Shao, S.; Fu, Z.Y.; Li, Y.; Chen, S.C.; Qi, L.; Shi, Z. Modelling bioaccumulation of heavy metals in soil-crop ecosystems and identifying its controlling factors using machine learning. Environ. Pollut. 2020, 262, 114308.
  15. Lu, X.S.; Sun, L.; Zhang, Y.; Du, J.Y.; Wang, G.Q.; Huang, X.H.; Li, X.Z.; Wang, X.Z. Predicting Cd accumulation in crops and identifying nonlinear effects of multiple environmental factors based on machine learning models. Sci. Total Environ. 2024, 951, 175787.
  16. Palansooriya, K.N.; Li, J.; Dissanayake, P.D.; Suvarna, M.; Li, L.; Yuan, X.; Sarkar, B.; Tsang, D.C.W.; Rinklebe, J.; Wang, X.; et al. Prediction of Soil Heavy Metal Immobilization by Biochar Using Machine Learning. Environ. Sci. Technol. 2022, 56, 4187–4198.
  17. Yang, H.; Huang, K.; Zhang, K.; Weng, Q.; Zhang, H.; Wang, F. Predicting Heavy Metal Adsorption on Soil with Machine Learning and Mapping Global Distribution of Soil Adsorption Capacities. Environ. Sci. Technol. 2021, 55, 14316–14328.
  18. Wang, Z.; Zhang, C.C.; Lv, D.L.; Zhang, Q.; Yin, P.; Guo, R.Z.; Chen, Q.S.; Cai, Y.M.; Zhao, Y.J. Diffusive gradient in thin films combined with machine learning to discern the accumulation characteristics and driving factors of Cd and Cu in soil-rice systems. J. Hazard. Mater. 2025, 495, 138924.
  19. Yang, Y.P.; Wang, P.; Yan, H.J.; Zhang, H.M.; Cheng, W.D.; Duan, G.L.; Zhu, Y.G. NH4H2PO4-extractable arsenic provides a reliable predictor for arsenic accumulation and speciation in pepper fruits (Capsicum annum L.). Environ. Pollut. 2019, 251, 651–658.
  20. Gao, G.; Zhang, C.; Zuo, J.; Sun, J.; Liu, W.; Wang, Z.; Zhao, Y. Development and application of a layered double hydroxides binding layer based DGT technique for simultaneous sampling low molecular weight organic acids and arsenic in soil. Talanta 2025, 300, 129255.
  21. Ke, B.; Nguyen, H.; Bui, X.N.; Bui, H.B.; Choi, Y.; Zhou, J.; Moayedi, H.; Costache, R.; Nguyen-Trang, T. Predicting the sorption efficiency of heavy metal based on the biochar characteristics, metal sources, and environmental conditions using various novel hybrid machine learning models. Chemosphere 2021, 276, 130204.
  22. Jia, X.L.; Hu, B.F.; Marchant, B.; Zhou, L.Q.; Shi, Z.; Zhu, Y.W. A methodological framework for identifying potential sources of soil heavy metal pollution based on machine learning: A case study in the Yangtze Delta, China. Environ. Pollut. 2019, 250, 601–609.
  23. Guo, G.M.; Lin, L.Y.; Jin, F.M.; Masek, O.; Huang, Q. Application of heavy metal immobilization in soil by biochar using machine learning. Environ. Res. 2023, 231, 116098.
  24. Li, H.M.; Wang, J.H.; Wang, Q.G.; Tian, C.H.; Qian, X.; Leng, X.Z. Magnetic Properties as a Proxy for Predicting Fine-Particle-Bound Heavy Metals in a Support Vector Machine Approach. Environ. Sci. Technol. 2017, 51, 6927–6935.
  25. Zou, Z.Y.; Wang, Q.L.; Wu, Q.S.; Li, M.H.; Zhen, J.B.; Yuan, D.Y.; Zhou, M.; Xu, C.; Wang, Y.C.; Zhao, Y.P.; et al. Inversion of heavy metal content in soil using hyperspectral characteristic bands-based machine learning method. J. Environ. Manag. 2024, 355, 120503.
  26. Peng, Y.Z.; Yu, G.I. Model multifactor analysis of soil heavy metal pollution on plant germination in Southeast Chengdu, China: Based on redundancy analysis, factor detector, and XGBoost-SHAP. Sci. Total Environ. 2024, 954, 176605.
  27. Ling, S.X.; Li, X.N.; Zhang, S.M.; Wei, W.; Liu, Z.X.; Feng, J.J.; Wang, J.W.; Sun, C.W.; Zhang, G. Machine and deep learning-driven geospatial prediction modelling of soil cadmium and nickel with pollution risk mapping underlaid by black shale in Qianjiang County, China. J. Hydrol. Reg. Stud. 2025, 62, 102857.
  28. Qiu, Y.F.; Zhou, S.L.; Zhang, C.C.; Qin, W.D.; Lv, C.X.; Zou, M.M. Identification of potentially contaminated areas of soil microplastic based on machine learning: A case study in Taihu Lake region, China. Sci. Total Environ. 2023, 877, 162891.
  29. Li, J.; Pan, L.J.; Suvarna, M.; Tong, Y.W.; Wang, X.N. Fuel properties of hydrochar and pyrochar: Prediction and exploration with machine learning. Appl. Energy 2020, 269, 115166.
  30. Zhao, B.; Zhu, W.X.; Hao, S.F.; Hua, M.; Liao, Q.L.; Jing, Y.; Liu, L.; Gu, X.Y. Prediction heavy metals accumulation risk in rice using machine learning and mapping pollution risk. J. Hazard. Mater. 2023, 448, 130879.
  31. Zhang, Z.C.; Xu, B.; Xu, W.M.; Wang, F.; Gao, J.; Li, Y.; Li, M.; Feng, Y.C.; Shi, G.L. Machine learning combined with the PMF model reveal the synergistic effects of sources and meteorological factors on PM pollution. Environ. Res. 2022, 212, 113322.
  32. Li, W.; Huang, G.; Tang, N.; Lu, P.; Jiang, L.; Lv, J.; Qin, Y.; Lin, Y.; Xu, F.; Lei, D. Effects of heavy metal exposure on hypertension: A machine learning modeling approach. Chemosphere 2023, 337, 139435.
  33. Li, R.X.; Zhang, R.; Yang, Y.; Li, Y.H. Accumulation characteristics, driving factors, and model prediction of cadmium in soil-highland barley system on the Tibetan Plateau. J. Hazard. Mater. 2023, 453, 131407.
  34. Burachevskaya, M.; Minkina, T.; Fedorenko, A.; Fedorenko, G.; Chernikova, N.; Rajput, V.D.; Mandzhieva, S.; Bauer, T. Accumulation, translocation, and toxicity of arsenic in barley grown in contaminated soil. Plant Soil 2021, 467, 91–106.
  35. Xue, Q.; Ran, Y.; Tan, Y.Z.; Peacock, C.L.; Du, H.H. Arsenite and arsenate binding to ferrihydrite organo-mineral coprecipitate: Implications for arsenic mobility and fate in natural environments. Chemosphere 2019, 224, 103–110.
  36. Dai, Y.C.; Nasir, M.; Zhang, Y.L.; Gao, J.K.; Lv, Y.M.; Lv, J.L. Comparison of DGT with traditional extraction methods for assessing arsenic bioavailability to in different soils. Chemosphere 2018, 191, 183–189.
  37. Ye, T.C.; Liu, T.C.; Yi, H.L.; Du, J.J.; Wang, Y.; Xiao, T.F.; Cui, J.L. arsenic immobilization by natural iron (oxyhydr)oxide precipitates in As-contaminated groundwater irrigation canals. J. Environ. Sci. 2025, 153, 143–157.
  38. Hartley, W.; Dickinson, N.M.; Riby, P.; Lepp, N.W. Arsenic mobility in brownfield soils amended with green waste compost or biochar and planted with. Environ. Pollut. 2009, 157, 2654–2662.
  39. Yang, W.C.; Li, J.X.; Nie, K.; Zhao, P.W.; Xia, H.; Li, Q.; Liao, Q.; Li, Q.Z.; Dong, C.H.; Yang, Z.H.; et al. Machine learning-based identification of critical factors for cadmium accumulation in rice grains. Environ. Geochem. Health 2025, 47, 2.
  40. Li, Y.; Yu, Y.L.; Ding, S.Y.; Dai, W.J.; Shi, R.G.; Cui, G.Y.; Li, X.D. Application of machine learning in soil heavy metals pollution assessment in the southeastern Tibetan plateau. Sci. Rep. 2025, 15, 13579.
  41. Pan, Q.X.; Liu, C.; Zhang, C.; Zheng, H.L.; Ding, W.; Li, H. Mn/Fe bimetallic MOF-driven sulfite activation for synergistic non-radical oxidation and chemisorption of arsenite: Involvement of high-valent Fe (IV)/Mn(V) species. J. Hazard. Mater. 2025, 496, 139234.
  42. Zhang, L.Q.; Hu, J.; Li, C.; Chen, Y.Y.; Zheng, L.G.; Ding, D.; Shan, S.F. Synergistic mechanism of iron manganese supported biochar for arsenic remediation and enzyme activity in contaminated soil. J. Environ. Manag. 2023, 347, 119127.
  43. Sun, J.; Li, B.; Liu, Y.; Wu, Z.Q.; Shi, L.; Zhou, X.; Wu, P.C.; Yao, K.S. Detection of composite heavy metal content in rape leaf using feature clustering and hyperspectral imaging technology. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2025, 340, 126387.
  44. Li, X.; Chen, B.; Chen, W.S.; Yin, Y.L.; Huang, L.X.; Wei, L.; Awad, M.; Liu, Z.Z. Predictive Machine Learning Model to Assess the Adsorption Efficiency of Biochar-Heavy Metals for Effective Remediation of Soil-Plant Environment. Toxics 2024, 12, 575.
  45. Mishra, D.; Chakrabortty, R.; Sen, K.; Pal, S.C.; Mondal, N.K. Groundwater vulnerability assessment of elevated arsenic in Gangetic plain of West Bengal, India; Using primary information, lithological transport, state-of-the-art approaches. J. Contam. Hydrol. 2023, 256, 104195.
  46. Honma, T.; Ohba, H.; Kaneko-Kadokura, A.; Makino, T.; Nakamura, K.; Katou, H. Optimal Soil Eh, pH, and Water Management for Simultaneously Minimizing Arsenic and Cadmium Concentrations in Rice Grains. Environ. Sci. Technol. 2016, 50, 4178–4185.
  47. Norton, G.J.; Adomako, E.E.; Deacon, C.M.; Carey, A.M.; Price, A.H.; Meharg, A.A. Effect of organic matter amendment, arsenic amendment and water management regime on rice grain arsenic species. Environ. Pollut. 2013, 177, 38–47.
  48. Fischel, M.H.H.; Clarke, C.E.; Sparks, D.L. Arsenic sorption and oxidation by natural manganese-oxide-enriched soils: Reaction kinetics respond to varying environmental conditions. Geoderma 2024, 441, 116715.
  49. Wang, J.J.; Bai, L.Y.; Zeng, X.B.; Su, S.M.; Wang, Y.N.; Wu, C.X. Assessment of arsenic availability in soils using the diffusive gradients in thin films (DGT) technique-a comparison study of DGT and classic extraction methods. Environ. Sci. Process. Impacts 2014, 16, 2355–2361.
  50. Chen, R.; Cheng, N.U.; Ding, G.Y.; Ren, F.M.; Lv, J.G.; Shi, R.G. Predictive model for cadmium uptake by maize and rice grains on the basis of bioconcentration factor and the diffusive gradients in thin-films technique. Environ. Pollut. 2021, 289, 117841.
  51. Faria, M.C.S.; Rosemberg, R.S.; Bomfeti, C.A.; Monteiro, D.S.; Barbosa, F.; Oliveira, L.C.A.; Rodriguez, M.; Pereira, M.C.; Rodrigues, J.L. Arsenic removal from contaminated water by ultrafine δ-FeOOH adsorbents. Chem. Eng. J. 2014, 237, 47–54.
Figure 1. Overview of the study area and distribution of sampling sites. (a) Schematic map showing the location of the study area in China. (b) Location of the study area in Xizang Autonomous Region and its topographic characteristics. (c) Distribution of specific sampling sites.
Figure 2. Model development process.
Figure 3. Soil properties and As concentrations in highland barley in the study area: (A) pH; (B) Eh; (C) SOM; (D) T-As; (E) T-Fe; (F) T-Mn; (G) KH2PO4-As; (H) DGT-As; (I) HB-As. (DGT-As is expressed in μg·L⁻¹ and reflects the concentration of bioavailable As in soil pore water as well as its dynamic resupply to the plant rhizosphere. Yellow dots are high-value outliers for each indicator, marking samples with markedly higher concentrations.)
Figure 4. Pearson correlations between soil properties and As accumulation in highland barley (* indicates p < 0.05; ** indicates p < 0.01).
Figure 5. Prediction results of the (A) MLR, (B) SVR, and (C) RF models under input combination S4.
Figure 6. Feature importance and SHAP summary plots of each characteristic parameter for As uptake by highland barley in the RF prediction model: (A) S1; (B) S2; (C) S3; (D) S4.
Figure 7. Interaction effects of DGT-As and key controlling factors on HB-As prediction: (A) DGT-As+T-As; (B) DGT-As+T-Fe; (C) DGT-As+T-Mn.
Figure 8. Single-feature PDP illustrating the relationships between soil properties and HB-As. (Panels (A–H) represent the effects of pH (A), Eh (B), SOM (C), T-As (D), T-Fe (E), T-Mn (F), KH2PO4-As (G), and DGT-As (H) on HB-As, respectively. Blue scatter points indicate observed HB-As values, the black solid line represents the linear regression trend, the blue shaded area denotes the 95% confidence interval, and the orange curve shows the PDP-derived marginal effect of each individual feature on HB-As while averaging out the effects of other variables).
Table 1. The fitting performance of different models on the training set and test set under different input feature conditions.
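| Input | Model | Dataset | MAE | RMSE | R2 |
|---|---|---|---|---|---|
| S1 | MLR | Training | 0.0564 | 0.1242 | 0.254 |
| S1 | MLR | Test | 0.0578 | 0.1218 | 0.168 |
| S1 | SVR | Training | 0.0346 | 0.0997 | 0.587 |
| S1 | SVR | Test | 0.0381 | 0.1049 | 0.541 |
| S1 | RF | Training | 0.0233 | 0.0650 | 0.699 |
| S1 | RF | Test | 0.0267 | 0.0753 | 0.594 |
| S2 | MLR | Training | 0.0588 | 0.1212 | 0.385 |
| S2 | MLR | Test | 0.0607 | 0.1211 | 0.309 |
| S2 | SVR | Training | 0.0255 | 0.0724 | 0.737 |
| S2 | SVR | Test | 0.0306 | 0.0788 | 0.684 |
| S2 | RF | Training | 0.0278 | 0.0687 | 0.736 |
| S2 | RF | Test | 0.0327 | 0.0811 | 0.623 |
| S3 | MLR | Training | 0.0556 | 0.1169 | 0.344 |
| S3 | MLR | Test | 0.0576 | 0.1171 | 0.223 |
| S3 | SVR | Training | 0.0250 | 0.0726 | 0.723 |
| S3 | SVR | Test | 0.0300 | 0.0800 | 0.660 |
| S3 | RF | Training | 0.0261 | 0.0648 | 0.743 |
| S3 | RF | Test | 0.0323 | 0.0790 | 0.611 |
| S4 | MLR | Training | 0.0596 | 0.1204 | 0.364 |
| S4 | MLR | Test | 0.0621 | 0.1217 | 0.238 |
| S4 | SVR | Training | 0.0216 | 0.0680 | 0.718 |
| S4 | SVR | Test | 0.0264 | 0.0700 | 0.683 |
| S4 | RF | Training | 0.0274 | 0.0640 | 0.756 |
| S4 | RF | Test | 0.0325 | 0.0760 | 0.651 |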
S1: pH, Eh, SOM, and T-As; S2: pH, Eh, SOM, T-As, T-Fe, and T-Mn; S3: pH, Eh, SOM, T-As, T-Fe, T-Mn, and KH2PO4-As; S4: pH, Eh, SOM, T-As, T-Fe, T-Mn, KH2PO4-As, and DGT-As.
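For reference, the three metrics in Table 1 are conventionally defined as below. The sketch assumes that R2 is the coefficient of determination (1 minus SS_res/SS_tot) computed on the same set as the predictions, which the table does not state explicitly. The observed and predicted values are made up, and the function name is hypothetical.

```python
import math

def regression_metrics(observed, predicted):
    """Return MAE, RMSE and R2 (coefficient of determination) for paired values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Made-up HB-As observations and predictions, for illustration only.
obs = [0.10, 0.20, 0.30, 0.40]
pred = [0.12, 0.18, 0.33, 0.37]
mae, rmse, r2 = regression_metrics(obs, pred)
print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}, R2 = {r2:.3f}")
```

Comparing these metrics between the training and test rows of the same model indicates how much a model overfits. For example, RF under S4 drops from a training R2 of 0.756 to a test R2 of 0.651.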
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zuo, J.; Zhang, C.; Liang, X.; Cai, Y.; Li, Y.; Hu, Y.; Zhao, Y. Machine Learning-Based Analysis of Arsenic Migration from Soil to Highland Barley in High Geological Background Areas. Sustainability 2026, 18, 1782. https://doi.org/10.3390/su18041782

AMA Style

Zuo J, Zhang C, Liang X, Cai Y, Li Y, Hu Y, Zhao Y. Machine Learning-Based Analysis of Arsenic Migration from Soil to Highland Barley in High Geological Background Areas. Sustainability. 2026; 18(4):1782. https://doi.org/10.3390/su18041782

Chicago/Turabian Style

Zuo, Jiahui, Chuangchuang Zhang, Xuefeng Liang, Yanming Cai, Ye Li, Yandi Hu, and Yujie Zhao. 2026. "Machine Learning-Based Analysis of Arsenic Migration from Soil to Highland Barley in High Geological Background Areas" Sustainability 18, no. 4: 1782. https://doi.org/10.3390/su18041782

APA Style

Zuo, J., Zhang, C., Liang, X., Cai, Y., Li, Y., Hu, Y., & Zhao, Y. (2026). Machine Learning-Based Analysis of Arsenic Migration from Soil to Highland Barley in High Geological Background Areas. Sustainability, 18(4), 1782. https://doi.org/10.3390/su18041782
