Machine Learning vs. Langmuir: A Multioutput XGBoost Regressor Better Captures Soil Phosphorus Adsorption Dynamics

Iatrou, Miltiadis; Papadopoulos, Aristotelis

doi:10.3390/crops5040055

by Miltiadis Iatrou * and Aristotelis Papadopoulos

Soil and Water Resources Institute, Hellenic Agricultural Organization DIMITRA, Thermi, 57001 Thessaloniki, Greece

* Author to whom correspondence should be addressed.

Crops 2025, 5(4), 55; https://doi.org/10.3390/crops5040055

Submission received: 1 July 2025 / Revised: 23 July 2025 / Accepted: 11 August 2025 / Published: 13 August 2025

Abstract

Accurate prediction of soil phosphorus (P) adsorption capacity is essential for efficient fertilizer management and environmental protection. Traditional isotherm models, such as the Langmuir equation, have been widely used to quantify P sorption, but they do not adequately capture the nonlinear and multivariate nature of soil systems. This study evaluates the performance of a multi-output XGBoost regression model trained on laboratory-measured P adsorption data from 147 soils, representing a wide range of textures, pH levels, and CaCO₃ contents. The model was developed to simultaneously predict P adsorption at five different equilibrium concentrations (1, 2, 4, 6, and 10 mg/L). SHAP analysis and causal discovery via DirectLiNGAM revealed that initial Olsen P concentration and sand content are the primary factors reducing P adsorption. The multi-output XGBoost model was compared against classical Langmuir isotherms using an extended dataset of 10,389 soil samples. The extended dataset was binned into four groups based on Olsen P concentrations and four groups based on sand content. These variables were chosen for binning because the XGBoost model identified them as highly influential and because causal inference analysis showed a causal relationship between them and soil P sorption capacity. The XGBoost model outperformed the Langmuir model in capturing the effect of Olsen P and sand content: it predicted a 12.6% drop in P adsorption in the very high Olsen P group and a 19.2% drop in the very high sand content group, both substantially larger than the reductions estimated by the Langmuir isotherms. These results demonstrate that machine learning models, trained on well-designed experimental data, offer a superior alternative to classical isotherms for modeling P sorption dynamics.

Keywords:

phosphorus adsorption; Langmuir isotherm; machine learning; predictive modeling; soil properties; crop phosphorus requirements

1. Introduction

Phosphorus (P) deficiency is among the primary constraints of agricultural crop production [1,2,3]. Most cultivated soils in Greece are calcareous and contain relatively high levels of calcium carbonate (CaCO₃), which gives them a high P sorption capacity [4]. General fertilizer recommendations are typically based on standard P application rates for each crop or, in more advanced cases, adjusted using Olsen P levels, but they often do not account for the soil’s P sorption capacity. As a result, such generalized guidelines may underperform in soils with high fixation potential, where a significant portion of the applied P becomes unavailable to plants.

P fertilizers are expensive and their uptake efficiency by crops is often limited, typically ranging between 10–30% [5]. Even though soil may contain thousands of kilograms of P per hectare, much of this may not be available to crops [6]. Oehl et al. [7] reported that total P in the topsoil (0–20 cm) can range from approximately 1400 to over 2000 kg P ha⁻¹. However, only a small fraction of this total P is plant-available, as much of it is strongly bound within the soil matrix due to sorption and fixation processes. Fertilizer P rapidly reacts with the soil constituents, forming less bioavailable complexes, particularly in soils with pH levels below 5.0 or in highly alkaline conditions [8]. Given this low efficiency and high cost, it is essential to optimize P application based on knowledge obtained by quantitative models according to diverse soil properties.

On the other hand, excessive P fertilization can be highly detrimental to the environment. When P is applied in quantities exceeding crop demand, it can lead to contamination of aquatic ecosystems through soil erosion or surface runoff [9]. Elevated P levels in water bodies contribute to eutrophication, which promotes unnaturally dense algal growth, commonly known as algal blooms [10]. These blooms can disrupt aquatic ecosystems and pose risks to human and animal health, as some algal species produce harmful toxins. Additionally, the decomposition of algae can produce unpleasant odors and degrade water quality, impairing its use not only for irrigation and drinking but also for recreation. Such environmental risks underscore the need to reduce P losses from agricultural fields using predictive models that can accurately estimate soil P dynamics, enabling more targeted fertilizer use and fewer excess P applications [11].

To assess the P sorption capacity of soils, many researchers have recommended the use of sorption isotherm techniques [3,12,13,14,15,16,17] (Appendix A, Table A1). Many models have been developed to describe this process quantitatively, with the Langmuir and Freundlich equations being the most widely used [18,19,20,21]. However, real soil systems are inherently heterogeneous, both chemically and physically, which limits the applicability of these models in complex field environments. First, their underlying assumptions, such as homogeneity of adsorption sites and linearity or semi-empirical fitting, fail to capture the nonlinear and multivariate nature of actual soil P adsorption processes [20,21]. Second, these models do not effectively integrate the interactive effects of multiple soil properties (e.g., texture, pH, CaCO₃, organic matter), which collectively influence P dynamics. Third, their fixed-parameter structure constrains their predictive flexibility across diverse environmental contexts, making them less suited for data-driven or site-specific fertilizer management applications [9,19]. In the current study, two approaches were used to model the P sorption capacity of soils: the Langmuir adsorption isotherm and a data-driven approach based on a multi-output XGBoost model.

2. Materials and Methods

2.1. Laboratory Analysis

A total of 147 surface soil samples (0–15 cm) were selected from a larger soil database maintained by the Soil and Water Resources Institute (SWRI). The selection aimed to ensure a broad representation of the physicochemical variability found in Greek agricultural soils, particularly across different soil textures, pH levels, and calcium carbonate (CaCO₃) contents (Table 1). The samples were air-dried, ground, homogenized, and passed through a 2-mm sieve after removing stones and residual roots. The soil samples were analyzed for a set of 18 soil properties: sand%, clay%, silt%, bulk density, pH, electrical conductivity (EC), organic matter (OM), CaCO₃, nitrate nitrogen, phosphorus (P), potassium (K), magnesium (Mg), iron (Fe), zinc (Zn), manganese (Mn), copper (Cu), boron (B), and calcium (Ca). The pH and EC values were measured using the saturated paste method with specific meters [22]. Nitrate-nitrogen was extracted with 2 M potassium chloride (KCl) and quantified by UV-VIS spectrophotometry [23], while organic matter was determined by the Walkley–Black method [24]. The CaCO₃ content was assessed by titration [25], and soil texture was evaluated using the Bouyoucos hydrometer method [26]. The Olsen method was employed for P [8]. The ammonium acetate method was used for the extraction of Na, K, Ca, and Mg, and these elements were quantified using Inductively Coupled Plasma Spectroscopy (ICP) [27]. Additionally, Mn, Cu, Fe, and Zn were extracted using DTPA and quantified with ICP [28]. Soil B was extracted using the hot 0.02 M CaCl₂ method, which involves boiling the soil solution for 5 min, and the extracted B was measured using ICP [29].

The soils were classified into four groups based on combinations of pH, CaCO₃ content, and texture (Table 2), as these factors are among the most influential in determining P sorption behavior, as reported in previous studies [3,13,14].

2.2. P Sorption Capacity

A P adsorption experiment was performed on these soil samples. Each sample (3 g) was placed in a 50-mL centrifuge tube and equilibrated with 30 mL of Ca₂PO₄ solution at different P concentrations (1, 2, 4, 6, and 10 mg/L). These P concentrations were selected because they are more likely to be encountered under natural agronomic conditions, corresponding approximately to field applications of 50, 100, 200, 300, and 500 kg P ha⁻¹. Although higher P concentrations have often been used in previous adsorption studies [30], the selected range better reflects realistic P application rates used in commercial agriculture. Three drops of chloroform were added to each suspension to prevent microbial growth; the suspensions were shaken horizontally for 24 h and then centrifuged at 4000 rpm for 30 min to produce a clear supernatant [31]. The supernatant was filtered through 0.45 μm filters, and P in the solutions was measured by ICP. Sorbed P was calculated from the difference between P in the initial solution and P in the equilibrium solution, scaled by the solution volume and the mass of soil. Sorption isotherms were constructed by plotting sorbed P (mg P kg⁻¹ soil) against P in the equilibrium solution.
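
The mass balance described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name and argument names are hypothetical, and the defaults simply restate the 30 mL solution volume and 3 g soil mass given in the protocol.

```python
def sorbed_p_mg_per_kg(c_initial, c_equilibrium, volume_l=0.030, soil_mass_kg=0.003):
    """Sorbed P (mg P per kg soil) from the solution mass balance.

    c_initial and c_equilibrium are solution P concentrations in mg/L.
    Defaults follow the protocol in the text: 30 mL of solution, 3 g of soil.
    """
    if soil_mass_kg <= 0:
        raise ValueError("soil mass must be positive")
    # (mg/L removed from solution) * L of solution / kg of soil
    return (c_initial - c_equilibrium) * volume_l / soil_mass_kg


# Hypothetical reading: 10 mg/L added, 4 mg/L left in solution after shaking.
example = sorbed_p_mg_per_kg(10.0, 4.0)
print(round(example, 2))
```

With a 1:10 soil-to-solution ratio, each mg/L removed from solution corresponds to about 10 mg P per kg of soil.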

2.3. Machine Learning

To preprocess the P sorption dataset, we constructed a feature-augmented dataset in which equilibrium concentration (Ce) values (1, 2, 4, 6, 10 mg/L) were incorporated as an additional input alongside the unchanging soil property vectors, generating a sequence of 5 rows for each soil sample. Although the soil features remained constant, each Ce level corresponded to a different P adsorption value, allowing a model to learn how adsorption varies with equilibrium concentration for a given soil profile. This resulted in a dataset of 735 rows, where each row combined constant soil features with a specific Ce level, and the corresponding P sorption value served as the prediction target. This formulation enabled feature importance interpretation using SHAP analysis [32]. Notably, this augmented dataset was used solely for feature selection and interpretation purposes and not for training the final predictive model. The dataset was randomly split into a training set (80%, 588 samples) and a test set (20%, 147 samples). To reduce dimensionality and eliminate less informative features, we applied Recursive Feature Elimination (RFE) on this augmented dataset with a Random Forest Regressor as the base estimator [33,34]. This procedure was conducted on the training set using a fixed train-test split rather than cross-validation, given the moderate dataset size. RFE recursively ranks and removes features based on their importance, ultimately selecting the most predictive subset for model training.
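
The augmentation step can be illustrated with plain Python. This is a sketch under stated assumptions: the record layout, the key names ("features", "adsorption", "P_adsorbed"), and the two example soils are invented for illustration and do not come from the study's data.

```python
CE_LEVELS = (1, 2, 4, 6, 10)  # mg/L, as listed in the text


def augment(soils):
    """Expand one record per soil into one record per (soil, Ce) pair.

    Each input record is a dict with a "features" dict and an "adsorption"
    dict keyed by Ce level; both key names are hypothetical.
    """
    rows = []
    for soil in soils:
        for ce in CE_LEVELS:
            row = dict(soil["features"])  # soil properties stay constant
            row["Ce"] = ce                # Ce becomes an extra input feature
            row["P_adsorbed"] = soil["adsorption"][ce]  # one target per row
            rows.append(row)
    return rows


# Two made-up soils, for illustration only.
soils = [
    {"features": {"sand": 40.0, "olsen_p": 12.0},
     "adsorption": {1: 9.0, 2: 17.0, 4: 33.0, 6: 48.0, 10: 75.0}},
    {"features": {"sand": 70.0, "olsen_p": 30.0},
     "adsorption": {1: 7.0, 2: 13.0, 4: 25.0, 6: 36.0, 10: 55.0}},
]
long_rows = augment(soils)
print(len(long_rows))  # 2 soils x 5 Ce levels
```

Applied to 147 soils, the same expansion yields the 735 rows reported above.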

After feature selection, a multi-output XGBoost regression model was implemented to estimate P adsorption across multiple equilibrium concentrations [35,36]. Importantly, this model was not trained on the feature-augmented dataset (735 rows), but rather on the original dataset comprising 147 soil samples, each represented by a single row with five P sorption values corresponding to the Ce levels (1, 2, 4, 6, 10 mg/L) as output targets. This approach was deliberately chosen to avoid overfitting, which could have occurred if the model were trained on the artificially expanded dataset, where each soil sample was repeated across Ce levels, potentially introducing redundant patterns. A multi-output regression model was preferred over fitting separate single-output models for each equilibrium concentration, because it allows the model to simultaneously learn from the shared structure and correlations across all output targets. In the context of P adsorption, the responses at different equilibrium concentrations are not independent, but they are part of P sorption dynamics governed by the same underlying soil properties. By training a unified multi-target model, the learning algorithm can exploit these dependencies to improve overall predictive performance and generalization. In contrast, training separate models for each concentration level would ignore these interrelationships, potentially leading to inconsistent predictions across concentrations. Multi-output learning thus offers statistical advantages, especially when the outputs are naturally linked, as is the case with sorption dynamics across varying P concentrations.

The reduced dataset (with only the selected features) was randomly split into a training set (80%, 117 samples) and a test set (20%, 30 samples). We employed the XGBoost regressor with the multi_strategy = “multi_output_tree” option, enabling native multi-target prediction using a single ensemble of gradient-boosted trees. This architecture was selected for its scalability and high predictive performance on structured data with multiple outputs.

To optimize the model, we applied Optuna [37], a Bayesian hyperparameter optimization framework, with the objective of minimizing the Mean Absolute Error (MAE) across all output targets. Optuna was chosen for its efficiency in exploring large hyperparameter spaces compared to traditional methods like grid or random search. The optimal configuration included 257 boosting rounds (n_estimators = 257), a learning rate of 0.137, regularization (gamma = 2.46, min_child_weight = 2), a tree depth of 8 (max_depth = 8), and stochastic subsampling (subsample = 0.858, colsample_bytree = 0.898). These values reflect a trade-off between model complexity and generalization performance, favoring moderately deep trees and regularization to prevent overfitting while retaining the model’s ability to capture nonlinear interactions in the data.
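
For readers who want to reuse the reported configuration, it can be collected as keyword arguments. The sketch below is an assumption-laden illustration rather than the authors' script: the values are copied from the text as rounded there, `tree_method="hist"` is our assumption (to our knowledge, XGBoost 2.x requires it for `multi_output_tree`), and `build_model` is a hypothetical helper that returns `None` when xgboost is not installed.

```python
# Hyperparameters as reported in the text (rounded as printed there).
XGB_PARAMS = {
    "n_estimators": 257,
    "learning_rate": 0.137,
    "gamma": 2.46,
    "min_child_weight": 2,
    "max_depth": 8,
    "subsample": 0.858,
    "colsample_bytree": 0.898,
    "multi_strategy": "multi_output_tree",  # one tree ensemble for all targets
    "tree_method": "hist",  # assumption: not stated in the text
}


def build_model(params=XGB_PARAMS):
    """Return an XGBRegressor configured with params, or None if xgboost is absent."""
    try:
        from xgboost import XGBRegressor
    except ImportError:
        return None
    return XGBRegressor(**params)
```

Fitting would then take a feature matrix with one row per soil and a target matrix with five columns, one per Ce level.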

2.4. Causal Discovery

After model training, the Directed Acyclic Graph (DAG) approach was applied on the reduced original dataset (including only the selected features comprising the 147 soil samples) to uncover the causal relationships between variables. For this purpose, the Direct Linear Non-Gaussian Acyclic Model (DirectLiNGAM) was employed. This algorithm is an extension of the original LiNGAM method and offers improved robustness when the assumption of strict non-Gaussianity is not fully met. DirectLiNGAM assumes that the underlying causal structure is linear, acyclic, and free from hidden confounders. While the assumption that the underlying causal structure is linear may limit the model’s ability to capture nonlinear soil and chemical interactions, DirectLiNGAM remains a robust choice for identifying the dominant causal directions in moderate-sized datasets. Additionally, DirectLiNGAM has the advantage of assuming non-Gaussianity in the data, which is critical for enabling the algorithm to detect causal relationships beyond second-order statistical dependencies such as covariance. This allows the model to uncover more complex causal patterns compared to methods based solely on linear Gaussian assumptions. Nonlinear causal discovery algorithms could, in theory, capture richer interactions, but they typically require substantially larger datasets to produce stable and reliable results, which was not feasible in the present study. The DirectLiNGAM algorithm enhances causal discovery by introducing a more reliable method for determining causal order, which involves three main steps: (1) pairwise causality tests, (2) estimation of causal ordering based on those tests, and (3) estimation of connection strengths among variables [38,39,40].

2.5. Comparison of the Langmuir Isotherms and the Multi-Output XGBoost Regressor on a Large Soil Dataset

To assess and compare the generalizability and the predictive accuracy of both the Langmuir isotherms and the multi-output XGBoost model, we applied them to an extended dataset comprising 10,389 soil samples independently collected over recent years by the Soil and Water Resources Institute of Thessaloniki. This dataset includes a broad range of soil physicochemical properties and was used to estimate P adsorption based on the models trained on the original experimental dataset.

To compare mean differences among soil texture groups, Tukey’s HSD (Honestly Significant Difference) test was applied [41]. Tukey’s HSD test was selected as the post-hoc method because it is well suited for identifying statistically significant differences between all possible pairs of group means following ANOVA, particularly when comparing multiple soil types across several equilibrium concentrations. It was used here to assess whether the multi-output XGBoost model and Langmuir isotherms produced statistically distinguishable group-wise predictions between soil textures at each equilibrium concentration (Ce), with a significance threshold of p < 0.05.

The entire process, encompassing data analysis, model development, and visualization, was conducted using Python 3.9 [42]. The differences between means were assumed to be statistically significant at the 5% level. Matplotlib 3.8.1 was used for visualization [43]. The lingam library was used for performing the DirectLiNGAM [38,40]. All data used in this study are publicly available through the Zenodo repository [https://doi.org/10.5281/zenodo.15854383, accessed on 8 August 2025], and all code for preprocessing, analysis, and figure generation is accessible at [https://github.com/Mil-afk/soil_phosphorus_adsorption_data, accessed on 8 August 2025]. Additionally, a Python library developed as part of this work for phosphorus adsorption modeling is available via PyPI at [https://pypi.org/project/phosadsorption/, accessed on 8 August 2025].

3. Results

3.1. Feature Engineering

Feature selection on the feature-augmented dataset using RFE identified nine soil variables as important, as shown in Figure 1. An XGBoost model was then trained on the same dataset, using P adsorption as the target variable, and including the P equilibrium concentration (Ce) as an additional input feature. The feature importance plot, which was generated using SHAP, indicated that, aside from Ce, the most influential features were Olsen P, manganese, sand content, and magnesium concentration in the soil (Figure 1).

The SHAP dependence plot revealed a clear decline in P adsorption with increasing initial Olsen P concentrations, which aligns with agronomic expectations regarding P availability. However, the strength and consistency of this relationship across equilibrium concentrations was notable, especially since, to our knowledge, no previous studies have explicitly quantified this decline using nonlinear machine learning models while simultaneously accounting for other soil variables (Figure 2a). In contrast, P sorption capacity increased with higher manganese availability in the soil (Figure 2b), while it decreased with increasing sand content (Figure 2c). The positive correlation between manganese availability and P sorption capacity can likely be attributed to both indirect and direct mechanisms: manganese availability increases under acidic or reducing conditions, which are also associated with higher P sorption capacity, and Mn oxides themselves are known to directly contribute to P retention in soil through surface adsorption processes [13]. A similar positive trend was observed for magnesium, which may reflect its higher availability in alkaline soils with high calcium carbonate content, which tend to exhibit elevated P sorption (Figure 2d).

3.2. Causal Inference

Causal inference analysis applied to the reduced original dataset confirmed that initial Olsen P concentration and sand content had a causal relationship with P adsorption in the soil samples (Figure 3). All DAGs in Figure 3 consistently show sand content as a parent node directly influencing P adsorption. This consistency was confirmed by examining the causal graphs generated separately for each equilibrium concentration (Ce = 1, 2, 4, 6, and 10 mg/L), where both the direction and presence of edges from sand content to P adsorption remained stable across all models. A causal link between Olsen P and P adsorption was also observed for Ce levels of 1, 2, 4, and 6 mg/L. The direction of this relationship was consistent across these concentrations, further supporting the robustness of the inferred causal structure. This finding is consistent with the SHAP-based feature importance plot, where Olsen P (shown as P in Figure 1) and sand content rank among the top predictors of P adsorption. Both analyses therefore point to Olsen P and sand content as the primary drivers of P adsorption. Nonetheless, we acknowledge the limitations of causal discovery from observational data, particularly the assumptions of linearity and the absence of unmeasured confounders required by the DirectLiNGAM algorithm.

3.3. Langmuir Equations

Since sand content emerged as the most significant soil property influencing P sorption capacity, according to both feature importance and causal inference analyses, Langmuir P sorption isotherms were constructed based on soil classification (clay, sand, loam) and soil texture groups (Figure 4a,b). General equations based on soil classification were developed because the original dataset of 147 samples lacked representation from the sandy, sandy clay, and silty texture groups, which are extremely rare in Greek soils. While the fitted Langmuir equations provide reasonable estimates across major soil classes, their applicability to underrepresented texture groups (e.g., sandy, sandy clay, and silty texture groups) may be somewhat limited due to the smaller number of experimental observations. As such, results for these groups should be interpreted with appropriate consideration of this constraint. However, as the extended dataset included some samples from these underrepresented groups, broader Langmuir equations were generated by aggregating samples into three main soil classification groups (sand, clay, and loam), allowing us to estimate P adsorption for all samples through grouped fitting rather than extrapolation from individual texture classes. Figure 4b displays the Langmuir isotherms for the remaining soil texture groups, showing that loamy sand and sandy loam soils have the lowest P adsorption capacities. This observation is consistent with the feature importance and causal inference analyses, which identified sand content as one of the main drivers reducing P adsorption. Previous studies have also shown that sandy soils typically exhibit lower P adsorption due to their low content of clay and of iron and aluminum oxides, and their limited surface area for sorption reactions [13,44]. The resulting fitted parameters are presented in Table 3 and Table 4.

Table 5, which presents P adsorption percentages across soil types, confirms that clay and silty clay loam soils exhibit the highest adsorption, while sandy textures show the lowest (p < 0.05) across all equilibrium concentrations.
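
For reference, the Langmuir isotherm has the standard form q = q_max · K · Ce / (1 + K · Ce). The text does not state how the isotherms were fitted, so the sketch below assumes the common linearized form Ce/q = 1/(K · q_max) + Ce/q_max and an ordinary least-squares line; the function names, the symbols q_max and K, and the synthetic data are ours, not taken from Table 3 or Table 4.

```python
def langmuir(ce, q_max, k):
    """Sorbed P predicted by the Langmuir isotherm at equilibrium concentration ce."""
    return q_max * k * ce / (1.0 + k * ce)


def fit_langmuir_linearized(ce_values, q_values):
    """Estimate (q_max, k) by regressing Ce/q on Ce with ordinary least squares.

    Assumes at least two distinct Ce values and strictly positive q values.
    """
    xs = list(ce_values)
    ys = [c / q for c, q in zip(ce_values, q_values)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals 1 / q_max
    intercept = mean_y - slope * mean_x   # equals 1 / (k * q_max)
    return 1.0 / slope, slope / intercept


# Synthetic, noise-free data generated from known parameters (illustration only).
ce_levels = [1, 2, 4, 6, 10]
observed = [langmuir(c, q_max=200.0, k=0.5) for c in ce_levels]
q_max_hat, k_hat = fit_langmuir_linearized(ce_levels, observed)
```

On noisy measurements the linearized fit weights points unevenly, so a nonlinear least-squares fit may be preferable; the linear form is used here only because it needs no external libraries.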

3.4. Multiple Linear Regression Equations

For simplicity, and because the Langmuir isotherms are nearly linear over this specific range of equilibrium concentrations, multiple linear regression equations are provided in Table 6. These equations were derived using the same original dataset of 147 soil samples that was used to fit the Langmuir models. The intention was to provide simplified, empirical alternatives that allow soil laboratories to estimate P adsorption across different soil classifications from readily available, standard soil test data. Multiple linear regression analysis revealed that the overall interaction between soil classification and the predictor variables was statistically significant (p < 0.001).

Although the regression model for sandy soils shows a relatively large positive coefficient for EC, this effect should be interpreted in the context of the overall model structure. Due to the dominant effect of sand content, which is typically high in these soils, and its negative coefficient (−2.4), overall P adsorption remains substantially lower in sandy soils compared to clayey and loamy soils. However, when P retention does occur in sandy soils, it appears to be more strongly influenced by EC, suggesting that soluble salt levels may play a role in modulating P adsorption under otherwise low-retention conditions.

3.5. Multi-Output XGBoost Model Performance

The final multi-output model captured complex, nonlinear relationships between soil features and P adsorption across different Ce levels, achieving an overall MAE of 26.5 mg/kg and an R² score of 0.493 on the test set (Figure 5). An R² of 0.493, while moderate, is considered reasonable given the inherent variability and multivariate nature of soil properties. To the best of our knowledge, no previous studies have validated Langmuir equations on unseen data using an independent test set. Traditional applications of the Langmuir model typically involve fitting to the entire dataset without assessing predictive performance, which limits direct comparison with modern data-driven approaches. While the model showed reasonable overall agreement between predicted and observed values, its performance was slightly weaker at higher equilibrium concentrations (e.g., Ce = 10 mg/L), as reflected by increased scatter in predictions. The nonlinearity of the learned relationships is supported by the SHAP analysis, which revealed nonlinear trends in key features such as Olsen P, sand content, and manganese (Figure 2), where the effect on P adsorption varied in intensity and direction depending on feature values.
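
The two reported metrics can be computed for a multi-output prediction as sketched below. How the authors aggregated R² across the five targets is not stated; the sketch assumes an equal-weight average of per-target R² values (the default behavior of common libraries such as scikit-learn), and the function names and toy numbers are ours.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE pooled over every sample and every output column."""
    diffs = [abs(t - p)
             for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    return sum(diffs) / len(diffs)


def r2_uniform_average(y_true, y_pred):
    """Per-output R^2, averaged with equal weight across outputs.

    Assumes every output column of y_true has nonzero variance.
    """
    n_outputs = len(y_true[0])
    scores = []
    for j in range(n_outputs):
        t = [row[j] for row in y_true]
        p = [row[j] for row in y_pred]
        mean_t = sum(t) / len(t)
        ss_res = sum((a - b) ** 2 for a, b in zip(t, p))
        ss_tot = sum((a - mean_t) ** 2 for a in t)
        scores.append(1.0 - ss_res / ss_tot)
    return sum(scores) / n_outputs


# Toy example with three samples and two outputs (illustration only).
y_true = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y_pred = [[2.0, 2.0], [3.0, 5.0], [5.0, 6.0]]
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_uniform_average(y_true, y_pred)
```

In the study's setting each row would hold five values, one per Ce level, so an error at Ce = 10 mg/L contributes to both metrics alongside errors at the lower concentrations.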

3.6. Performance of the Multi-Output XGBoost Model and Langmuir Isotherms on an Extended Soil Dataset

As noted earlier, the extended dataset comprises 10,389 soil samples independently collected over recent years by the Soil and Water Resources Institute of Thessaloniki. These samples were not modeled or resampled from the original dataset, allowing for a more robust assessment of model generalizability across a broader range of soil conditions. The extended dataset was binned into four categories based on Olsen P concentrations and four categories based on sand content. The binning was quantile-based to ensure approximately equal representation across the extended dataset. Data were binned on Olsen P and sand content because these variables were identified as highly influential by the XGBoost model and demonstrated a causal relationship with soil P sorption capacity through causal inference analysis. For each row, P adsorption was estimated using both the Langmuir isotherm and the XGBoost model across five equilibrium concentrations (Ce values: 1, 2, 4, 6, and 10 mg/L), resulting in an augmented dataset of 51,945 rows.

The bins for Olsen P were calculated using quantile-based binning on the extended dataset as follows:

Low P: 1.00–6.83 mg/kg,

Medium P: 6.84–12.89 mg/kg,

High P: 12.90–25.77 mg/kg, and

Very High P: 25.78–360.44 mg/kg.

The bins for sand content were calculated using quantile-based binning on the extended dataset as follows:

Low sand: 4–30%,

Medium sand: 30–44%,

High sand: 44–56%, and

Very high sand: 56–94%.
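
Assigning a sample to one of the reported bins can be sketched with the standard library. The edge values are the ones listed above; treating each upper edge as inclusive (for example, a sand content of exactly 30% counted as low) is our assumption, since the sand ranges share their boundaries, and the names below are hypothetical.

```python
from bisect import bisect_left

LABELS = ("Low", "Medium", "High", "Very High")

# Upper edges of the first three bins, taken from the ranges listed in the text.
OLSEN_P_UPPER_EDGES = (6.83, 12.89, 25.77)  # mg/kg
SAND_UPPER_EDGES = (30.0, 44.0, 56.0)       # percent


def assign_bin(value, upper_edges):
    """Return the bin label for value; upper edges are treated as inclusive."""
    return LABELS[bisect_left(upper_edges, value)]


print(assign_bin(5.0, OLSEN_P_UPPER_EDGES))
print(assign_bin(60.0, SAND_UPPER_EDGES))
```

Because the edges are quartiles of the extended dataset, each label covers roughly a quarter of the 10,389 samples.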

Applying the Langmuir isotherms to the extended dataset revealed a modest decline in P adsorption of 2.3% in the very high Olsen P group compared to the low Olsen P group (p = 0.005) (Figure 6), and an 11.9% reduction in the very high sand content group relative to the low sand group (p < 0.001) (Figure 7). In contrast, the multi-output XGBoost model estimated a substantially greater drop of 12.6% for Olsen P and 19.2% for sand content (both p < 0.001). These results highlight the machine learning model’s enhanced sensitivity to key soil properties, particularly in capturing their nonlinear and interacting effects on P sorption. The Langmuir model’s underestimation is likely due to its simplified functional form, which does not account for such multivariate complexity. These findings underscore the advantage of data-driven approaches like XGBoost in modeling nutrient dynamics across diverse soil conditions. Table 7 presents P adsorption percentages predicted by the multi-output XGBoost model across soil texture classes for the extended dataset. Silty clay loam and silty clay soils generally exhibit among the highest predicted adsorption values, consistent with observations from the original dataset (Table 5), while sandy textures (e.g., sandy loam, sandy) show the lowest, particularly at higher equilibrium concentrations.

4. Discussion

Data presented here show that a multi-output XGBoost model is more responsive to changes in P adsorption than the Langmuir isotherms. This is consistent with XGBoost’s well-established, state-of-the-art performance in high-dimensional regression tasks [45]. It is particularly effective at capturing nonlinear patterns, handling multivariate data, and minimizing the prediction error [35,46]. While the XGBoost model demonstrated strong responsiveness to key soil properties, its performance depends on the availability of a well-structured, representative dataset, and its complexity may present challenges for interpretability and direct implementation in operational agronomic settings. However, the innovative aspect of the present study lies in the fact that the XGBoost model was trained on P adsorption data generated from a large-scale experiment involving multiple levels of equilibrium P concentrations applied to soils representing a wide range of textures, pH levels, and calcium carbonate contents. This comprehensive dataset enabled the development of an XGBoost model that more accurately estimated soil P sorption capacity compared to the traditional Langmuir isotherms.

Initial Olsen P and sand content were identified as the primary factors negatively influencing P adsorption, thereby increasing P availability in the soil solution for plant uptake. This finding was supported by both SHAP analysis and causal inference. The SHAP dependence plot (Figure 2a) shows a sharp decrease in P adsorption as Olsen P increases in the soil, highlighting why Olsen P is the primary driver behind the reduction in the soil’s P adsorption capacity. SHAP analysis also revealed a positive correlation between soil manganese concentration and P adsorption (Figure 2b). This positive association is likely explained by a combination of indirect and direct mechanisms. Manganese tends to be more available in acidic or reducing soil environments, which are also characterized by higher P sorption potential. Additionally, Mn oxides are known to directly enhance P retention through surface adsorption processes [13]. A comparable trend was observed for magnesium, which is typically more available in alkaline, calcareous soils, conditions that are also associated with increased P sorption, as illustrated in Figure 2d.

The causal discovery algorithm DirectLiNGAM successfully captured the causal relationships among P adsorption capacity, sand content, and initial Olsen P in the soil. Causal analysis revealed a consistent link between sand content and P adsorption across all P equilibrium concentrations, and a link between Olsen P and P adsorption at P equilibrium concentrations of 1, 2, 4, and 6 mg/L (Figure 3). These findings further support the feature importance results of the XGBoost model, highlighting the key roles of initial Olsen P and sand content in determining the soil’s P adsorption capacity. The absence of a detected causal link between manganese or magnesium and P adsorption capacity, despite their high ranking in the XGBoost feature importance score, is likely due to their indirect relationship with P adsorption in soils.

Many models have been developed to describe adsorption isotherms, but the Langmuir equation is among the most widely used for quantitatively fitting P adsorption isotherms [44,47,48,49]. The results of the current study showed that the Langmuir isotherms adequately described the general shape of the P adsorption curves across soil types, reflecting the expected saturation behavior. However, as shown earlier (Figure 6 and Figure 7), the Langmuir isotherms failed to capture the full magnitude of variation in P adsorption, particularly in soils with very high sand content or high initial Olsen P concentrations, where they substantially underestimated the decline in sorption observed in the extended dataset. Nevertheless, even with the Langmuir isotherms, the loamy sand and sandy loam soils exhibited the lowest P adsorption, which is consistent with the importance of sand content identified by the XGBoost feature importance and the causal inference analysis (Figure 4a).

The final multi-output model achieved an overall mean absolute error (MAE) of 26.5 mg/kg and an R² score of 0.50 on the test set. Performance was more consistent and accurate at lower equilibrium concentrations (Ce = 1–6 mg/L), which are also more representative of typical soil solution P levels under agronomic conditions. At higher concentrations (e.g., Ce = 10 mg/L), greater prediction scatter was observed, reflecting increased variability (Figure 5).
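To make the overall metrics concrete, the following minimal Python sketch shows one common way to pool all (sample, Ce level) pairs of a multi-output prediction into a single MAE and R², and to break the MAE down by equilibrium concentration. It assumes R² is the coefficient of determination computed on the pooled values, which may differ from the exact procedure used in the study. The function names and the numeric values are hypothetical and serve only as an illustration.

```python
from __future__ import annotations

from typing import Sequence

# Equilibrium concentrations (mg/L) predicted by the multi-output model.
CE_LEVELS = (1, 2, 4, 6, 10)


def pooled_mae_r2(y_true: Sequence[Sequence[float]],
                  y_pred: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Pool every (sample, Ce level) pair and return (MAE, R^2)."""
    true_flat = [v for row in y_true for v in row]
    pred_flat = [v for row in y_pred for v in row]
    if not true_flat or len(true_flat) != len(pred_flat):
        raise ValueError("y_true and y_pred must be non-empty and equally sized")

    n = len(true_flat)
    mae = sum(abs(t - p) for t, p in zip(true_flat, pred_flat)) / n

    mean_true = sum(true_flat) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(true_flat, pred_flat))
    ss_tot = sum((t - mean_true) ** 2 for t in true_flat)
    return mae, 1.0 - ss_res / ss_tot


def per_level_mae(y_true: Sequence[Sequence[float]],
                  y_pred: Sequence[Sequence[float]]) -> dict[int, float]:
    """MAE computed separately for each equilibrium concentration."""
    n = len(y_true)
    return {
        ce: sum(abs(t[j] - p[j]) for t, p in zip(y_true, y_pred)) / n
        for j, ce in enumerate(CE_LEVELS)
    }


# Hypothetical adsorbed-P values (mg/kg) for two samples; NOT data from the study.
y_true = [[10.0, 20.0, 40.0, 60.0, 100.0],
          [8.0, 18.0, 35.0, 50.0, 80.0]]
y_pred = [[11.0, 19.0, 42.0, 57.0, 90.0],
          [9.0, 17.0, 33.0, 53.0, 90.0]]

mae, r2 = pooled_mae_r2(y_true, y_pred)
by_level = per_level_mae(y_true, y_pred)
print(f"pooled MAE = {mae:.2f} mg/kg, pooled R2 = {r2:.3f}")
print("MAE per Ce level:", by_level)
```

A per-level breakdown of this kind is one way to check whether the error is concentrated at the highest equilibrium concentration, as the scatter in Figure 5 suggests.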

Data from the extended dataset show that the multi-output XGBoost model is more responsive than the Langmuir isotherms to variations in initial Olsen P and sand content in soils (Figure 6 and Figure 7). The XGBoost model is especially sensitive to Olsen P, which was ranked as the most influential feature in the feature importance score, because it was trained using Olsen P as an input variable, whereas the Langmuir model does not account for this factor. The XGBoost model also demonstrated greater responsiveness to changes in sand content, indicating overall better performance than the Langmuir isotherms. This result is expected given the flexibility of tree-based models to capture non-linear relationships [50]. However, the novelty of this study lies in incorporating data from diverse soil types during the adsorption experiment, enabling the training of a robust machine learning model that offers an improved solution to the P adsorption prediction problem. Specifically, P adsorption dropped by 12.6% in the very high Olsen P group compared to the low Olsen P group, which is a substantially stronger effect than that observed with the Langmuir model. Similarly, P adsorption decreased by 19.2% in soils with very high sand content compared to those with low sand content, again exceeding the Langmuir model’s sensitivity. These results indicate that machine learning is a more effective approach for predicting soil P adsorption capacity than relying solely on Langmuir isotherms.
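The binned comparison described above can be sketched in a few lines of Python. The sketch assumes that the reported drop is expressed relative to the mean of the low group. The bin edges, function names, and sample values are hypothetical placeholders, since the thresholds used in the study are not restated in this section.

```python
from statistics import mean

# Hypothetical Olsen P bin edges (mg/kg); the study's actual thresholds may differ.
BIN_EDGES = [
    (0.0, 10.0, "low"),
    (10.0, 20.0, "medium"),
    (20.0, 40.0, "high"),
    (40.0, float("inf"), "very high"),
]


def assign_bin(value: float) -> str:
    """Return the label of the half-open bin [lower, upper) that contains value."""
    for lower, upper, label in BIN_EDGES:
        if lower <= value < upper:
            return label
    raise ValueError(f"value {value} falls outside all bins")


def mean_by_bin(records):
    """Average predicted P adsorption (%) within each Olsen P bin."""
    groups = {}
    for olsen_p, adsorption in records:
        groups.setdefault(assign_bin(olsen_p), []).append(adsorption)
    return {label: mean(values) for label, values in groups.items()}


def relative_drop(reference: float, value: float) -> float:
    """Percentage decrease of value relative to reference."""
    return (reference - value) / reference * 100.0


# Hypothetical (Olsen P in mg/kg, predicted adsorption in %) pairs; NOT study data.
records = [(5.0, 90.0), (8.0, 88.0), (15.0, 85.0),
           (30.0, 82.0), (55.0, 78.0), (70.0, 76.0)]

means = mean_by_bin(records)
drop = relative_drop(means["low"], means["very high"])
print(f"drop from low to very high Olsen P: {drop:.1f}%")
```

Applying the same function to the predictions of both models for the same bins would give the two sensitivities on a common scale.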

The consistency between the experimentally derived adsorption values (Table 5) and the model-predicted values from the extended dataset (Table 7) highlights the robustness of the multi-output XGBoost model in capturing the effect of soil texture on P adsorption. In both cases, silty clay loam and silty clay soils demonstrated the highest adsorption capacities, while sandy soils consistently exhibited the lowest. These results reinforce the dominant role of fine-textured soils in retaining P and validate the model’s potential for reliable prediction in data-sparse environments. The data presented in Table 5 and Table 7 align with the uptake efficiency (10–30%) reported by Liu et al. [5] and the average soil P adsorption capacity of 75% reported by Halliday [51].

5. Conclusions

This study demonstrated that a multi-output XGBoost regression model captured P adsorption variability across diverse soil types better than the classical Langmuir isotherm. By incorporating a wide array of soil physicochemical properties, the XGBoost model captured complex nonlinear patterns and was more responsive than the Langmuir isotherms to changes in Olsen P and sand content. The model’s robustness was further supported by SHAP-based feature importance and causal inference analysis, which consistently identified initial Olsen P and sand content as primary drivers of P sorption variability. These findings highlight the advantages of combining laboratory data with modern machine learning techniques for modeling soil nutrient dynamics. Finally, the model’s strong responsiveness to Olsen P and sand content provides a promising tool for P management, with the potential to improve fertilizer use efficiency across diverse soil conditions and reduce environmental risks from over-fertilization. Further validation across diverse soil regions and under field conditions would be useful to confirm the model’s robustness and support its broader operational use in agronomic decision-making.

Author Contributions

Conceptualization, A.P.; methodology, M.I. and A.P.; software, M.I.; validation, M.I.; formal analysis, M.I.; investigation, A.P.; resources, A.P.; data curation, M.I.; writing—original draft preparation, M.I. and A.P.; writing—review and editing, M.I. and A.P.; visualization, M.I.; supervision, A.P.; project administration, A.P.; funding acquisition, A.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets generated and analyzed during the current study are available in the Zenodo repository: https://doi.org/10.5281/zenodo.15854383, accessed on 8 August 2025. All associated code used for analysis and figure generation is also publicly available at https://github.com/Mil-afk/soil_phosphorus_adsorption_data, accessed on 8 August 2025. Additionally, a Python library developed as part of this work for phosphorus adsorption modeling is available on PyPI: https://pypi.org/project/phosadsorption/, accessed on 8 August 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MDPI	Multidisciplinary Digital Publishing Institute
P	Phosphorus
RFE	Recursive Feature Elimination
MAE	Mean Absolute Error
DirectLiNGAM	Direct Linear Non-Gaussian Acyclic Model
DAG	Directed Acyclic Graph

Appendix A

Table A1. Phosphorus Adsorption Modeling using isotherms.

Authors (Year)	Model Used	Key Findings
Olsen & Watanabe (1957) [12]	Langmuir, Freundlich	Introduced Langmuir model to estimate P adsorption maxima in soils.
Helyar et al. (1976) [13]	Langmuir	Studied phosphate adsorption behavior on Al oxides.
Nair et al. (1984) [31]	Standardized Langmuir protocol	Developed a standardized interlaboratory method for determining P sorption using Langmuir modeling.
Barrow (2008) [16]	Freundlich, Mechanistic models	Proposed improved description of sorption curves beyond isotherms.
Hussain et al. (2002) [47]	Langmuir, Freundlich	Compared isotherm performance under saline-sodic conditions.
Del Bubba et al. (2003) [20]	Langmuir	Estimated P adsorption maximum in sand filters using Langmuir.
Heredia & Fernández Cirelli (2007) [9]	Langmuir	Linked high P application to environmental risk based on sorption capacity.
Bolster & Sistani (2009) [14]	Langmuir	Investigated P sorption from animal manures; showed variability depending on manure type and application.
Dossa et al. (2008) [21]	Langmuir, Freundlich	Assessed impact of shrub residues on P sorption and desorption.
Lair et al. (2009) [48]	Langmuir	Investigated P sorption–desorption along soil weathering gradient.
Rossi et al. (2012) [49]	Langmuir (SWAT integration)	Tested Langmuir model performance within SWAT under high P conditions.
Dari et al. (2015) [19]	Langmuir, Freundlich	Proposed simplified method for estimating isotherm parameters.
Mihoub et al. (2016) [3]	Langmuir	Evaluated P sorption in calcareous soils and its role in sustainable P fertilizer management.
Yang et al. (2019) [44]	Freundlich	Showed influence of organic matter on P adsorption and desorption.
Wang et al. (2022) [30]	Modified Langmuir	Introduced a modified Langmuir equation to account for organic material influence on P adsorption in Mollisols.
Zawadzka et al. (2024) [18]	Langmuir	Applied Langmuir to model phosphate sorption in engineered media.

References

1. Kirkby, E.A.; Johnston, A.E. Soil and Fertilizer Phosphorus in Relation to Crop Nutrition. In The Ecophysiology of Plant-Phosphorus Interactions; White, P.J., Hammond, J.P., Eds.; Springer: Dordrecht, The Netherlands, 2008; pp. 177–223. ISBN 978-1-4020-8435-5.
2. Marschner, P. Marschner’s Mineral Nutrition of Higher Plants; Academic Press: Cambridge, MA, USA, 2012.
3. Mihoub, A.; Daddi Bouhoun, M.; Saker, M. Phosphorus Adsorption Isotherm: A Key Aspect for Effective Use and Environmentally Friendly Management of Phosphorus Fertilizers in Calcareous Soils. Commun. Soil Sci. Plant Anal. 2016, 47, 1920–1929.
4. Sgouras, I.D.; Tsadilas, C.D.; Barbayiannis, N.; Danalatos, N. Physicochemical and Mineralogical Properties of Red Mediterranean Soils from Greece. Commun. Soil Sci. Plant Anal. 2007, 38, 695–711.
5. Liu, D.; Xiao, Z.; Zhang, Z.; Qiao, Y.; Chen, Y.; Wu, H.; Hu, C. The Crop Phosphorus Uptake, Use Efficiency, and Budget under Long-Term Manure and Fertilizer Application in a Rice–Wheat Planting System. Agriculture 2024, 14, 1393.
6. Amarh, F.; Voegborlo, R.B.; Essuman, E.K.; Agorku, E.S.; Tettey, C.O.; Kortei, N.K. Effects of Soil Depth and Characteristics on Phosphorus Adsorption Isotherms of Different Land Utilization Types: Phosphorus Adsorption Isotherms of Soil. Soil Tillage Res. 2021, 213, 105139.
7. Oehl, F.; Oberson, A.; Tagmann, H.U.; Besson, J.M.; Dubois, D.; Mäder, P.; Roth, H.-R.; Frossard, E. Phosphorus Budget and Phosphorus Availability in Soils under Organic and Conventional Farming. Nutr. Cycl. Agroecosyst. 2002, 62, 25–35.
8. Iatrou, M.; Papadopoulos, A.; Papadopoulos, F.; Dichala, O.; Psoma, P.; Bountla, A. Determination of Soil Available Phosphorus Using the Olsen and Mehlich 3 Methods for Greek Soils Having Variable Amounts of Calcium Carbonate. Commun. Soil Sci. Plant Anal. 2014, 45, 2207–2214.
9. Heredia, O.S.; Fernández Cirelli, A. Environmental Risks of Increasing Phosphorus Addition in Relation to Soil Sorption Capacity. Geoderma 2007, 137, 426–431.
10. Correll, D.L. The Role of Phosphorus in the Eutrophication of Receiving Waters: A Review. J. Environ. Qual. 1998, 27, 261–266.
11. Holman, I.P.; Howden, N.J.K.; Bellamy, P.; Willby, N.; Whelan, M.J.; Rivas-Casado, M. An Assessment of the Risk to Surface Water Ecosystems of Groundwater P in the UK and Ireland. Sci. Total Environ. 2010, 408, 1847–1857.
12. Olsen, S.R.; Watanabe, F.S. A Method to Determine a Phosphorus Adsorption Maximum of Soils as Measured by the Langmuir Isotherm. Soil Sci. Soc. Am. J. 1957, 21, 144–149.
13. Helyar, K.R.; Munns, D.N.; Burau, R.G. Adsorption of Phosphate by Gibbsite. J. Soil Sci. 1976, 27, 307–314.
14. Bolster, C.; Sistani, K. Sorption of Phosphorus from Swine, Dairy, and Poultry Manures. Commun. Soil Sci. Plant Anal. 2009, 40, 1106–1123.
15. Gjettermann, B. Modelling P Dynamics in Soil-Decomposition and Sorption: Technical Report; Concepts and User Manual; DHI Water & Environment: Hørsholm, Denmark, 2004.
16. Barrow, N.J. The Description of Sorption Curves. Eur. J. Soil Sci. 2008, 59, 900–910.
17. Barrow, N.J. On the Reversibility of Phosphate Sorption by Soils. J. Soil Sci. 1983, 34, 751–758.
18. Zawadzka, B.; Siwiec, T.; Reczek, L.; Marzec, M.; Jóźwiakowski, K. Modeling of Phosphate Sorption Process on the Surface of Rockfos® Material Using Langmuir Isotherms. Appl. Sci. 2024, 14, 10996.
19. Dari, B.; Nair, V.D.; Colee, J.; Harris, W.G.; Mylavarapu, R. Estimation of Phosphorus Isotherm Parameters: A Simple and Cost-Effective Procedure. Front. Environ. Sci. 2015, 3, 70.
20. Del Bubba, M.; Arias, C.A.; Brix, H. Phosphorus Adsorption Maximum of Sands for Use as Media in Subsurface Flow Constructed Reed Beds as Measured by the Langmuir Isotherm. Water Res. 2003, 37, 3390–3400.
21. Dossa, E.; Baham, J.; Khouma, M.; Sene, M.; Kizito, F.; Dick, R. Phosphorus Sorption and Desorption in Semiarid Soils of Senegal Amended With Native Shrub Residues. Soil Sci. 2008, 173, 669–682.
22. Jones, J.B. Laboratory Guide for Conducting Soil Tests and Plant Analysis; Taylor & Francis: Abingdon, UK, 2001; ISBN 9780849302060.
23. Magdoff, F.R.; Jokela, W.E.; Fox, R.H.; Griffin, G.F. A Soil Test for Nitrogen Availability in the Northeastern United States. Commun. Soil Sci. Plant Anal. 1990, 21, 1103–1115.
24. Walkley, A.; Black, I.A. An Examination of the Degtjareff Method for Determining Soil Organic Matter, and a Proposed Modification of the Chromic Acid Titration Method. Soil Sci. 1934, 37, 29–38.
25. Van Reeuwijk, L.P. Procedures for Soil Analysis; ISRIC: Wageningen, The Netherlands, 2002.
26. Bouyoucos, G.J. Hydrometer Method Improved for Making Particle Size Analyses of Soils. Agron. J. 1962, 54, 464–465.
27. Knudsen, D.; Peterson, G.A.; Pratt, P.F. Lithium, Sodium, and Potassium. In Methods of Soil Analysis; Agronomy Monographs; American Society of Agronomy, Inc.: Madison, WI, USA; Soil Science Society of America, Inc.: Madison, WI, USA, 1983; pp. 225–246. ISBN 9780891189770.
28. Iatrou, M.; Papadopoulos, A.; Papadopoulos, F.; Dichala, O.; Psoma, P.; Bountla, A. Determination of Soil-Available Micronutrients Using the DTPA and Mehlich 3 Methods for Greek Soils Having Variable Amounts of Calcium Carbonate. Commun. Soil Sci. Plant Anal. 2015, 46, 1905–1912.
29. Jeffrey, A.J.; McCallum, L.E. Investigation of a Hot 0.01m CaCl2 Soil Boron Extraction Procedure Followed by ICP-AES Analysis. Commun. Soil Sci. Plant Anal. 1988, 19, 663–673.
30. Wang, Z.; Hou, L.; Liu, Z.; Cao, N.; Wang, X. Using a Modified Langmuir Equation to Estimate the Influence of Organic Materials on Phosphorus Adsorption in a Mollisol From Northeast, China. Front. Environ. Sci. 2022, 10, 886900.
31. Nair, P.S.; Logan, T.J.; Sharpley, A.N.; Sommers, L.; Tabatabai, M.; Yuan, T.L. Interlaboratory Comparison of a Standardized Phosphorus Adsorption Procedure. J. Environ. Qual. 1984, 13, 591–595.
32. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
33. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene Selection for Cancer Classification Using Support Vector Machines. Mach. Learn. 2002, 46, 389–422.
34. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
35. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
36. Xu, D.; Shi, Y.; Tsang, I.W.; Ong, Y.-S.; Gong, C.; Shen, X. Survey on Multi-Output Learning. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 2409–2429.
37. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. arXiv 2019, arXiv:1907.10902.
38. Shimizu, S.; Inazumi, T.; Sogawa, Y.; Hyvärinen, A.; Kawahara, Y.; Washio, T.; Hoyer, P.O.; Bollen, K. DirectLiNGAM: A Direct Method for Learning a Linear Non-Gaussian Structural Equation Model. J. Mach. Learn. Res. 2011, 12, 1225–1248.
39. Niyogi, D.; Kishtawal, C.; Tripathi, S.; Govindaraju, R. Observational Evidence That Agricultural Intensification and Land Use Change May Be Reducing the Indian Summer Monsoon Rainfall. Water Resour. Res. 2010, 46, 1–17.
40. Hyvärinen, A.; Smith, S.M.; Spirtes, P. Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models. J. Mach. Learn. Res. 2013, 14, 111–152.
41. Tukey, J.W. Comparing Individual Means in the Analysis of Variance. Biometrics 1949, 5, 99–114.
42. Van Rossum, G.; Drake, F.L. Python Tutorial. History 2010, 42, 270–272.
43. Hunter, J.D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 2007, 9, 90–95.
44. Yang, X.; Chen, X.; Yang, X. Effect of Organic Matter on Phosphorus Adsorption and Desorption in a Black Soil from Northeast China. Soil Tillage Res. 2019, 187, 85–91.
45. Iatrou, M.; Karydas, C.; Tseni, X.; Mourelatos, S. Representation Learning with a Variational Autoencoder for Predicting Nitrogen Requirement in Rice. Remote Sens. 2022, 14, 5978.
46. Iatrou, M.; Karydas, C.; Iatrou, G.; Pitsiorlas, I.; Aschonitis, V.; Raptis, I.; Mpetas, S.; Kravvas, K.; Mourelatos, S. Topdressing Nitrogen Demand Prediction in Rice Crop Using Machine Learning Systems. Agriculture 2021, 11, 312.
47. Hussain, A.; Ghafoor, A.; Anwar-ul-Haq, M.; Nawaz, M. Application of the Langmuir and Freundlich Equations for P Adsorption Phenomenon in Saline-Sodic Soils. Int. J. Agric. Biol. 2002, 5, 241–246.
48. Lair, G.J.; Zehetner, F.; Khan, Z.H.; Gerzabek, M.H. Phosphorus Sorption–Desorption in Alluvial Soils of a Young Weathering Sequence at the Danube River. Geoderma 2009, 149, 39–44.
49. Rossi, C.G.; Heil, D.M.; Bonumà, N.B.; Williams, J.R. Evaluation of the Langmuir Model in the Soil and Water Assessment Tool for a High Soil Phosphorus Condition. Environ. Model. Softw. 2012, 38, 40–49.
50. Iatrou, M.; Tziachris, P.; Bilias, F.; Kekelis, P.; Pavlakis, C.; Theofilidou, A.; Papadopoulos, I.; Strouthopoulos, G.; Giannopoulos, G.; Arampatzis, D.; et al. Data-Driven and Mechanistic Soil Modeling for Precision Fertilization Management in Cotton. Nitrogen 2025, 6, 29.
51. Halliday, D.J.; Association, I.F.I. IFA World Fertilizer Use Manual; International Fertilizer Industry Association: Paris, France, 1992; ISBN 9782950629906.

Figure 1. The results of the feature evaluation using SHAP for the feature importance of the XGBoost model trained on the feature-augmented dataset.

Figure 2. SHAP dependence plots show how much the value of the most significant variables (Olsen P (a), Manganese (b), Sand content (c) and Magnesium (d)) changes the prediction of the P sorption capacity in the soil.

Figure 3. Directed acyclic graph indicating the causal relationship between the variables for the soil samples equilibrated with (a) 1, (b) 2, (c) 4, (d) 6 and (e) 10 mg P L⁻¹. The direction of the arrow captures the direction of the causality, showing that Olsen P and sand content consistently act as causal factors contributing to the reduction of P adsorption.

Figure 4. Langmuir isotherms of P adsorbed onto soils equilibrated with different contents of P for (a) the different soil types and (b) based on the three main soil classes (sandy, loamy, clayey).

Figure 5. Relationship between actual and predicted P adsorption values using the multi-output XGBoost model on the test set. The model predicts P adsorption at five equilibrium concentrations (Ce = 1, 2, 4, 6, and 10 mg/L) for each of the 30 test samples, resulting in a total of 150 data points. The red dashed line represents the linear regression fit, and the R² value reflects the model’s overall predictive performance across all outputs.

Figure 6. P adsorption across Olsen P bins as estimated by the Langmuir isotherms and the multi-output XGBoost model.

Figure 7. P adsorption across sand bins as estimated by the Langmuir isotherms and the multi-output XGBoost model.

Table 1. Soil sample distribution by pH, CaCO₃ and soil texture.

pH →	4–6	6–7	>7	>7	>7	>7	>7
CaCO₃ →	0%	0–1%	1–5%	5–10%	10–20%	20–30%	>30%
Clayey	5	5	5	8	10	2	5
Loamy	9	10	10	9	9	9	6
Sandy	9	12	12	5	6	0	1

Table 2. Soil groups included in the study.

Group	pH Range	CaCO₃ Range	Description	Soil Texture
1	4.30–6.20	0	Acidic	Clayey, Loamy, Sandy
2	6.25–7.96	0–0.9%	Neutral, low carbonate	Clayey, Loamy, Sandy
3	6.83–8.18	1–10%	Alkaline, moderately calcareous	Clayey, Loamy, Sandy
4	7.20–8.28	10.3–48.3%	Strongly alkaline, calcareous	Clayey, Loamy, Sandy

Table 3. Parameters of the Langmuir isotherms for P adsorption in the main soil classes (sandy, clayey, and loamy).

Soil Classification	Qm ¹	K ¹
Clayey	11,822.86	0.0080
Loamy	12,899.80	0.0072
Sandy	8231.61	0.0106

¹ Qm is the maximum adsorption capacity (mg/kg), and K is the Langmuir binding strength at the adsorption sites.
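As an illustration of how the fitted parameters translate into an adsorption curve, the sketch below evaluates the standard Langmuir form q = Qm·K·Ce / (1 + K·Ce) for the three soil classes in Table 3 at the five equilibrium concentrations used in the study. It assumes that Ce is expressed in mg/L, that K is expressed in L/mg, and that q is returned in mg/kg. The units of K are not stated in the table, so this is an assumption, and the function and variable names are illustrative only.

```python
# Langmuir parameters from Table 3: soil class -> (Qm in mg/kg, K).
# K is ASSUMED to be in L/mg so that K * Ce is dimensionless.
LANGMUIR_PARAMS = {
    "Clayey": (11822.86, 0.0080),
    "Loamy": (12899.80, 0.0072),
    "Sandy": (8231.61, 0.0106),
}

# Equilibrium P concentrations used in the study (mg/L).
CE_LEVELS = (1, 2, 4, 6, 10)


def langmuir(ce: float, q_max: float, k: float) -> float:
    """Adsorbed P (mg/kg) at equilibrium concentration ce (mg/L)."""
    return q_max * k * ce / (1.0 + k * ce)


# Evaluate each class-level isotherm at every equilibrium concentration.
curves = {
    soil_class: [langmuir(ce, q_max, k) for ce in CE_LEVELS]
    for soil_class, (q_max, k) in LANGMUIR_PARAMS.items()
}

for soil_class, values in curves.items():
    formatted = ", ".join(f"{v:.1f}" for v in values)
    print(f"{soil_class}: {formatted} mg/kg at Ce = {CE_LEVELS} mg/L")
```

Because K·Ce is much smaller than 1 over this concentration range, the curves are close to linear with slope Qm·K, so differences between soil classes are governed mainly by the product of the two parameters rather than by Qm alone.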

Table 4. Parameters of the Langmuir isotherms for P adsorbed in the various types of soils.

Soil Type	Qm ¹	K ¹
Clay (C)	16,366.06	0.0059
Clay loam (CL)	14,959.94	0.0061
Loamy (L)	12,055.41	0.0075
Silty loam (SiL)	104,509.79	0.0008
Loamy Sand (LS)	2803.55	0.0251
Sandy Clay Loam (SCL)	8517.25	0.0109
Sandy Loam (SL)	4998.74	0.0180
Silty Clay (SiC)	6686.86	0.0138
Silty Clay Loam (SiCL)	6840.72	0.0144

¹ Qm is the maximum adsorption capacity (mg/kg), and K is the Langmuir binding strength at the adsorption sites.

Table 5. Mean P adsorption percentage (±standard deviation) for each soil texture class for the original dataset including 147 soil samples.

	Equilibrium Concentrations (mg/L)
Soil Type	1	2	4	6	10
Clay (C)	98.2 ± 3.5 a	93.6 ± 11.1 a	93.3 ± 10.1 a	92.4 ± 9.5 a	91.2 ± 7.6 a
Clay loam (CL)	92.3 ± 8.2 b	91.8 ± 8.0 b	90.7 ± 8.2 b	90.6 ± 7.6 b	87.5 ± 9.7 b
Loamy (L)	81.8 ± 17.2 a	88.5 ± 9.3 c	86.6 ± 15.5 c	88.5 ± 7.9 c	83.6 ± 9.8 c
Loamy Sand (LS)	52.6 ± 9.9 b	68.3 ± 31.7 d	62.2 ± 23.2 a	63.4 ± 27.9 a	55.8 ± 27.9 a
Sandy Clay Loam (SCL)	86.7 ± 11.1 c	90.1 ± 6.9 e	89.4 ± 6.7 d	87.7 ± 8.7 d	83.8 ± 11.0 d
Sandy Loam (SL)	75.7 ± 23.3 d	83.2 ± 17.3 f	81.7 ± 17.2 e	83.4 ± 12.0 e	75.1 ± 12.9 b
Silty Clay (SiC)	98.5 ± 0.7 e	87.3 ± 14.0 g	86.4 ± 17.7 f	85.9 ± 17.9 f	81.0 ± 23.3 e
Silty Clay Loam (SiCL)	99.8 ± 0.3 f	91.8 ± 8.1 h	93.8 ± 4.2 g	91.5 ± 4.4 g	86.2 ± 6.3 f
Silty loam (SiL)	80.0 ± 24.9 g	83.1 ± 21.1 i	85.6 ± 12.5 h	90.4 ± 7.7 h	86.7 ± 7.4 g

Note: Values followed by different letters within a column are significantly different at p < 0.05 according to Tukey’s HSD test.

Table 6. Coefficients of the multiple linear regression models predicting P adsorption for each soil texture class (Sandy, Loamy, Clayey). Soil parameters are reported in mg kg⁻¹, except Sand and Clay (%), pH (unitless), EC (dS m⁻¹), and Organic Matter (%). R² values indicate the goodness of fit for each model.

Soil Class	Intercept	Sand	Clay	pH	EC	Organic Matter	P	Mg	Mn	Cu	Ce *	R²
Sandy	192.1	−2.4	−3.6	−5.9	16.3	−1.7	−4.2	15.7	1.4	3.6	78.8	0.98
Loamy	48.0	−1.2	1.9	−4.3	4.0	0.9	−1.9	−3.5	0.9	−0.3	86.9	0.94
Clayey	308.5	−1.0	1.5	−31.0	−25.1	−25.7	3.2	0.6	−3.9	−1.0	86.3	0.95

* Note: Ce is the equilibrium concentration of P: 1, 2, 4, 6, or 10 mg/L.

Table 7. Mean P adsorption percentage (±standard deviation) for each soil texture class, as predicted from the multi-output XGBoost model for the extended dataset including 10,389 soil samples.

	Equilibrium Concentrations (mg/L)
Soil Type	1	2	4	6	10
Clay (C)	91.1 ± 12.9 a	89.7 ± 8.5 a	88.8 ± 8.2 a	88.5 ± 7.6 a	85.1 ± 7.4 a
Clay loam (CL)	83.6 ± 14.6 a	87.0 ± 8.9 a	86.8 ± 7.9 a	87.7 ± 6.5 b	84.7 ± 6.3 b
Loamy (L)	76.9 ± 14.4 a	84.3 ± 8.8 a	83.1 ± 8.6 a	85.6 ± 6.9 a	82.2 ± 7.0 a
Loamy Sand (LS)	76.3 ± 10.9 b	76.9 ± 9.5 a	65.8 ± 7.4 a	69.8 ± 7.6 a	61.6 ± 7.3 a
Sandy (S)	76.6 ± 11.1 c	76.9 ± 10.2 b	65.4 ± 5.1 b	69.8 ± 7.6 b	61.1 ± 6.6 b
Sandy Clay (SC)	81.5 ± 10.1 d	85.4 ± 6.4 b	84.3 ± 6.3 b	85.8 ± 3.3 c	82.3 ± 3.3 c
Sandy Clay Loam (SCL)	77.7 ± 12.7 e	84.0 ± 8.0 c	82.6 ± 8.2 c	83.6 ± 6.5 a	79.4 ± 6.8 a
Sandy Loam (SL)	74.4 ± 12.8 a	78.7 ± 9.8 c	73.8 ± 10.4 a	76.4 ± 9.8 a	70.2 ± 10.7 a
Silty Clay (SiC)	93.6 ± 10.0 b	89.7 ± 7.0 c	89.2 ± 7.7 c	88.6 ± 7.3 d	84.7 ± 7.2 d
Silty Clay Loam (SiCL)	93.1 ± 9.5 c	90.7 ± 5.7 d	90.4 ± 5.8 a	90.1 ± 5.4 a	86.4 ± 5.5 b
Silty loam (SiL)	84.6 ± 13.9 b	87.9 ± 7.2 d	87.2 ± 6.7 d	89.0 ± 5.0 b	85.4 ± 5.3 e

Note: Values followed by different letters within a column are significantly different at p < 0.05 according to Tukey’s HSD test.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
