Article
Peer-Review Record

Machine Learning vs. Langmuir: A Multioutput XGBoost Regressor Better Captures Soil Phosphorus Adsorption Dynamics

by Miltiadis Iatrou * and Aristotelis Papadopoulos
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 1 July 2025 / Revised: 23 July 2025 / Accepted: 11 August 2025 / Published: 13 August 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript compares the effectiveness of a multi-output XGBoost machine learning model with the traditional Langmuir isotherm for predicting soil phosphorus (P) adsorption dynamics. The machine learning approach shows clear advantages in predictive accuracy and adaptability but the study's reliance on a relatively modest initial dataset (147 samples) for model training limits generalizability. Moreover, although causal inference was applied, the absence of experimental validation weakens claims of causality. A broader geographic validation will strengthen the model's applicability.

Abstract:

The abstract is overly detailed and should be more concise. The long sentences and repeated dataset size reduce readability. Grouping methods and results more distinctly will enhance clarity. Additionally, the phrase “obviously fail” does not sound scientific and should be replaced with more measured wording. Briefly stating the implications for agricultural or environmental applications will strengthen the conclusion.

1. Introduction:

  • Line 36-38: Explain the sentence “general fertilizer recommendations may not provide adequate P” in a better way with a brief example or contrast between standard practices and site-specific needs.
  • Line 39-41: Refine the sentence “Even though soil may contain several hundred to thousand kilograms of P...” for clarity by specifying the range more precisely or cite data supporting this claim.
  • Line 41-43: Rephrase “rapidly binds to the soil and becomes fixed” to make it more technically accurate, such as "phosphorus rapidly reacts with soil constituents, forming less bioavailable complexes."
  • Line 50-52: The sentence “These blooms can harm both wildlife and humans...” is slightly overloaded with toxin types. It will read more smoothly if condensed or refocused on the most critical impacts. Link this environmental concern to the study's motivation (i.e., developing better predictive models) so that the paragraph is more connected with the objective of this paper.

2.1. Laboratory analysis:

  • Line 65-82: The paragraph is comprehensive and methodologically sound, covering sample preparation and the range of measured soil properties. However, one suggestion is to clarify the selection criteria for the 147 soil samples. The paragraph states different physicochemical properties but it will help to mention whether the selection aimed to ensure representativeness across soil types or focused on specific regional conditions. Also, there is redundancy with soil properties like pH and EC being mentioned both in the feature list and again in the sentence about their measurement methods. This should be streamlined for clarity.
  • Line 83-84: Add a rationale to explain why these specific groupings (based on pH, CaCO₃, and texture) were chosen. This will connect the classifications more clearly to the goals on phosphorus adsorption variability.

2.2. P sorption capacity

  • Line 102: The mention of “desorption” in the opening line is a bit misleading, as the paragraph focuses entirely on the adsorption procedure. If desorption was part of a different section, it may be better to omit or relocate.
  • Line 104-107: Add justification to the phrase “These P concentrations were selected because they are more likely to be encountered in natural agronomic conditions…” by citing typical P concentrations found in agricultural runoff or amended soils.

2.3. Machine learning:

  • Line 115-118: Explain why a multi-output model was preferred over fitting separate single-output models for each concentration level. This will help readers less familiar with multi-target learning understand the methodological advantage.
  • Line 121-123: The mention of “735 rows” is confusing unless it’s made clear that this expanded dataset was only for feature selection and not for final model training (which is explained later).
  • Line 143-145: Briefly explain why Bayesian optimization (via Optuna) was preferred over grid or random search. Adding a one-line summary, such as “Optuna was chosen for its efficiency in exploring large hyperparameter spaces,” will make the motivation clearer.
  • Line 145-148: Adding a rationale for the final parameter values (e.g., how they reflect model complexity vs. performance trade-offs) will enhance transparency.

2.4. Causal discovery:

  • Line 152-153: Add a justification for selecting DirectLiNGAM over other causal inference methods (such as Peter-Clark or Greedy Equivalence Search algorithms).
  • Line 155-156: The sentence “DirectLiNGAM assumes that the underlying causal structure is linear…” is helpful, but this assumption can be a limitation if soil-chemical interactions are inherently nonlinear.
  • Line 160-162: The importance of non-Gaussianity is mentioned but its relevance to the dataset is not explicitly addressed. Adding a sentence explaining whether the data satisfied (or approximately satisfied) this assumption will improve transparency.

2.5. Comparison of the Langmuir isotherms and the Multi-output XGBoost regressor on a large soil dataset:

  • Line 165-166: What is the rationale behind choosing the extended dataset of 10,389 samples? It’s unclear whether this dataset was independently collected, synthetically generated, or derived from scaling the original.
  • Line 168: The term “to assessing their relative effectiveness” is a bit awkward. Consider rephrasing it as “to assess and compare their predictive accuracy.”
  • Line 169-170: Explain why Tukey’s HSD test was chosen over other post-hoc methods (e.g., Bonferroni), particularly since the study deals with multiple soil types and equilibrium concentrations. A sentence highlighting its appropriateness for comparing group means following ANOVA will offer useful context.
  • Line 172-174: The link between this statistical test and the model performance comparison can be made more explicit. Was it used to test model outputs, raw data, or both?

3.1. Feature engineering:

  • Line 182-183: Explain whether the selection of features through RFE was cross-validated or whether any feature interactions were considered during importance analysis.
  • Line 188-190: The phrase “which was something expected” regarding Olsen P’s impact sounds vague and informal. Consider rewording to something like “which is consistent with established knowledge on phosphorus availability.”
  • Line 190-191: The use of “surprising” to describe the sharp decline in P sorption can be clarified. Was it surprising compared to past studies, or relative to other features in the model?
  • Line 193-194: The connection between manganese levels and P adsorption is stated but not contextualized. Add a sentence that ties this to soil redox conditions or mineralogy. Also, for magnesium, consider noting whether the observed positive relationship is supported by past agronomic literature or if it’s a novel finding worth further investigation.

3.2. Causal inference:

  • Line 198-199: The phrase “All DAGs in Figure 3 consistently show...” is helpful, but it might be more informative to briefly describe how consistency across DAGs was determined. Did all Ce levels yield the same parent-child structure?
  • Line 200-202: Add a sentence acknowledging the limitations of causal inference from observational data (e.g., potential unmeasured confounders or assumptions of linearity in DirectLiNGAM) to have a balanced scientific tone and to confirm the alignment between SHAP and causal inference.

3.3. Langmuir equations:

  • Line 214-216: Expand the explanation of dataset limitations, specifically, the lack of representation of certain textures in the original 147 samples. Did this underrepresentation affect the reliability of the fitted Langmuir equations for those groups in the extended dataset?
  • Line 216-219: The phrase “broader equations were generated” can be more specific. Was this achieved through grouping, imputation, or model extrapolation?
  • Line 221-223: The conclusion that sandy soils exhibited the lowest P adsorption is consistent with earlier sections. Add a sentence acknowledging whether this matches expectations from prior studies to have a more grounded scientific tone.
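
For readers following the comments on Section 3.3, the Langmuir isotherm under discussion can be sketched as below. This is a minimal illustration, not the authors' fitting procedure: the parameter values (`true_q_max`, `true_k`) are hypothetical, and the linearized least-squares fit is only one common way to estimate the two Langmuir parameters. The equilibrium concentrations are the five Ce levels named in the author response.

```python
def langmuir(ce, q_max, k):
    """Adsorbed P (mg/kg) at equilibrium concentration ce (mg/L)."""
    return q_max * k * ce / (1.0 + k * ce)


def ols(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_langmuir_linearized(ce_values, q_values):
    """Estimate (q_max, k) from the form Ce/q = 1/(k*q_max) + Ce/q_max."""
    ratio = [ce / q for ce, q in zip(ce_values, q_values)]
    slope, intercept = ols(ce_values, ratio)
    q_max = 1.0 / slope        # slope is 1/q_max
    k = slope / intercept      # intercept is 1/(k*q_max)
    return q_max, k


ce_levels = [1.0, 2.0, 4.0, 6.0, 10.0]   # Ce levels from the record (mg/L)
true_q_max, true_k = 300.0, 0.25         # hypothetical parameters
q_obs = [langmuir(ce, true_q_max, true_k) for ce in ce_levels]
q_max_hat, k_hat = fit_langmuir_linearized(ce_levels, q_obs)
```

Because the curve has only two parameters per soil group, every soil in a group shares the same predicted adsorption at a given Ce, which is the rigidity the later comments on high-sand and high-Olsen-P soils refer to.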

3.4. Multiple linear regression equations:

  • Line 233-235: Add explanation to clarify whether these regressions were derived from the same dataset as the Langmuir models or from a different subset. This distinction matters for interpreting their applicability.
  • Line 236-238: The interaction between soil classification and predictors is reported as statistically significant (p < 0.001), but the model fit (e.g., R² values) for each regression is not reported. A table would be better than equations that repeat the same features: the row headers can be soil types (Sandy, Loamy, Clayey) and the column headers can be features (Intercept, Sand, Clay, pH, etc.), with an R² column appended to the same table.
  • Equation of Sandy soils: The large positive effect of EC in sandy soils appears counterintuitive. Add a short explanation or a supporting reference that helps validate its plausibility. Higher EC might indicate greater cation presence, potentially enhancing or interfering with phosphorus retention, though this effect should ideally be interpreted alongside other variables.

3.5. Multi-output XGBoost model performance:

  • The statement that the model captured "complex, nonlinear relationships" is reasonable, but it needs support from a brief reference back to the SHAP results or feature interactions that demonstrate such nonlinearity.
  • The figure referenced (Figure 5) is informative, yet it may help to mention here that model performance was slightly weaker at higher equilibrium concentrations, as discussed later in the paper. Including that point will give a more balanced view of the model’s strengths and limitations.
  • The reported metrics, MAE of 26.5 mg/kg and R² of 0.5, are helpful, but the interpretation would be stronger if it briefly stated how these values compare to benchmarks or traditional models like Langmuir. Is an R² of 0.5 considered strong given the complexity of soil data, or does it indicate room for improvement?
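
As context for the two metrics named in the last comment, the following sketch shows how MAE and R² are computed. The observed and predicted values are hypothetical and are not taken from the manuscript.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute prediction error, in the units of y (here mg/kg)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of variance explained relative to always predicting the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical adsorbed-P values (mg/kg), for illustration only.
observed = [100.0, 150.0, 200.0, 250.0, 300.0]
predicted = [120.0, 130.0, 210.0, 240.0, 320.0]

mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
r2_mean_baseline = r_squared(observed, [200.0] * 5)
```

Under this definition, a model that always predicts the mean scores R² = 0, so an R² of 0.5 means the model removes half of the squared error of that baseline. Whether that is adequate depends on the application, which is the question the comment raises.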

3.6. Performance of the multi-output XGBoost model and Langmuir isotherms on an extended soil dataset:

  • Line 289-290: Clarify whether the extended dataset consists of independent observations or was extrapolated from the original data. This is important for understanding the generalizability of the comparison. If these data were modeled or resampled rather than directly measured, this information must be included.
  • Line 296-307: The binning process using quantile-based ranges is described clearly but a short justification for choosing four bins for both Olsen P and sand content can contextualize the choice. Were these bins based on agronomic thresholds or purely statistical distribution?
  • Line 312-314: Explain why the Langmuir model underestimates these effects, likely due to its linear or simplified functional form. A one-sentence insight here enhances the interpretation without overexplaining.
  • Line 317-322: The reiteration of model’s superior performance feels redundant. Consider merging the interpretive insights of this and the previous paragraph into a single cohesive reflection on the XGBoost model’s advantage in handling nonlinear effects.
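
The quantile-based binning discussed in the comments above can be sketched as follows. This is an illustration under stated assumptions: the Olsen P values are hypothetical, the bin edges come from Python's `statistics.quantiles` with the "inclusive" method, and a value equal to an edge is assigned to the lower bin. The manuscript's exact binning rule is not given in this record.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_bins(values):
    """Return the three quartile cut points and a bin index (0-3) per value."""
    cuts = quantiles(values, n=4, method="inclusive")
    # bisect_left sends a value equal to a cut point to the lower bin.
    labels = [bisect_left(cuts, v) for v in values]
    return cuts, labels


# Hypothetical Olsen P values (mg/kg), for illustration only.
olsen_p = [4, 8, 12, 16, 20, 24, 28, 32]
cuts, labels = quartile_bins(olsen_p)
bin_sizes = [labels.count(b) for b in range(4)]
```

Quantile bins hold roughly equal numbers of samples by construction, so their edges follow the distribution of the data rather than agronomic thresholds. That distinction is what the comment on Line 296-307 asks the authors to state.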

4. Discussion:

  • Line 340-341: The repeated emphasis on the XGBoost model’s responsiveness will be stronger if balanced with a brief inclusion of its potential limitations (e.g., dependence on a well-structured dataset or interpretability challenges in operational settings).
  • Line 341-343: The phrase “not surprising” should be reworded to sound more objective like “This is consistent with XGBoost’s well-established performance in high-dimensional regression tasks.”
  • Line 357-363: The claim that manganese and magnesium effects are likely indirect can be supported by a citation or brief mechanistic explanation (e.g., soil redox behavior, cation exchange processes). The interpretation is logical, but providing a reference or observed trend can add depth and support. Also, consider whether the indirect effects observed in SHAP but not in causal inference hint at latent variable interactions; this might be worth mentioning cautiously.
  • Line 368-370: The phrase “strongly confirm” should be toned down to “further support,” as causal inference on observational data should be interpreted with some caution.
  • Line 376-378: Clarify what “adequately” means when stating that the Langmuir equation "adequately described" soil adsorption, perhaps by referencing the model fit or residuals. Also, briefly mention that the Langmuir model failed to capture the magnitude of variation seen in high-sand or high-Olsen-P groups (as shown earlier).
  • Line 381-383: It would be better to mention that the model performed less consistently at higher equilibrium concentrations (e.g., Ce = 10 mg/L), as noted earlier. Also, state whether the level of error (MAE = 26.5 mg/kg) is acceptable for agronomic decision-making in a real-world context.
  • Line 391-392: The phrase “It is not surprising...” should be replaced with a better and formal phrasing (e.g., “This result is expected given the flexibility of tree-based models”).

5. Conclusions:

  • Line 415-417: The phrase “significantly outperforms” should be softened slightly to reflect the statistical results more cautiously (e.g., “demonstrated improved performance in capturing phosphorus adsorption variability”). This avoids overstatement, especially given that R² = 0.50 suggests moderate predictive power.
  • Line 417-419: Add how the sensitivity (the model's responsiveness to Olsen P and sand content) supports site-specific fertilizer recommendations or reduced environmental phosphorus losses. A single sentence pointing to potential practical applications will ground the conclusion in agronomic relevance.
  • Line 419-422: Briefly mention whether further validation (e.g., across other soil regions or under field conditions) is needed before operational deployment. This will show critical awareness of the model’s current scope and future potential.
Comments on the Quality of English Language

Occasional informal phrases such as “not surprising” or “obviously fail” should be revised to maintain a scientific tone. 

Author Response

Comment 1: This manuscript compares the effectiveness of a multi-output XGBoost machine learning model with the traditional Langmuir isotherm for predicting soil phosphorus (P) adsorption dynamics. The machine learning approach shows clear advantages in predictive accuracy and adaptability but the study's reliance on a relatively modest initial dataset (147 samples) for model training limits generalizability. Moreover, although causal inference was applied, the absence of experimental validation weakens claims of causality. A broader geographic validation will strengthen the model's applicability.

Response 1: We thank the reviewer for the positive assessment regarding the predictive advantages of the machine learning approach over the traditional Langmuir isotherm. We also appreciate the comments related to dataset size and causal inference. While the initial dataset comprised 147 soils, each sample was subjected to five different equilibrium phosphorus concentrations (Ce = 1, 2, 4, 6, and 10 mg/L), resulting in multiple observations per soil and enabling evaluation of P adsorption behavior across a range of conditions.

Crucially, the variables identified as causal (initial Olsen P and sand content) were held constant across these treatments, while the response variable (P adsorption) varied. This repeated-measures design provides an internal framework for validating the causal relationships derived through DirectLiNGAM, as each Ce level effectively serves as a validation point for the others.

Furthermore, the convergence of results from SHAP feature importance, causal inference, and experimental trends across Ce levels (Figures 2 and 3) adds robustness to our findings. Nonetheless, we fully agree that broader geographic validation would enhance the model's generalizability and have noted this as an important direction for future work.

Comment 2: The abstract is overly detailed and should be more concise. The long sentences and repeated dataset size reduce readability. Grouping methods and results more distinctly will enhance clarity. Additionally, the phrase “obviously fail” does not sound scientific and should be replaced with more measured wording. Briefly stating the implications for agricultural or environmental applications will strengthen the conclusion.

Response 2: Agree. We replaced the phrase “obviously fail” with the phrase “do not adequately capture”.

Comment 3: Line 36-38: Explain the sentence “general fertilizer recommendations may not provide adequate P” in a better way with a brief example or contrast between standard practices and site-specific needs.

Response 3: Agree. We added this text: General fertilizer recommendations are typically based on standard P application rates for each crop, or in more advanced cases, adjusted using Olsen P levels, but they often do not account for the soil’s P sorption capacity. As a result, such generalized guidelines may underperform in soils with high fixation potential, where a significant portion of the applied P becomes unavailable to plants.

Comment 4: Line 39-41: Refine the sentence “Even though soil may contain several hundred to thousand kilograms of P...” for clarity by specifying the range more precisely or cite data supporting this claim.

Response 4: Agree. Thank you very much for your comment. We added the following text:
Even though soil may contain thousands of kilograms of P per hectare, much of this may not be available to crops [5]. Oehl et al. [6] reported that total P in the topsoil (0–20 cm) can range from approximately 1,400 to over 2,000 kg P ha⁻¹. However, only a small fraction of this total P is plant-available, as much of it is strongly bound within the soil matrix due to sorption and fixation processes.

Comment 5: Line 41-43: Rephrase “rapidly binds to the soil and becomes fixed” to make it more technically accurate, such as "phosphorus rapidly reacts with soil constituents, forming less bioavailable complexes."

Response 5: Agree. Thank you very much for this suggestion. We replaced the phrase “rapidly binds to the soil and becomes fixed” with “phosphorus rapidly reacts with soil constituents, forming less bioavailable complexes”.

Comment 6: Line 50-52: The sentence “These blooms can harm both wildlife and humans...” is slightly overloaded with toxin types. It will read more smoothly if condensed or refocused on the most critical impacts. Link this environmental concern to the study's motivation (i.e., developing better predictive models) so that the paragraph is more connected with the objective of this paper.

Response 6: Agree. Thank you very much for this comment. We replaced the sentence “These blooms can harm both wildlife and humans, as certain algal species release dermatoxins, neurotoxins, cytotoxins, and hepatotoxins” with: “These blooms can disrupt aquatic ecosystems and pose risks to human and animal health, as some algal species produce harmful toxins. Additionally, eutrophication can produce unpleasant odors due to the decomposition of algae and degrade water quality, impairing its use not only for irrigation and drinking but also for recreational purposes. Such environmental risks underscore the need for reducing P losses from agricultural fields using predictive models that can accurately estimate soil P dynamics, enabling more targeted fertilizer use and reducing excess P applications.”

Comment 7: Line 65-82: The paragraph is comprehensive and methodologically sound, covering sample preparation and the range of measured soil properties. However, one suggestion is to clarify the selection criteria for the 147 soil samples. The paragraph states different physicochemical properties but it will help to mention whether the selection aimed to ensure representativeness across soil types or focused on specific regional conditions. Also, there is redundancy with soil properties like pH and EC being mentioned both in the feature list and again in the sentence about their measurement methods. This should be streamlined for clarity.

Response 7: Agree. Thank you very much for your comment. We have clarified the selection criteria for the 147 soil samples by stating that they were chosen to represent a wide range of soil textures, pH levels, and CaCO₃ contents commonly found in Greek agricultural soils. This was done to ensure that the dataset captured diverse conditions relevant to phosphorus adsorption. We added the phrase: “to ensure a wide representation of physicochemical variability, particularly across different soil textures, pH levels, and calcium carbonate (CaCO₃) contents typical of Greek agricultural soils”.

Comment 8: Line 83-84: Add a rationale to explain why these specific groupings (based on pH, CaCO₃, and texture) were chosen. This will connect the classifications more clearly to the goals on phosphorus adsorption variability.

Response 8: Agree. Thank you for your suggestion. The sentence has been revised as follows: “The soils were classified into four groups based on combinations of pH, CaCO₃ content, and texture (Tables 1–2), as these factors are among the most influential in determining P sorption behavior, as reported in previous studies [3,11,12].”

Comment 9: Line 102: The mention of “desorption” in the opening line is a bit misleading, as the paragraph focuses entirely on the adsorption procedure. If desorption was part of a different section, it may be better to omit or relocate.

Response 9: Agree. Thank you for pointing this out. The paragraph focuses exclusively on the phosphorus adsorption procedure. The mention of “desorption” in the opening line was unintentional and has been removed to avoid confusion. The revised sentence is as follows: “P adsorption experiments were performed for these soil samples…”

Comment 10: Line 104-107: Add justification to the phrase “These P concentrations were selected because they are more likely to be encountered in natural agronomic conditions…” by citing typical P concentrations found in agricultural runoff or amended soils.

Response 10: Agree. Thank you for pointing this out. The paragraph has been amended as follows: “These P concentrations were selected because they are more likely to be encountered in natural agronomic conditions, corresponding approximately to field applications of 50, 100, 200, 300, and 500 kg P ha⁻¹. Although higher P concentrations have often been used in previous adsorption studies [26], the selected range better reflects realistic P application rates used in commercial agriculture.”

Comment 11: Line 115-118: Explain why a multi-output model was preferred over fitting separate single-output models for each concentration level. This will help readers less familiar with multi-target learning understand the methodological advantage.

Response 11: Agree. Thank you for your comment. We added the following text: “A multi-output regression model was preferred over fitting separate single-output models for each equilibrium concentration, because it allows the model to simultaneously learn from the shared structure and correlations across all output targets. In the context of P adsorption, the responses at different equilibrium concentrations are not independent, but they are part of P sorption dynamics governed by the same underlying soil properties. By training a unified multi-target model, the learning algorithm can exploit these dependencies to improve overall predictive performance and generalization. In contrast, training separate models for each concentration level would ignore these interrelationships, potentially leading to inconsistent predictions across concentrations. Multi-output learning thus offers statistical advantages, especially when the outputs are naturally linked, as is the case with sorption dynamics across varying P concentrations.”

Comment 12: Line 121-123: The mention of “735 rows” is confusing unless it’s made clear that this expanded dataset was only for feature selection and not for final model training (which is explained later).

Response 12: Agree. Thank you for pointing this out. We have added the following phrase in the first paragraph: “Notably, this augmented dataset was used solely for feature selection and interpretation purposes and not for training the predictive model.”
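
The relationship between the 147 soils and the 735 rows can be sketched as a wide-to-long reshape, since 147 soils × 5 Ce levels = 735. The layout below is an assumption for illustration: the field names (`features`, `p_adsorbed`, `ce`) and the placeholder values are hypothetical, and the manuscript's actual table structure is not shown in this record.

```python
CE_LEVELS = [1, 2, 4, 6, 10]  # equilibrium P concentrations (mg/L)


def wide_to_long(soils):
    """Expand one record per soil into one row per (soil, Ce) pair."""
    rows = []
    for soil in soils:
        for ce, adsorbed in zip(CE_LEVELS, soil["p_adsorbed"]):
            row = dict(soil["features"])  # soil properties repeat per Ce
            row["ce"] = ce
            row["p_adsorbed"] = adsorbed
            rows.append(row)
    return rows


# Placeholder soils: 147 records, each with five adsorption targets.
soils = [
    {"features": {"olsen_p": 10.0 + i, "sand": 40.0},
     "p_adsorbed": [0.0, 0.0, 0.0, 0.0, 0.0]}
    for i in range(147)
]

long_rows = wide_to_long(soils)
n_wide, n_long = len(soils), len(long_rows)
```

In the wide layout each soil is one row with five targets, which is the shape a multi-output regressor consumes. In the long layout the soil properties repeat across five rows, so those rows are not independent observations.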

Comment 13: Line 143-145: Briefly explain why Bayesian optimization (via Optuna) was preferred over grid or random search. Adding a one-line summary, such as “Optuna was chosen for its efficiency in exploring large hyperparameter spaces,” will make the motivation clearer.

Response 13: Agree. Thank you for this suggestion. We have added: “Optuna was chosen for its efficiency in exploring large hyperparameter spaces compared to traditional methods like grid or random search.”

Comment 14: Line 152-153: Add a justification for selecting DirectLiNGAM over other causal inference methods (such as Peter-Clark or Greedy Equivalence Search algorithms).

Response 14: Agree. Thank you for this comment. We have added the following text: “These values reflect a balanced trade-off between model complexity and generalization performance, favoring moderately deep trees and regularization to prevent overfitting while retaining the model’s ability to capture nonlinear interactions in the data.”

Comment 15: Line 155-156: The sentence “DirectLiNGAM assumes that the underlying causal structure is linear…” is helpful, but this assumption can be a limitation if soil-chemical interactions are inherently nonlinear.

Response 15: Agree. Thank you for letting us clarify this. We have added the following text: While the assumption that the underlying causal structure is linear may limit the model’s ability to capture nonlinear soil and chemical interactions, DirectLiNGAM remains a robust choice for identifying the dominant causal directions in moderate-sized datasets. Additionally, DirectLiNGAM has the advantage of assuming non-Gaussianity in the data, which is critical for enabling the algorithm to detect causal relationships beyond second-order statistical dependencies such as covariance. This allows the model to uncover more complex causal patterns compared to methods based solely on linear Gaussian assumptions. Nonlinear causal discovery algorithms could, in theory, capture richer interactions, but they typically require substantially larger datasets to produce stable and reliable results, which was not feasible in the present study.

Comment 16: Line 160-162: The importance of non-Gaussianity is mentioned but its relevance to the dataset is not explicitly addressed. Adding a sentence explaining whether the data satisfied (or approximately satisfied) this assumption will improve transparency.

Response 16: Thank you for your comment. We agree with your suggestion and have revised the entire paragraph (as addressed in Comment 15) to incorporate both the rationale for using DirectLiNGAM and the relevance of the non-Gaussianity assumption. We now explicitly explain that the dataset approximately satisfies the non-Gaussianity requirement, which supports the suitability of DirectLiNGAM for this analysis.
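
One rough way to screen a variable for departure from Gaussianity is to look at its sample skewness and excess kurtosis, both of which are zero for a normal distribution. The sketch below is an assumption-laden illustration: the response does not say which check the authors applied, the data are hypothetical, and LiNGAM's assumption concerns the noise terms, so a check on observed variables is only a proxy.

```python
from statistics import fmean


def central_moment(values, order):
    """Population central moment of the given order."""
    mu = fmean(values)
    return sum((v - mu) ** order for v in values) / len(values)


def skewness(values):
    """Sample skewness; 0 for any symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Kurtosis minus 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical data, for illustration only.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]

skew_sym = skewness(symmetric)
kurt_sym = excess_kurtosis(symmetric)
skew_right = skewness(right_skewed)
```

With real data these moments are noisy in small samples, so a formal normality test or a Q-Q plot would normally accompany them.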

Comment 17: Line 165-166: What is the rationale behind choosing the extended dataset of 10,389 samples? It’s unclear whether this dataset was independently collected, synthetically generated, or derived from scaling the original.

Response 17: Agree. We have rephrased the paragraph to clarify that the data were independently collected. The revised text now reads: “To evaluate the generalizability of both the Langmuir isotherms and the multi-output XGBoost model, we applied them to an extended dataset comprising 10,389 soil samples independently collected over recent years by the Soil and Water Resources Institute of Thessaloniki. This dataset includes a broad range of soil physicochemical properties and was used to estimate P adsorption based on the models trained on the original experimental dataset.”

Comment 18: Line 168: The term “to assessing their relative effectiveness” is a bit awkward. Consider rephrasing it as “to assess and compare their predictive accuracy.”

Response 18: Agree. We have added the following text: “To assess and compare the generalizability and the predictive accuracy”

Comment 19: Line 169-170: Explain why Tukey’s HSD test was chosen over other post-hoc methods (e.g., Bonferroni), particularly since the study deals with multiple soil types and equilibrium concentrations. A sentence highlighting its appropriateness for comparing group means following ANOVA will offer useful context.

Response 19: Tukey’s HSD was chosen over more conservative methods like Bonferroni because it is specifically designed for all pairwise comparisons following ANOVA, making it more appropriate for detecting differences among multiple soil types across equilibrium concentrations. We include the explanation in the manuscript as follows: “Tukey’s HSD test was selected as the post-hoc method because it is well suited for identifying statistically significant differences between all possible pairs of group means following ANOVA, particularly when comparing multiple soil types across several equilibrium concentrations.”

Comment 20: Line 172-174: The link between this statistical test and the model performance comparison can be made more explicit. Was it used to test model outputs, raw data, or both?

Response 20: The Tukey test was used to test predictions. We have added the following text to point this out: “It was used here to assess whether the multi-output XGBoost model and Langmuir isotherms produced statistically distinguishable group-wise predictions between soil textures at each equilibrium concentration (Ce), with a significance threshold of p < 0.05.”
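
The pairwise comparison behind Tukey's HSD can be sketched as below. This is a partial illustration only: it computes the studentized range statistic q for each pair of groups (in the Tukey-Kramer form, which also handles unequal group sizes) but not the p-values, which require the studentized range distribution available in statistical packages. The predicted adsorption values per texture class are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def pairwise_q(groups):
    """Return ({(a, b): q statistic}, within-group degrees of freedom)."""
    means = {name: fmean(vals) for name, vals in groups.items()}
    n_total = sum(len(vals) for vals in groups.values())
    df_within = n_total - len(groups)
    ss_within = sum(
        (x - means[name]) ** 2 for name, vals in groups.items() for x in vals
    )
    mse = ss_within / df_within  # pooled within-group variance from ANOVA
    q_stats = {}
    for a, b in combinations(groups, 2):
        se = sqrt(mse / 2.0 * (1.0 / len(groups[a]) + 1.0 / len(groups[b])))
        q_stats[(a, b)] = abs(means[a] - means[b]) / se
    return q_stats, df_within


# Hypothetical predicted P adsorption (mg/kg) at one Ce level.
predictions = {
    "sandy": [100.0, 110.0, 120.0],
    "loamy": [150.0, 160.0, 170.0],
    "clayey": [200.0, 210.0, 220.0],
}

q_stats, df_within = pairwise_q(predictions)
```

Each q value is then compared against the critical value of the studentized range for the number of groups and `df_within` at the chosen significance level (p < 0.05 in the response above).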

Comment 21: Line 182-183: Explain whether the selection of features through RFE was cross-validated or whether any feature interactions were considered during importance analysis.

Response 21: Thank you for the comment. The Recursive Feature Elimination (RFE) procedure was applied using a fixed training-validation split rather than cross-validation. We acknowledge that incorporating cross-validation (e.g., via RFECV) could provide a more robust feature selection process. However, given the moderate dataset size and the primary goal of identifying dominant predictors rather than optimizing for marginal gains in performance, we opted for a fixed approach. We have clarified this point in the revised manuscript with the following text: “This procedure (RFE) was conducted on the training set using a fixed train-test split rather than cross-validation, given the moderate dataset size.”

Comment 22: Line 188-190: The phrase “which was something expected” regarding Olsen P’s impact sounds vague and informal. Consider rewording to something like “which is consistent with established knowledge on phosphorus availability.”

Response 22: Thank you for your comment. We agree with your suggestion and have revised the sentence accordingly. It now reads: “which is consistent with established knowledge on phosphorus availability.”

Comment 23: Line 190-191: The use of “surprising” to describe the sharp decline in P sorption can be clarified. Was it surprising compared to past studies, or relative to other features in the model?

Response 23: Agree. Thank you for the comment. We have clarified that the observed decline in P adsorption with increasing Olsen P was not "surprising" in a general agronomic sense, but rather notable in its magnitude and consistency. This is particularly significant because, to our knowledge, no previous studies have examined this relationship using nonlinear models such as XGBoost while also accounting for interactions with other soil variables. We have revised the manuscript accordingly to reflect this point and we have added the following text: “The SHAP dependence plot revealed a clear decline in P adsorption with increasing initial Olsen P concentrations, which aligns with agronomic expectations regarding phosphorus availability. However, the strength and consistency of this relationship across equilibrium concentrations was notable, especially since, to our knowledge, no previous studies have explicitly quantified this decline using nonlinear machine learning models while simultaneously accounting for other soil variables (Figure 2a).”

Comment 24: Line 193-194: The connection between manganese levels and P adsorption is stated but not contextualized. Add a sentence that ties this to soil redox conditions or mineralogy. Also, for magnesium, consider noting whether the observed positive relationship is supported by past agronomic literature or if it’s a novel finding worth further investigation.

Response 24: Agree. The paragraph has been rephrased as follows: “The positive correlation between manganese availability and P sorption capacity can likely be attributed to both indirect and direct mechanisms: manganese availability increases under acidic or reducing conditions, which are also associated with higher P sorption capacity, and Mn oxides themselves are known to directly contribute to P retention in soil through surface adsorption processes [11]. A similar positive trend was observed for magnesium, which may reflect its higher availability in alkaline soils with high calcium carbonate content, which tend to exhibit elevated P sorption (Figure 2d).”

Comment 25: Line 198-199: The phrase “All DAGs in Figure 3 consistently show...” is helpful, but it might be more informative to briefly describe how consistency across DAGs was determined. Did all Ce levels yield the same parent-child structure?

Response 25: Thank you for your comment. We agree with your suggestion and have added the following text to clarify that all Ce levels yielded consistent parent-child structures: “All DAGs in Figure 3 consistently show sand content as a parent node directly influencing P adsorption. This consistency was confirmed by examining the causal graphs generated separately for each equilibrium concentration (Ce = 1, 2, 4, 6, and 10 mg/L), where both the direction and presence of edges from sand content to P adsorption remained stable across all models. A causal link between Olsen P and P adsorption was also observed for Ce levels of 1, 2, 4, and 6 mg/L. The direction of this relationship was consistent across these concentrations, further supporting the robustness of the inferred causal structure.”
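
The consistency check described in this response can be illustrated with a minimal sketch. It assumes each causal graph is stored as a weighted adjacency matrix in which the row is the effect and the column is the cause; that convention, the feature names, and all edge weights below are hypothetical and are not taken from the manuscript.

```python
# Hypothetical sketch: check that an edge is present with a stable sign
# in the causal graphs estimated separately for each Ce level.
FEATURES = ["sand", "olsen_p", "p_adsorption"]  # assumed variable order


def make_graph(sand_w, olsen_w):
    """Build a 3x3 adjacency matrix (row = effect, column = cause)."""
    return [
        [0.0, 0.0, 0.0],        # nothing points into sand
        [0.0, 0.0, 0.0],        # nothing points into Olsen P
        [sand_w, olsen_w, 0.0],  # edges pointing into P adsorption
    ]


def edge_weight(adjacency, cause, effect, names=FEATURES):
    """Return the weight of the edge cause -> effect."""
    return adjacency[names.index(effect)][names.index(cause)]


def edge_is_consistent(graphs, cause, effect):
    """True if the edge exists in every graph and never changes sign."""
    weights = [edge_weight(adj, cause, effect) for adj in graphs.values()]
    present = all(w != 0 for w in weights)
    same_sign = all(w > 0 for w in weights) or all(w < 0 for w in weights)
    return present and same_sign


# Made-up weights keyed by Ce (mg/L); the Olsen P edge is absent at Ce = 10,
# mirroring the pattern described in the response.
graphs = {
    1: make_graph(-0.8, -0.5),
    2: make_graph(-0.7, -0.4),
    4: make_graph(-0.7, -0.3),
    6: make_graph(-0.6, -0.2),
    10: make_graph(-0.6, 0.0),
}

sand_stable = edge_is_consistent(graphs, "sand", "p_adsorption")
olsen_stable_all = edge_is_consistent(graphs, "olsen_p", "p_adsorption")
low_ce = {ce: adj for ce, adj in graphs.items() if ce != 10}
olsen_stable_low = edge_is_consistent(low_ce, "olsen_p", "p_adsorption")
print(sand_stable, olsen_stable_all, olsen_stable_low)
```

With real output from a causal discovery library, the same check would be applied to the estimated matrices after confirming that library's row and column convention.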

Comment 26: Line 200-202: Add a sentence acknowledging the limitations of causal inference from observational data (e.g., potential unmeasured confounders or assumptions of linearity in DirectLiNGAM) to have a balanced scientific tone and to confirm the alignment between SHAP and causal inference.

Response 26: Agree and thank you for this valuable comment. We have added a sentence to acknowledge the inherent limitations of causal inference using observational data, particularly the assumptions of linearity and the absence of unmeasured confounders required by the DirectLiNGAM method. We also highlighted that the convergence of SHAP-based feature importance and causal inference supports the robustness of the identified relationships. This is the revised sentence for the manuscript: “While the causal inference results align well with the SHAP-based feature importance analysis, it is important to note that Olsen P and sand content were identified as the primary causal drivers of P adsorption and ranked among the most influential features in the SHAP-based interpretation. Nonetheless, we acknowledge the limitations of causal discovery from observational data, particularly the assumptions of linearity and the absence of unmeasured confounders required by the DirectLiNGAM algorithm.”

Comment 27: Line 214-216: Expand the explanation of dataset limitations, specifically, the lack of representation of certain textures in the original 147 samples. Did this underrepresentation affect the reliability of the fitted Langmuir equations for those groups in the extended dataset?

Response 27: Agree. Thank you for your comment. We have expanded the manuscript to clarify that as follows: “While the fitted Langmuir equations provide reasonable estimates across major soil classes, their applicability to underrepresented texture groups (e.g., sandy, sandy clay, and silty texture groups) may be somewhat limited due to the smaller number of experimental observations. As such, results for these groups should be interpreted with appropriate consideration of this constraint.”

Comment 28: Line 216-219: The phrase “broader equations were generated” can be more specific. Was this achieved through grouping, imputation, or model extrapolation?

Response 28: Agree and thank you for your comment. The following text was added: “However, as the extended dataset included some samples from these underrepresented groups, broader Langmuir equations were generated by aggregating samples into three main soil classification groups (sand, clay, and loam), allowing us to estimate P adsorption for all samples through grouped fitting rather than extrapolation from individual texture classes.”
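
As a rough illustration of fitting one Langmuir equation to a pooled group of samples, the sketch below uses the linearized form Ce/q = 1/(K·qmax) + Ce/qmax with ordinary least squares. The fitting method is an assumption (the response does not state how the grouped equations were fitted; nonlinear least squares is another common choice), and the parameter values and adsorption data are synthetic.

```python
# Hypothetical sketch: fit q = qmax*K*Ce / (1 + K*Ce) through the
# linearized form Ce/q = 1/(K*qmax) + Ce/qmax.


def langmuir(ce, qmax, k):
    """Adsorbed P at equilibrium concentration ce (mg/L)."""
    return qmax * k * ce / (1.0 + k * ce)


def fit_langmuir_linearized(ce_values, q_values):
    """Return (qmax, k) from a straight-line fit of Ce/q against Ce."""
    xs = list(ce_values)
    ys = [ce / q for ce, q in zip(ce_values, q_values)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals 1 / qmax
    intercept = mean_y - slope * mean_x   # equals 1 / (k * qmax)
    return 1.0 / slope, slope / intercept


# Equilibrium concentrations named in the record (mg/L).
ce_levels = [1, 2, 4, 6, 10]

# Synthetic, noise-free data from made-up parameters (not study data).
true_qmax, true_k = 400.0, 0.5
q_synthetic = [langmuir(ce, true_qmax, true_k) for ce in ce_levels]

qmax_hat, k_hat = fit_langmuir_linearized(ce_levels, q_synthetic)
print(round(qmax_hat, 3), round(k_hat, 3))
```

For grouped fitting, the observations from all samples in a soil class (sand, clay, or loam) would be pooled before calling the fitting function, giving one parameter pair per class.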

Comment 29: Line 221-223: The conclusion that sandy soils exhibited the lowest P adsorption is consistent with earlier sections. Add a sentence acknowledging whether this matches expectations from prior studies to have a more grounded scientific tone.

Response 29: Agree. Thank you very much for your comment. The following text was added: “Previous studies also have shown that sandy soils typically exhibit lower P adsorption due to their low clay content, iron and aluminum oxides, and limited surface area for sorption reactions [11,40].”

Comment 30: Line 233-235: Add explanation to clarify whether these regressions were derived from the same dataset as the Langmuir models or from a different subset. This distinction matters for interpreting their applicability.

Response 30: Agree. Thank you for your comment. We have clarified in the manuscript that the multiple linear regression equations were developed from the same dataset of 147 soil samples used for the Langmuir isotherms. The following text was added in the manuscript: “These multiple linear regression equations were derived using the same original dataset of 147 soil samples that was used to fit the Langmuir models. The intention was to provide simplified, empirical alternatives for estimating P adsorption across different soil classifications, based on readily available soil test data.”

Comment 31: Line 236-238: It’s mentioned that the interaction between soil classification and predictors was statistically significant (p < 0.001), but the model fit (e.g., R² values) for each regression is not reported. It would be better to use a table instead of equations that repeat the same features. The row headers can be soil types (Sandy, Loamy, Clayey) and the column headers can be features (Intercept, Sand, Clay, pH, etc.). An R² column can be appended to the same table.

Response 31: Agree. Thank you for your comment. We added the following table:

Table 6. Mean P adsorption percentage (± standard deviation) for each soil texture class for the original dataset including 147 soil samples.

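| Soil Class | Intercept | Sand | Clay | pH | EC | Organic matter | P | Mg | Mn | Cu | Ce* | R² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sandy | 192.1 | -2.4 | -3.6 | -5.9 | 16.3 | -1.7 | -4.2 | 15.7 | 1.4 | 3.6 | 78.8 | 0.98 |
| Loamy | 48.0 | -1.2 | 1.9 | -4.3 | 4.0 | 0.9 | -1.9 | -3.5 | 0.9 | -0.3 | 86.9 | 0.94 |
| Clayey | 308.5 | -1.0 | 1.5 | -31.0 | -25.1 | -25.7 | 3.2 | 0.6 | -3.9 | -1.0 | 86.3 | 0.95 |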

* Note: Ce is the equilibrium concentration of P: 1, 2, 4, 6, and 10 mg/L.

Comment 31: Equation of Sandy soils: The large positive effect of EC in sandy soils appears counterintuitive. Add a short explanation or a supporting reference that help validate their plausibility. Higher EC might indicate greater cation presence, potentially enhancing or interfering with phosphorus retention, though this effect should ideally be interpreted alongside other variables.

Response 31: Agree. The following text was added in the manuscript: “Although the regression model for sandy soils shows a relatively large positive coefficient for EC, this effect should be interpreted in the context of the overall model structure. Due to the dominant effect of sand content, which is typically high in these soils, and its negative coefficient (−2.4), overall P adsorption remains substantially lower in sandy soils compared to clayey and loamy soils. However, when P retention does occur in sandy soils, it appears to be more strongly influenced by EC, suggesting that soluble salt levels may play a role in modulating P adsorption under otherwise low-retention conditions.”

Comment 32: The statement that the model captured "complex, nonlinear relationships" is reasonable, but it needs to be supported by a brief reference back to SHAP results or feature interactions that demonstrate such nonlinearity.

Response 32: Agree. Thank you for your comment. The following text was added in the manuscript: “This conclusion is supported by the SHAP analysis, which revealed nonlinear trends in key features such as Olsen P, sand content, and manganese (Figure 2), where the effect on P adsorption varied in intensity and direction depending on feature values.”

Comment 33: The figure referenced (Figure 5) is informative, yet it may help to mention here that model performance was slightly weaker at higher equilibrium concentrations, as discussed later in the paper. Including that point will give a more balanced view of the model’s strengths and limitations.

Response 33: Agree. Thank you for your comment. The following text was added in the manuscript: “While the model showed strong overall agreement between predicted and observed values, its performance was slightly weaker at higher equilibrium concentrations (e.g., Ce = 10 mg/L), as reflected by increased scatter in predictions.”

Comment 34: The reported metrics, MAE of 26.5 mg/kg and R² of 0.5, are helpful, but the interpretation would be stronger if it briefly stated how these values compare to benchmarks or traditional models like Langmuir. Is an R² of 0.5 considered strong given the complexity of soil data, or does it indicate room for improvement?

Response 34: Agree. Thank you for your comment. We added the following text: “An R² of 0.50, while moderate, is considered reasonable given the inherent variability and multivariate nature of soil properties. To the best of our knowledge, there are no previous studies that have validated Langmuir equations on unseen data using an independent test set. Traditional applications of the Langmuir model typically involve fitting to the entire dataset without assessing predictive performance, which limits direct comparison with modern data-driven approaches.”
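
For readers less familiar with the two metrics discussed here, the following minimal sketch shows how MAE and R² are computed on held-out predictions. The observed and predicted values are made up for illustration, and pooling all values into one list is an assumption; the record does not state how the multi-output scores were aggregated across Ce levels.

```python
# Hypothetical sketch: MAE and R² on a held-out test set.


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the units of the target (e.g., mg/kg)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up adsorption values (mg/kg), not study data.
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 190.0, 270.0]

mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(mae, round(r2, 3))
```

An R² near 0.5 means the residual sum of squares is about half the total sum of squares around the mean of the observed values, which is why it is described as moderate.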

Comment 35: Line 289-290: Clarify whether the extended dataset consists of independent observations or was extrapolated from the original data. This is important to understand the generalizability of the comparison. If these data were modeled or resampled rather than directly measured, this information must be included.

Response 35: Agree. Thank you very much for your comment. We added the following text in the manuscript: “As noted earlier, the extended dataset comprises 10,389 soil samples independently collected over recent years by the Soil and Water Resources Institute of Thessaloniki. These samples were not modeled or resampled from the original dataset, allowing for a more robust assessment of model generalizability across a broader range of soil conditions.”

Comment 36: Line 296-307: The binning process using quantile-based ranges is described clearly but a short justification for choosing four bins for both Olsen P and sand content can contextualize the choice. Were these bins based on agronomic thresholds or purely statistical distribution?

Response 36: Agree. Thank you for your thoughtful comment. The following text was added in the manuscript: “The extended dataset was binned into four categories based on Olsen P concentrations and four categories based on sand content. The binning was based on a quantile-based statistical distribution to ensure approximately equal representation across the extended dataset. Data were binned on Olsen P and sand content because these variables were identified as highly influential by the XGBoost model and demonstrated a causal relationship with soil P sorption capacity through causal inference analysis.”
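
A quantile-based split into four groups can be sketched as follows. This is a minimal illustration using quartile cut points; the input values are synthetic, and the labels "medium" and "high" are assumed names for the two middle groups (the record only names the low and very high groups).

```python
# Hypothetical sketch: assign values to four quantile-based bins.
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

LABELS = ["low", "medium", "high", "very high"]  # middle labels are assumed


def quantile_bins(values, labels=LABELS):
    """Label each value by the quantile interval it falls into."""
    cuts = quantiles(values, n=len(labels))  # three cut points for four bins
    return [labels[bisect_right(cuts, v)] for v in values]


# Synthetic Olsen P values (not study data).
olsen_p = [float(v) for v in range(1, 101)]
bins = quantile_bins(olsen_p)
counts = Counter(bins)
print(counts)
```

Because the cut points come from the data distribution rather than agronomic thresholds, each bin holds roughly the same number of samples, which is the property the response relies on.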

Comment 37: Line 312-314: Explain why the Langmuir model underestimates these effects, likely due to its linear or simplified functional form. A one-sentence insight here enhances the interpretation without overexplaining.

Response 37: Agree. Thank you for your comment. The following text was added in the manuscript: “These results highlight the machine learning model’s enhanced sensitivity to key soil properties, particularly in capturing their nonlinear and interacting effects on phosphorus sorption. The Langmuir model’s underestimation is likely due to its simplified functional form, which does not account for such multivariate complexity. These findings underscore the advantage of data-driven approaches like XGBoost in modeling nutrient dynamics across diverse soil conditions.”

Comment 38: Line 317-322: The reiteration of model’s superior performance feels redundant. Consider merging the interpretive insights of this and the previous paragraph into a single cohesive reflection on the XGBoost model’s advantage in handling nonlinear effects.

Response 38: Agree. Thank you for your comment. The two paragraphs were merged into one paragraph as follows: “Applying the Langmuir isotherms to the extended dataset revealed a modest decline in P adsorption of 2.3% in the very high Olsen P group compared to the low Olsen P group (p = 0.005) (Figure 6), and an 11.9% reduction in the very high sand content group relative to the low sand group (p < 0.001) (Figure 7). In contrast, the multi-output XGBoost model estimated a substantially greater drop of 12.6% for Olsen P and 19.2% for sand content (both p < 0.001). These results highlight the machine learning model’s enhanced sensitivity to key soil properties, particularly in capturing their nonlinear and interacting effects on phosphorus sorption. The Langmuir model’s underestimation is likely due to its simplified functional form, which does not account for such multivariate complexity. These findings underscore the advantage of data-driven approaches like XGBoost in modeling nutrient dynamics across diverse soil conditions.”

Comment 39: Line 340-341: The repeated emphasis on the XGBoost model’s responsiveness will be stronger if balanced with a brief inclusion of its potential limitations (e.g., dependence on a well-structured dataset or interpretability challenges in operational settings).

Response 39: Agree. Thank you very much for your constructive comment. The following text was added in the manuscript: “While the XGBoost model demonstrated strong responsiveness to key soil properties, its performance depends on the availability of a well-structured, representative dataset, and its complexity may present challenges for interpretability and direct implementation in operational agronomic settings.”

Comment 40: Line 341-343: The phrase “not surprising” should be reworded to sound more objective like “This is consistent with XGBoost’s well-established performance in high-dimensional regression tasks.”

Response 40: Agree. Thank you very much for your thoughtful comment. The phrase was replaced with: “This is consistent with XGBoost’s well-established performance in high-dimensional regression tasks”

Comment 42: Line 357-363: The claim that manganese and magnesium effects are likely indirect can be supported from a citation or brief mechanistic explanation (e.g., soil redox behavior, cation exchange processes). The interpretation is logical, but providing a reference or observed trend can add depth and support. Also, consider whether the indirect effects observed in SHAP but not in causal inference can hint at latent variable interactions, this might be worth mentioning cautiously.

Response 42: Agree. Thank you for your comment. The following text was added: “The positive association between manganese availability and P sorption capacity is likely explained by a combination of indirect and direct mechanisms. Manganese tends to be more available in acidic or reducing soil environments, which are also characterized by higher P sorption potential. Additionally, Mn oxides are known to directly enhance P retention through surface adsorption processes [11]. A comparable trend was observed for magnesium, which is typically more available in alkaline, calcareous soils, conditions that are also associated with increased P sorption, as illustrated in Figure 2d.”

Comment 43: Line 368-370: The phrase “strongly confirm” should be toned down to “further support,” as causal inference on observational data should be interpreted with some caution.

Response 43: Agree. The phrase “strongly confirm” was replaced by “further support.”

Comment 44: Line 376-378: Clarify what “adequately” means when stating that the Langmuir equation "adequately described" soil adsorption, perhaps by referencing the model fit or residuals. Also, briefly mention that the Langmuir model failed to capture the magnitude of variation seen in high-sand or high-Olsen-P groups (as shown earlier).

Response 44: Agree. Thank you very much for your thoughtful comment. The following text was added in the manuscript: “The results of the current study showed that the Langmuir isotherms adequately described the general shape of the P adsorption curves across soil types, reflecting the expected saturation behavior. However, as shown earlier (Figures 6 and 7), the model failed to capture the full magnitude of variation in P adsorption, particularly in soils with very high sand content or high initial Olsen P concentrations, where it substantially underestimated the decline in sorption observed in the extended dataset.”

Comment 45: Line 381-383: It would be better to mention that the model performed less consistently at higher equilibrium concentrations (e.g., Ce = 10 mg/L), as noted earlier. Also, state whether the level of error (MAE = 26.5 mg/kg) is acceptable for agronomic decision-making in a real-world context.

Response 45: Agree. Thank you for your comment. The paragraph was rephrased as follows: “The final multi-output model achieved an overall mean absolute error (MAE) of 26.5 mg/kg and an R² score of 0.50 on the test set, indicating a more consistent and accurate performance at lower equilibrium concentrations (Ce = 1–6 mg/L), which are also more representative of typical soil solution P levels in agronomic conditions. At higher concentrations (e.g., Ce = 10 mg/L), greater prediction scatter was observed, reflecting increased variability (Figure 5).”

Comment 46: Line 391-392: The phrase “It is not surprising...” should be replaced with a better and formal phrasing (e.g., “This result is expected given the flexibility of tree-based models”).

Response 46: Agree. Thank you for your comment. The phrase was replaced with the following sentence: “This result is expected given the flexibility of tree-based models to capture non-linear relationships [46].”

Comment 47: Line 415-417: The phrase “significantly outperforms” should be softened slightly to reflect the statistical results more cautiously (e.g., “demonstrated improved performance in capturing phosphorus adsorption variability”). This avoids sounding overstated, especially given that R² = 0.50, suggesting moderate predictive power.

Response 47: Agree. Thank you for your comment. The phrase was replaced with the following sentence: “This study demonstrated that a multi-output XGBoost regression model demonstrated improved performance in capturing P adsorption variability compared to the classical Langmuir isotherm in predicting P adsorption across diverse soil types.”

Comment 48: Line 417-419: Add how the sensitivity (the model's responsiveness to Olsen P and sand content) supports site-specific fertilizer recommendations or reduced environmental phosphorus losses. A single sentence pointing to potential practical applications will ground the conclusion in agronomic relevance.

Response 48: Agree. Thank you for your comment. The last sentence in the conclusion section was rephrased as follows: “Finally, the model’s strong responsiveness to Olsen P and sand content provides a promising tool for P management, with the potential to improve fertilizer use efficiency across diverse soil conditions and reduce environmental risks from over-fertilization.”

Comment 49: Line 419-422: Briefly mention whether further validation (e.g., across other soil regions or under field conditions) is needed before operational deployment. This will show critical awareness of the model’s current scope and future potential.

Response 49: Agree. Thank you for your comment. The following sentence was added in the manuscript: “Further validation across diverse soil regions and under field conditions would be useful to confirm the model’s robustness and support its broader operational use in agronomic decision-making.”

Reviewer 2 Report

Comments and Suggestions for Authors

This study introduces a machine learning approach based on the XGBoost regression model to predict soil phosphorus (P) adsorption dynamics, and compares it with the traditional Langmuir isotherm model. Through SHAP feature importance analysis and the causal discovery algorithm (DirectLiNGAM), the study confirms that initial Olsen P concentration and sand content are key causal factors in reducing soil P adsorption. The results demonstrate that the XGBoost model significantly outperforms the Langmuir model in capturing nonlinear relationships and multivariate interactions, particularly in predicting P adsorption in soils with high Olsen P and high sand content. The XGBoost model shows higher sensitivity, predicting adsorption reductions of 12.6% and 19.2%, respectively, compared to the Langmuir model's predictions of only 2.3% and 11.9%.

Specific Recommendations

This study first reviews existing research and then points out the limitations of traditional experimental methods such as the Langmuir and Freundlich isotherm models for describing phosphorus adsorption behavior. These models are based on chemical equilibrium theory and are suitable for simulating adsorption under ideal conditions. However, real soil systems are highly heterogeneous, and traditional methods exhibit significant shortcomings: firstly, their linear assumptions fail to capture the nonlinear adsorption characteristics of actual soils; secondly, the models struggle to integrate the synergistic effects of various soil physicochemical properties; and thirdly, their predictive accuracy is constrained by fixed parameters, making it difficult to adapt to different environmental conditions. 

To address these limitations, this study builds upon previous work by proposing an XGBoost model combined with SHAP interpretability analysis and DirectLiNGAM causal inference, achieving precise quantification of phosphorus adsorption dynamics. The article's structure is scientifically sound, with generally accurate expression of viewpoints, proper terminology, and coherent sentences. The connections between sections are relatively close, though there remain some minor issues that require further improvement. Specific recommendations are as follows:

  1. Lines 35-36: The statement "Most cultivated soils in Greece are calcareous and contain relatively high levels of calcium carbonate (CaCO3), which gives them a high P sorption capacity" should be supported by relevant references.
  2. Lines 48-50: The claim "Elevated P levels in water bodies contribute to eutrophication, which promotes unnaturally dense algal growth, commonly known as algal blooms" requires citation of supporting literature.
  3. Lines 50-52: The assertion "These blooms can harm both wildlife and humans, as certain algal species release dermatoxins, neurotoxins, cytotoxins, and hepatotoxins" needs to be substantiated with appropriate references.
  4. Lines 200-202: The statement "This finding aligns with the feature importance plot (Figure 1), where Olsen P and sand content are among the most significant predictors" is not clearly reflected in Figure 1. Please verify this correspondence.
  5. Lines 250-252: The reported R² value of 0.5 in Figure 5 should be corrected to 0.493 as shown in the figure. This indicates the model explains 49.3% of the target variable's variation, leaving 50.7% unexplained. Suggestions for improving model accuracy should be provided.
  6. Figure 1: The figure does not clearly display Olsen P as mentioned in lines 200-202. Please ensure consistency between the text and figure.

Author Response

We sincerely thank the reviewer for the thoughtful and constructive comments provided on our manuscript. The feedback has been very helpful in improving the clarity, scientific rigor, and overall quality of the paper. We have carefully addressed each point raised and made the necessary revisions accordingly. Below, we provide point-by-point responses to each comment, with explanations of how the manuscript was updated.

Comment 1: Lines 35-36: The statement "Most cultivated soils in Greece are calcareous and contain relatively high levels of calcium carbonate (CaCO3), which gives them a high P sorption capacity" should be supported by relevant references.

Response 1: Agree. Thank you very much for your comment. We added the following reference: “Sgouras, I.D.; Tsadilas, C.D.; Barbayiannis, N.; Danalatos, N. Physicochemical and Mineralogical Properties of Red Mediterranean Soils from Greece. Commun Soil Sci Plant Anal 2007, 38, 695–711, doi:10.1080/00103620701220593.”

Comment 2: Lines 48-50: The claim "Elevated P levels in water bodies contribute to eutrophication, which promotes unnaturally dense algal growth, commonly known as algal blooms" requires citation of supporting literature.

Response 2: Agree. Thank you very much for your comment. We added the following reference: “Correll, D.L. The Role of Phosphorus in the Eutrophication of Receiving Waters: A Review. J Environ Qual 1998, 27, 261–266, doi:https://doi.org/10.2134/jeq1998.00472425002700020004x.”

Comment 3: Lines 50-52: The assertion "These blooms can harm both wildlife and humans, as certain algal species release dermatoxins, neurotoxins, cytotoxins, and hepatotoxins" needs to be substantiated with appropriate references.

Response 3: Thank you for the comment. This sentence has been removed from the revised manuscript in response to a request by another reviewer.

Comment 4: Lines 200-202: The statement "This finding aligns with the feature importance plot (Figure 1), where Olsen P and sand content are among the most significant predictors" is not clearly reflected in Figure 1. Please verify this correspondence.

Response 4: Agree. Thank you for the comment. This sentence has been rephrased as follows: “This finding is consistent with the SHAP-based feature importance plot (Figure 1), where Olsen P and sand content rank among the top predictors of P adsorption.”

Comment 5: Lines 250-252: The reported R² value of 0.5 in Figure 5 should be corrected to 0.493 as shown in the figure. This indicates the model explains 49.3% of the target variable's variation, leaving 50.7% unexplained. Suggestions for improving model accuracy should be provided.

Response 5: Agree. Thank you very much for your comment. The following text was added to clarify and contextualize the reported R² value of 0.493: “The final multi-output model captured complex, nonlinear relationships between soil features and P adsorption across different Ce levels, achieving an overall MAE of 26.5 mg/kg and an R² score of 0.493 on the test set (Figure 5). An R² of 0.493, while moderate, is considered reasonable given the inherent variability and multivariate nature of soil properties. To the best of our knowledge, there are no previous studies that have validated Langmuir equations on unseen data using an independent test set. Traditional applications of the Langmuir model typically involve fitting to the entire dataset without assessing predictive performance, which limits direct comparison with modern data-driven approaches. While the model showed strong overall agreement between predicted and observed values, its performance was slightly weaker at higher equilibrium concentrations (e.g., Ce = 10 mg/L), as reflected by increased scatter in predictions. This conclusion is supported by the SHAP analysis, which revealed non-linear trends in key features such as Olsen P, sand content, and manganese (Figure 2), where the effect on P adsorption varied in intensity and direction depending on feature values.”

Comment 6: Figure 1: The figure does not clearly display Olsen P as mentioned in lines 200-202. Please ensure consistency between the text and figure.

Response 6: Agree. Thank you very much for your comment. We rephrased the sentence for clarity as follows: “This finding is consistent with the SHAP-based feature importance plot, where Olsen P (shown as P in Figure 1) and sand content rank among the top predictors of P adsorption.”

Reviewer 3 Report

Comments and Suggestions for Authors

This study compares the performance of a multi-output XGBoost regression model with classical Langmuir isotherms for predicting soil phosphorus (P) adsorption capacity. Overall, the manuscript quality is good; however, it needs major revision. Some comments are given below.

While the introduction establishes the importance of P in agriculture, it lacks sufficient discussion of previous machine learning applications in soil science or P adsorption modeling specifically. Only a few references are cited regarding P sorption modeling, and there's no systematic comparison of different modeling approaches previously used.

The gap between traditional isotherms and machine learning approaches is mentioned but not thoroughly justified with evidence from previous studies.

The rationale for selecting 147 samples from the larger dataset is not explained. How representative are these samples?

The train-test split (80-20) is mentioned, but cross-validation procedures are not described.

Some figures lack adequate resolution and clarity (especially Figure 3 with the DAGs).

P-values are reported inconsistently, and confidence intervals are missing for most comparisons.

The R² of 0.50 for the XGBoost model is moderate but not thoroughly discussed in context.

Add a comprehensive literature review table comparing previous P adsorption modeling approaches

Author Response

We thank the reviewer for their constructive and thoughtful feedback. We appreciate the comments, which helped us identify key areas to improve both the scientific depth and clarity of the manuscript. Below we provide detailed, point-by-point responses, outlining how each issue has been addressed in the revised version.

Comment 1: While the introduction establishes the importance of P in agriculture, it lacks sufficient discussion of previous machine learning applications in soil science or P adsorption modeling specifically. Only a few references are cited regarding P sorption modeling, and there is no systematic comparison of the modeling approaches used previously.

Response 1: Agree. We thank the reviewer for this insightful comment. To the best of our knowledge, there are no prior studies specifically applying machine learning to model phosphorus sorption dynamics in soils. However, we have expanded the introduction to include relevant references on P sorption modeling using classical approaches, including the work of Barrow (1983, 2008), who has critically examined sorption curve behavior and the limitations of isotherm-based models. These citations help contextualize the need for data-driven, multivariate modeling approaches such as the one presented in this study. Additionally, we have briefly discussed the broader application of machine learning in soil phosphorus modeling to better clarify our contribution. We added the following references:

“Barrow, N.J. On the Reversibility of Phosphate Sorption by Soils. Journal of Soil Science 1983, 34, 751–758, https://doi.org/10.1111/j.1365-2389.1983.tb01069.x.”

“Barrow, N.J. The Description of Sorption Curves. European Journal of Soil Science 2008, 59, 900–910, https://doi.org/10.1111/j.1365-2389.2008.01041.x.”

The following text was added to support the application of machine learning in soil phosphorus modeling: “These results highlight the machine learning model’s enhanced sensitivity to key soil properties, particularly in capturing their nonlinear and interacting effects on phosphorus sorption. The Langmuir model’s underestimation is likely due to its simplified functional form, which does not account for such multivariate complexity. These findings underscore the advantage of data-driven approaches like XGBoost in modeling nutrient dynamics across diverse soil conditions.”

Comment 2: The gap between traditional isotherms and machine learning approaches is mentioned but not thoroughly justified with evidence from previous studies.

Response 2: Agree. Thank you very much for this comment. We added the following text in the introduction to further support the limitations of the Langmuir and Freundlich isotherms: “Many models have been developed to describe this process quantitatively, with the Langmuir and Freundlich equations being the most widely used [18–21]. However, real soil systems are inherently heterogeneous, both chemically and physically, which limits the applicability of these models in complex field environments. First, their underlying assumptions, such as homogeneity of adsorption sites and linearity or semi-empirical fitting, fail to capture the nonlinear and multivariate nature of actual soil P adsorption processes [20,21]. Second, these models do not effectively integrate the interactive effects of multiple soil properties (e.g., texture, pH, CaCO₃, organic matter), which collectively influence P dynamics. Third, their fixed-parameter structure constrains their predictive flexibility across diverse environmental contexts, making them less suited for data-driven or site-specific fertilizer management applications [9,19].”
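
For reference, the two isotherms named in this response are commonly written as q = q_max·K·Ce / (1 + K·Ce) (Langmuir) and q = K_F·Ce^(1/n) (Freundlich). The sketch below is illustrative only and is not taken from the manuscript. The function names, the synthetic parameter values, and the choice of the linearized form Ce/q = 1/(K·q_max) + Ce/q_max for fitting are all assumptions made here, intended to show that a two-parameter isotherm has a fixed curve shape that cannot respond to other soil properties.

```python
# Illustrative sketch only; function names and parameter values are hypothetical.


def langmuir(ce, q_max, k):
    """Langmuir isotherm: adsorbed P at equilibrium concentration ce."""
    return q_max * k * ce / (1.0 + k * ce)


def freundlich(ce, k_f, n):
    """Freundlich isotherm: q = k_f * ce**(1/n)."""
    return k_f * ce ** (1.0 / n)


def fit_langmuir_linearized(ce_values, q_values):
    """Ordinary least squares on Ce/q = 1/(k*q_max) + Ce/q_max.

    Returns the pair (q_max, k) recovered from the slope and intercept.
    """
    xs = list(ce_values)
    ys = [ce / q for ce, q in zip(ce_values, q_values)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / q_max
    intercept = mean_y - slope * mean_x  # equals 1 / (k * q_max)
    return 1.0 / slope, slope / intercept


# Equilibrium concentrations quoted elsewhere in this record (mg/L).
CE_LEVELS = [1.0, 2.0, 4.0, 6.0, 10.0]

# Synthetic data generated from made-up parameters (q_max=200, k=0.5).
synthetic_q = [langmuir(ce, q_max=200.0, k=0.5) for ce in CE_LEVELS]
fitted_q_max, fitted_k = fit_langmuir_linearized(CE_LEVELS, synthetic_q)
```

Because the synthetic data are noise-free, the fit recovers the generating parameters. With real soils, every sample would need its own (q_max, K) pair, which is the fixed-parameter limitation the response describes.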

Comment 3: The rationale for selecting 147 samples from the larger dataset is not explained. How representative are these samples?

Response 3: Agree. Thank you for your comment. The following text was added in the manuscript: “A total of 147 surface soil samples (0–15 cm) were selected from a larger soil database maintained by the Soil and Water Resources Institute (SWRI). The selection of the 147 soil samples aimed to ensure a broad representation of physicochemical variability found in Greek soils, particularly across different soil textures, pH levels, and calcium carbonate (CaCO₃) contents typical of Greek agricultural soils (Table 1).”

Comment 4: The train-test split (80-20) is mentioned, but cross-validation procedures are not described.

Response 4: Agree. Thank you for your comment. We added the following explanation regarding cross-validation: “To reduce dimensionality and eliminate less informative features, we applied Recursive Feature Elimination (RFE) on this augmented dataset with a Random Forest Regressor as the base estimator [33,34]. This procedure was conducted on the training set using a fixed train-test split rather than cross-validation given the moderate dataset size. RFE recursively ranks and removes features based on their importance, ultimately selecting the most predictive subset for model training.”

Comment 5: Some figures lack adequate resolution and clarity (especially Figure 3 with the DAGs).

Response 5: Agree. Thank you for your comment. We have increased the resolution of Figure 3 to improve visual clarity and ensure that all elements, including node labels and directional arrows, are clearly visible.

Comment 6: P-values are reported inconsistently, and confidence intervals are missing for most comparisons.

Response 6: Agree. Thank you for your thoughtful comment. The following text was added in the materials and methods section: “The differences between means were assumed to be statistically significant at the 5% level.”

Comment 7: The R² of 0.50 for the XGBoost model is moderate but not thoroughly discussed in context.

Response 7: Agree. Thank you very much for your comment. The following text was added to clarify and contextualize the reported R² value of 0.493: “The final multi-output model captured complex, nonlinear relationships between soil features and P adsorption across different Ce levels, achieving an overall MAE of 26.5 mg/kg and an R² score of 0.493 on the test set (Figure 5). An R² of 0.493, while moderate, is considered reasonable given the inherent variability and multivariate nature of soil properties. To the best of our knowledge, there are no previous studies that have validated Langmuir equations on unseen data using an independent test set. Traditional applications of the Langmuir model typically involve fitting to the entire dataset without assessing predictive performance, which limits direct comparison with modern data-driven approaches. While the model showed strong overall agreement between predicted and observed values, its performance was slightly weaker at higher equilibrium concentrations (e.g., Ce = 10 mg/L), as reflected by increased scatter in predictions. This conclusion is supported by the SHAP analysis, which revealed non-linear trends in key features such as Olsen P, sand content, and manganese (Figure 2), where the effect on P adsorption varied in intensity and direction depending on feature values.”
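
As a reminder of what the two quoted metrics measure, the sketch below computes MAE and R² from paired observed and predicted values. It is a minimal illustration with made-up numbers, not the authors' evaluation code. How the manuscript aggregates the metrics across the Ce outputs is not stated in this record, so the sketch handles a single output only, and the function names are hypothetical.

```python
# Illustrative sketch only; the data values below are invented.


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [100.0, 150.0, 200.0, 250.0]   # hypothetical adsorbed P, mg/kg
predicted = [110.0, 140.0, 210.0, 240.0]  # hypothetical model output, mg/kg

mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)

# Predicting the mean of the observed values everywhere gives R² = 0.
baseline_r2 = r_squared(observed, [sum(observed) / len(observed)] * len(observed))
```

Read this way, a test-set R² of 0.493 means the model's squared error is roughly half that of always predicting the test-set mean.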

Comment 8: Add a comprehensive literature review table comparing previous P adsorption modeling approaches.

Response 8: Agree. Thank you very much for this constructive comment. We have added the following table in the manuscript:

| Authors (Year) | Model Used | Key Findings |
|---|---|---|
| Olsen & Watanabe (1957) [12] | Langmuir, Freundlich | Introduced Langmuir model to estimate P adsorption maxima in soils. |
| Helyar et al. (1976) [13] | Langmuir | Studied phosphate adsorption behavior on Al oxides. |
| Nair et al. (1984) [31] | Standardized Langmuir protocol | Developed a standardized interlaboratory method for determining P sorption using Langmuir modeling. |
| Barrow (2008) [16] | Freundlich, Mechanistic models | Proposed improved description of sorption curves beyond isotherms. |
| Hussain et al. (2002) [47] | Langmuir, Freundlich | Compared isotherm performance under saline-sodic conditions. |
| Del Bubba et al. (2003) [20] | Langmuir | Estimated P adsorption maximum in sand filters using Langmuir. |
| Heredia & Fernández Cirelli (2007) [9] | Langmuir | Linked high P application to environmental risk based on sorption capacity. |
| Bolster & Sistani (2009) [14] | Langmuir | Investigated P sorption from animal manures; showed variability depending on manure type and application. |
| Dossa et al. (2008) [21] | Langmuir, Freundlich | Assessed impact of shrub residues on P sorption and desorption. |
| Lair et al. (2009) [48] | Langmuir | Investigated P sorption–desorption along soil weathering gradient. |
| Rossi et al. (2012) [49] | Langmuir (SWAT integration) | Tested Langmuir model performance within SWAT under high P conditions. |
| Dari et al. (2015) [19] | Langmuir, Freundlich | Proposed simplified method for estimating isotherm parameters. |
| Mihoub et al. (2016) [3] | Langmuir | Evaluated P sorption in calcareous soils and its role in sustainable P fertilizer management. |
| Yang et al. (2019) [44] | Freundlich | Showed influence of organic matter on P adsorption and desorption. |
| Wang et al. (2022) [30] | Modified Langmuir | Introduced a modified Langmuir equation to account for organic material influence on P adsorption in Mollisols. |
| Zawadzka et al. (2024) [18] | Langmuir | Applied Langmuir to model phosphate sorption in engineered media. |

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The revised version has been improved with better readability. Moving MLR equation data to a table provides better comparisons across soil types. The caption of Table 6 is copied from Table 5, which should be corrected.

Author Response

Comment 1: The caption of Table 6 is copied from Table 5, which should be corrected.

Response 1: Thank you very much for your attentive comment. The caption in Table 6 has been replaced as follows:

Table 6. Coefficients of the multiple linear regression models predicting P adsorption for each soil texture class (Sandy, Loamy, Clayey). Ce is the equilibrium concentrations of P: 1, 2, 4, 6, and 10 mg/L. R² values indicate the goodness of fit for each model.

Reviewer 2 Report

Comments and Suggestions for Authors

Accept in current form.

Author Response

 We sincerely thank you for your positive evaluation and support of our work.

Reviewer 3 Report

Comments and Suggestions for Authors

Paper can be accepted.

Author Response

 We sincerely thank you for your positive evaluation and support of our work.
