Data-Driven and Mechanistic Soil Modeling for Precision Fertilization Management in Cotton
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper is good but needs some explanation for the specific comments given in the attached file.
Comments for author File: Comments.pdf
Author Response
Reply to the Review Report 1
Comment 1: Novelty statement is missing based on the knowledge gap in current practices.
Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have added the following text on page 2, line 70:
The novelty of this study lies in the incorporation of mechanistic soil model outputs as features in ML algorithms, enabling a more holistic representation of the influence of soil variability on yield and improving prediction accuracy.
More specific objectives were to:
(i) build an ML model for cotton yield prediction to select the optimum N fertilizer dose, and
(ii) include soil model equations in the feature space used to train the ML model, to better capture soil variability.
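To make objective (ii) concrete, here is one way soil-model outputs could be appended to the raw features before an ML regressor, using scikit-learn's `Pipeline` and `FunctionTransformer`. This is a minimal sketch under stated assumptions. The function `toy_soil_model`, its formula, the column layout and the synthetic data are hypothetical placeholders. They are not the authors' mechanistic soil model or dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def toy_soil_model(X):
    """Hypothetical stand-in for a mechanistic soil model (illustrative only).

    Assumed input columns: 0 = N_rate (kg N/ha), 1 = soil_test_P (mg/kg),
    2 = organic_matter (%). Returns X with one extra column, a toy
    "P availability" index that rises with soil-test P and organic matter.
    """
    X = np.asarray(X, dtype=float)
    p_availability = X[:, 1] * (1.0 + 0.1 * X[:, 2])
    return np.column_stack([X, p_availability])


# Soil-model outputs are appended to the raw features, then passed to the ML model.
hybrid_model = Pipeline([
    ("soil_model", FunctionTransformer(toy_soil_model)),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
])

# Synthetic parcels (NOT the study's data): 309 rows, 3 raw features.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0, 300, 309),    # N_rate
    rng.uniform(5, 40, 309),     # soil_test_P
    rng.uniform(0.5, 3.0, 309),  # organic_matter
])
y = 2000 + 8 * X[:, 0] - 0.02 * X[:, 0] ** 2 + 20 * toy_soil_model(X)[:, 3]

hybrid_model.fit(X, y)
augmented = hybrid_model.named_steps["soil_model"].transform(X)
print(augmented.shape)  # (309, 4): three raw features plus one soil-model output
```

Placing the soil model inside the pipeline applies the same transformation at training time and at prediction time. New parcels therefore only need their raw soil-analysis values.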
Comment 2: Why were these specific models (Random Forest (RF), XGBoost, and LightGBM) chosen? A proper explanation is required based on their strengths and relevance to cotton yield prediction.
Response 2: Thank you for pointing this out. We have added the following text on page 8, line 329: These three ML algorithms were selected because they currently achieve state-of-the-art performance within the ML framework and, according to numerous previous studies, outperform other methods in terms of accuracy.
Comment 3: Where does the feature importance plot come from? A detailed explanation is required.
Response 3: Thank you for pointing this out. We have added the following text on page 9, line 380: The feature importance presented in Figure 3 reflects the mean decrease in impurity (MDI), which quantifies how much each feature contributes to reducing the prediction error (i.e., variance) across all decision trees in the forest. During model training, the feature importance algorithm loops through every tree and recursively evaluates each split. At every decision node, it identifies the feature used for the split and calculates the resulting reduction in impurity. This reduction, weighted by the number of samples affected by the split, is added to that feature’s cumulative importance score. These contributions are summed across all splits in all trees of the forest. After training, the total importance scores are normalized so that they sum to 1. Each variable included in the model is thus assigned a relative importance value based on how much it helped reduce the prediction error across the ensemble.
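The MDI procedure described in this response can be checked against scikit-learn's own implementation. The minimal sketch below runs on synthetic data with hypothetical features. It reads each fitted tree's `tree_` arrays and sums the weighted impurity decrease of every split into the feature used. It then normalizes the scores within each tree, averages them across trees and renormalizes.

Normalizing within each tree before averaging is a detail the summary above glosses over. It reflects how current scikit-learn versions appear to compute `RandomForestRegressor.feature_importances_`, but treat it as an assumption and verify it for the version in use.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def mdi_importance(forest, n_features):
    """Recompute mean decrease in impurity (MDI) from a fitted forest."""
    per_tree = []
    for estimator in forest.estimators_:
        tree = estimator.tree_
        if tree.node_count <= 1:  # single-leaf trees have no splits and are skipped
            continue
        w = tree.weighted_n_node_samples
        imp = np.zeros(n_features)
        for node in range(tree.node_count):
            left, right = tree.children_left[node], tree.children_right[node]
            if left == -1:  # leaf node: no split, no contribution
                continue
            # Weighted impurity (variance) reduction achieved by this split.
            imp[tree.feature[node]] += (
                w[node] * tree.impurity[node]
                - w[left] * tree.impurity[left]
                - w[right] * tree.impurity[right]
            )
        total = imp.sum()
        per_tree.append(imp / total if total > 0 else imp)  # normalize per tree
    mean_imp = np.mean(per_tree, axis=0)  # average over trees
    return mean_imp / mean_imp.sum()  # renormalize so scores sum to 1


# Synthetic data (NOT the study's): feature 0 drives y most, feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.uniform(size=(309, 3))
y = 5 * X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=309)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
manual = mdi_importance(rf, X.shape[1])
print(np.round(manual, 3))
print(np.allclose(manual, rf.feature_importances_))
```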
Comment 3: There is no explanation about the interpretation of key features, particularly regarding interactions between nutrients (e.g., P and Zn).
Response 3: Thank you for pointing this out. We agree with this comment, as there was no explanation for many of the interactions presented in the graphs. We have added the following text on page 12, lines 438, 448, 455, and 461: Notably, at higher K_rate levels (>40 kg/ha), the SHAP values tend to drop when accompanied by higher P_rate values (indicated by red points).
Interestingly, Figure 4d reveals an interaction of P availability with copper (Cu). At low P_availability, data points with low copper levels (blue points) are generally associated with more positive SHAP values. This may indicate that growers applying high doses of P fertilizers also make intensive use of copper to control fungal diseases.
Regarding the interaction with copper, the color gradient reveals that higher Cu concentrations (red points) tend to correspond with slightly more positive SHAP values at lower Fe levels.
Thus, at low organic matter (blue points), P availability is lower and the estimated P need is higher, a trend that is expected.
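The interaction readings above can be illustrated on a toy function, for example the SHAP value of K_rate falling when P_rate is high. The sketch below computes exact interventional Shapley values by brute-force enumeration of feature coalitions. Features absent from a coalition are filled in from a background sample.

This sketch is not the `shap` library, whose `TreeExplainer` computes SHAP values efficiently for tree ensembles. It is also not the authors' fitted model. The function `toy_yield`, its negative K*P term and the value ranges are all hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(f, x, background):
    """Exact interventional Shapley values of model f at point x (brute force).

    Features outside a coalition S take their values from the background rows,
    and v(S) is the mean prediction. Cost grows as 2**n, so only tiny n is feasible.
    """
    n = len(x)

    def value(coalition):
        Z = background.copy()
        if coalition:
            idx = list(coalition)
            Z[:, idx] = x[idx]  # features in the coalition are fixed to x
        return f(Z).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


def toy_yield(Z):
    """Hypothetical model; columns are K_rate, P_rate, Cu. The -0.03*K*P term is a
    negative K-P interaction, so the effect of K depends on P."""
    K, P, Cu = Z[:, 0], Z[:, 1], Z[:, 2]
    return 2.0 * K + 1.0 * P - 0.03 * K * P + 0.5 * Cu


rng = np.random.default_rng(1)
background = np.column_stack([
    rng.uniform(0, 80, 200),  # K_rate
    rng.uniform(0, 60, 200),  # P_rate
    rng.uniform(0, 20, 200),  # Cu
])

x_low_p = np.array([60.0, 10.0, 10.0])
x_high_p = np.array([60.0, 50.0, 10.0])
phi_low_p = shapley_values(toy_yield, x_low_p, background)
phi_high_p = shapley_values(toy_yield, x_high_p, background)
print(phi_low_p[0], phi_high_p[0])  # SHAP of K_rate: lower when P_rate is high
```

The toy model has a negative K*P term, so at the same K_rate the SHAP value of K_rate is lower when P_rate is high. A dependence plot colored by P_rate would show this as red points sitting lower.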
Comment 4: Water was not considered in this paper, although plants absorb nutrients only in water-soluble form; nutrient availability therefore increases in the presence of water.
Response 4: Thank you for pointing this out. We have added the following text on page 13, line 528: Furthermore, in the Drama region, where the study was conducted, growers typically apply sufficient irrigation to meet the crop’s water requirements. The area is generally not subject to water scarcity, and irrigation infrastructure ensures that water is not a limiting factor for cotton cultivation. Therefore, under the prevailing conditions, water availability is consistently sufficient, and its exclusion from the model would likely influence only the estimation of N losses through leaching, without affecting nutrient availability due to water deficiency.
Comment 5: The conclusion could be perceived as overly optimistic, given the study's limitations. Moderate the language to reflect the scope and limitations, emphasizing the model's potential rather than asserting its universal applicability.
Response 5: Thank you for pointing this out. We agree and have therefore deleted the last sentence of the conclusion on page 14, line 552.
Reviewer 2 Report
Comments and Suggestions for Authors
Attached
Comments for author File: Comments.pdf
Author Response
Reply to the Review Report 2
Comment 1: Add a reference for lines 36-37
Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have added the following reference on page 1, line 37:
Tsaliki, E.; Loison, R.; Kalivas, A.; Panoras, I.; Grigoriadis, I.; Traore, A.; Gourlot, J.-P. Cotton Cultivation in Greece under Sustainable Utilization of Inputs. Sustainability 2024, 16, doi:10.3390/su16010347.
Comment 2: L 41-44: Break the sentence for better clarity
Response 2: Thank you for pointing this out. We agree with this comment. Therefore, we split the sentence, which now reads as follows:
While this approach may yield satisfactory results, it may lead to inefficient N application. This includes excessive fertilizer use, increased production costs, and heightened environmental risks due to N losses [5,6].
Comment 3: L 41-44: Discuss some models in detail. Add the shortcomings and advantages of these models. Then add the research gaps and the need for the hybrid approach.
Response 3: Thank you for pointing this out. We agree with this comment. Therefore, we added the following paragraph on page 2, line 60:
Moreover, the predictive accuracy of cotton growth simulation models is often limited by outdated physiological parameters or an incomplete representation of biological processes. For instance, these models lack integration with real-time data and cannot easily adapt to changes in physiological parameters or fertilizer inputs, such as new varieties or new types of fertilizers. As our understanding of plant and soil interactions evolves, it becomes increasingly difficult to incorporate every relevant mechanism into a single mechanistic model without making it unmanageable [15,16]. These limitations point to a research gap: the need for a system that balances accuracy, usability, and adaptability. To address this gap, our study explores a hybrid modeling approach that combines the explanatory power of a mechanistic soil model with the pattern-recognition strengths of ML. This approach aims to improve prediction performance while remaining responsive to real-world data constraints, as ML models can be easily retrained with new data and kept up to date with the evolving dynamics of cotton cropping. Ultimately, this will offer a more practical decision-support framework for site-specific N management.
Comment 4: L 41-44: Discuss some characteristics of the study area, such as climate and temperature.
Response 4: Thank you for pointing this out. We agree with this comment. Therefore, we added the following paragraph on page 3, line 96:
The study area is in the Drama Plain, Northern Greece, where cotton is a major crop, cultivated on approximately 3,573 hectares in 2023 [20]. The region has a temperate Mediterranean climate, with cool, wet springs and hot, dry summers. Rainfall is scarce during summer, making irrigation a critical factor for sustaining crop productivity. In practice, growers typically apply sufficient irrigation to meet the crop’s water requirements.
Comment 5: L 84-85 Add the reference for this information
Response 5: The Hellenic Agricultural Organization (DIMITRA) is the affiliation of most of the authors of the paper, as stated at the beginning of this article.
Comment 6: Check the format of the equations
Response 6: Thank you for your observation. We have carefully reviewed all equations and ensured that they follow the required formatting.
Comment 7: In 2.4.4. and 2.4.5 check the spacing for the headings
Response 7: Thank you for pointing this out. We have made the necessary changes.
Comment 8: Check the format of Table 1
Response 8: Thank you for pointing this out. We have changed the format of Table 1.
Comment 9: L 41-44: Break the sentence for better clarity
Response 9: Thank you for pointing this out. We agree with this comment. Therefore, we added the following text on page 9, lines 359, 363, 369, and 370:
RF was implemented using the scikit-learn library
XGBoost, implemented via the xgboost Python library
LightGBM was implemented using the lightgbm library [53].
All models were trained using 5-fold cross-validation on the training set to ensure robustness, and model performance was evaluated on the testing set using the Mean Absolute Error (MAE) as the metric.
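The evaluation protocol described here can be sketched with scikit-learn: 5-fold cross-validation on the training set, then MAE on a held-out test set. This is a minimal illustration only. The 80/20 split, the hyperparameters and the synthetic data are assumptions. `xgboost.XGBRegressor` and `lightgbm.LGBMRegressor` expose the scikit-learn estimator interface, so either can be substituted for the random forest in the same code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic data (NOT the study's): 309 parcels, 5 hypothetical features.
rng = np.random.default_rng(0)
X = rng.uniform(size=(309, 5))
y = 3000 + 1500 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 100, 309)

# Assumed 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=300, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
# scikit-learn maximizes scores, so MAE is exposed as its negative.
cv_mae = -cross_val_score(model, X_train, y_train, cv=cv, scoring="neg_mean_absolute_error")

# Refit on the full training set and score once on the untouched test set.
model.fit(X_train, y_train)
test_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"CV MAE: {cv_mae.mean():.1f} +/- {cv_mae.std():.1f}; test MAE: {test_mae:.1f}")
```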
Comment 10: Add some recap in results and after that start discussing the results
Response 10: Thank you for pointing this out. We agree with this comment. Therefore, we added the following paragraph on page 9, line 391:
A cotton yield prediction model was developed using three ML algorithms: RF, XGBoost, and LightGBM. Among them, the RF model achieved the highest accuracy, with the lowest MAE. The CLRE vegetation index showed the strongest correlation with actual yield, confirming that the yield data obtained by the growers were reliable. Feature importance analysis and SHAP dependence plots revealed that N application rate was the most influential factor in predicting yield, followed by variables related to phosphorus and potassium availability, and then by trace elements such as zinc, copper, and iron. The analysis also highlighted key nutrient interactions and thresholds beyond which additional fertilization no longer provided yield benefits and could potentially have adverse effects.
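One way to locate the kind of threshold mentioned in this recap is to sweep the N rate through a trained model. The point where the marginal predicted gain becomes negligible marks the threshold. The sketch below is a partial-dependence-style sweep, the same averaging that `sklearn.inspection.partial_dependence` performs, and not necessarily the authors' procedure. The synthetic data, the 25 kg N/ha grid and the gain threshold of 2 yield units per kg N are hypothetical choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def n_response_curve(model, X, n_col, n_grid):
    """Mean predicted yield when every parcel's N rate is set to each grid value."""
    curve = []
    for n_rate in n_grid:
        X_mod = X.copy()
        X_mod[:, n_col] = n_rate  # override N rate, keep other features as observed
        curve.append(model.predict(X_mod).mean())
    return np.array(curve)


def plateau_rate(n_grid, curve, min_gain_per_kg):
    """Lowest grid N rate beyond which the next step adds less than min_gain_per_kg."""
    gains = np.diff(curve) / np.diff(n_grid)
    below = np.nonzero(gains < min_gain_per_kg)[0]
    return n_grid[below[0]] if below.size else n_grid[-1]


# Synthetic data (NOT the study's): yield rises with N up to 200 kg N/ha, then plateaus.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 300, 309), rng.uniform(0, 1, 309)])
y = 1500 + 15 * np.minimum(X[:, 0], 200) + 100 * X[:, 1] + rng.normal(0, 50, 309)

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
n_grid = np.arange(0, 301, 25)
curve = n_response_curve(model, X, n_col=0, n_grid=n_grid)
print(plateau_rate(n_grid, curve, min_gain_per_kg=2.0))
```

Averaging over all parcels gives a single field-level curve. Applying the same sweep to one parcel's row instead gives a site-specific recommendation.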
Reviewer 3 Report
Comments and Suggestions for Authors
The paper presents a hybrid approach integrating machine learning (ML) with mechanistic soil modeling to predict cotton yield and optimize nitrogen (N) fertilization. Below are the present Reviewer's outlined strengths, weaknesses, and potential areas for improvement.
Strengths
a. Integrating data-driven ML techniques with mechanistic soil modeling: it is a novel and promising approach. This combination leverages the strengths of both methods: ML's ability to capture complex, non-linear relationships in data, and mechanistic modeling's ability to simulate soil processes and nutrient interactions.
b. Comprehensive dataset: the study utilizes a robust dataset from 309 cotton field parcels, including soil properties, yield data, and farming practices. This extensive dataset enhances the reliability and generalizability of the findings.
c. Focus on precision agriculture: the study addresses a critical issue in agriculture—excessive N fertilization—and proposes a method to optimize N use, which can lead to cost savings for farmers and reduced environmental impact.
d. Use of advanced ML algorithms: the paper employs state-of-the-art ML algorithms (Random Forest, XGBoost, and LightGBM) and uses SHAP (SHapley Additive exPlanations) analysis for interpretability. This allows for a deeper understanding of the factors influencing cotton yield.
e. Practical implications: the study provides actionable insights for farmers, such as the optimal N rate (around 200 kg N/ha) and the importance of phosphorus (P) availability. These findings can help farmers make more informed decisions about fertilization.
Weaknesses and areas for improvement
f. Limited consideration of water management: the study acknowledges that water is a major determinant of N availability but does not incorporate water management into the model. Including water-related variables could improve the model's accuracy and practical applicability.
g. Lack of validation in different regions: the study is conducted in a specific region (Drama Plain, Greece). While the results are promising, the model's performance should be validated in other cotton-growing regions with different climatic and soil conditions to ensure its broader applicability. Otherwise, this limitation must be clearly underlined in the text.
h. Complexity of the soil model: the mechanistic soil model is highly detailed, which may limit its practical use by farmers who may not have access to the necessary data or expertise. Simplifying the model or providing user-friendly tools could enhance its adoption.
j. Potential overfitting: while the study uses cross-validation and hyperparameter tuning to mitigate overfitting, the complexity of the model and the relatively small dataset (309 field parcels) could still pose a risk of overfitting. Further validation with independent datasets is recommended.
k. Environmental impact: although the study aims to reduce excessive N fertilization, it does not extensively discuss the environmental benefits of optimized N use, such as reduced nitrate leaching or greenhouse gas emissions. Including an environmental impact analysis could strengthen the paper's contribution to sustainable agriculture.
l. Suggestions for future work: future studies should consider integrating water management variables, such as irrigation practices and soil moisture levels, to improve the model's accuracy and relevance. Additionally, validating the model in different cotton-growing regions with varying climatic and soil conditions would enhance its generalizability and robustness. The development of user-friendly tools or software that farmers can easily use to apply the model's recommendations would facilitate its adoption in real-world farming practices; this should be discussed. Future research should also include a detailed analysis of the environmental benefits of optimized N fertilization, such as reduced nitrate leaching, lower greenhouse gas emissions, and improved soil health. Finally, conducting long-term field trials to validate the model's predictions and assess its impact on yield stability and soil health over multiple growing seasons would provide valuable insights.
Overall, considering all previously reported points, the paper presents a significant advancement in precision agriculture by combining ML with mechanistic soil modeling to optimize N fertilization in cotton production. The study's findings have practical implications for farmers and contribute to the broader goal of sustainable agriculture. However, addressing the limitations and incorporating additional variables, such as water management, would further enhance the model's accuracy and applicability. The paper lays a strong foundation for future research in precision fertilization management and has the potential to make a meaningful impact on agricultural practices. Thus, major revisions are suggested.
Comments on the Quality of English Language
The English is generally fine, but there are a few areas where the language could be improved for clarity. For example, some sentences are overly complex and could be simplified to make the manuscript more accessible to a broader audience.
Author Response
Thank you very much for your comments.
I have included a paragraph with suggestions for future work on page 15, line 567. It covers the points you recommended and reads as follows:
Future studies should consider integrating additional variables related to water management, such as irrigation practices, to further improve model accuracy. Validating the model across different cotton-growing regions with diverse climatic and soil conditions will also enhance its generalizability and robustness. Importantly, to ensure practical adoption, the model will be deployed as a software tool at the Soil and Water Resources Institute of the Hellenic Agricultural Organization (DIMITRA), where it will be used in combination with soil analysis data to support farmer decision-making. Further research should also investigate the environmental benefits of optimized nitrogen fertilization, particularly the model’s impact on reducing nitrate leaching and improving soil health. In addition, long-term field trials are needed to validate model predictions and assess their influence on yield stability and sustainability over multiple growing seasons. Finally, as interest in yield mapping with cotton harvesters grows, more yield data will become available, as has already occurred with rice [7,61]. This will facilitate the development of advanced ML models that leverage big data and, ultimately, significantly improve our ability to design effective N management for cotton crops.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have improved the manuscript but need to rewrite the novelty statement and objectives. Some explanation is needed regarding the selection of the models used in the study. Specific comments are given in the attached file.
Comments for author File: Comments.pdf
Author Response
Comment 1: Please rewrite the novelty statement. The word "novelty" should not necessarily come in the statement as novelty is judged based on how different is your study from the previous studies. Further, try to rewrite the objectives in paragraph form rather than writing in points.
Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have rewritten the novelty statement and objectives as follows (new text is shown in blue): This study explores the incorporation of mechanistic soil model outputs as features in ML algorithms, enabling a more holistic representation of the influence of soil variability on yield and improving prediction accuracy. The primary objective is to develop an ML model for cotton yield prediction to select the optimum N fertilizer dose. The study also explores the inclusion of soil model equations in the feature space used to train the ML model, to better capture soil variability.
Comment 2: Please refer to those studies and explain how these models outperform other models. There should be a valid scientific/technical reason.
Response 2: Thank you for pointing this out. We agree with this comment. Therefore, we have added the following text based on the study of Shwartz-Ziv and Armon (2022): These three ML algorithms were selected because tree ensemble models are usually recommended for regression problems with tabular data. According to Shwartz-Ziv and Armon [50], a systematic comparison of XGBoost with deep learning models on various tabular datasets showed that XGBoost outperformed the deep models. The authors explained that, in cases where the deep models performed better than tree models, there had probably been a more extensive hyperparameter search. Even in such cases, tree-based models generally require significantly less hyperparameter tuning than deep learning models. Moreover, although the dataset of 309 cotton field parcels reflects the considerable scale and effort involved in this study, it remains relatively small by deep learning standards. As such, deep learning approaches would not be suitable or efficient for this application.
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
You have addressed the comments and improved the manuscript significantly. It can be accepted as it is.
Thank You
Author Response
Dear Reviewer,
Thank you very much for your time and positive evaluation of our manuscript which helped us improve the quality of the work.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have significantly improved the paper, addressing many of the concerns raised in the previous review. The manuscript now includes a clearer discussion of the limitations and future work, making the study's scope and potential impact more transparent. The SHAP analysis and feature importance plots provide deeper insights into nutrient interactions, enhancing the paper's interpretability. The paper can be accepted in its current form.
Author Response
Dear Reviewer,
Thank you very much for your time and positive evaluation of our manuscript which helped us improve the quality of the work.