Article
Peer-Review Record

High-Throughput Evaluation of Cotton Drought Tolerance Using UAV Multispectral Imagery and XGBoost-Based Machine Learning

Agronomy 2026, 16(5), 526; https://doi.org/10.3390/agronomy16050526
by Fuxiang Zhao 1,†, Tao Yang 1,2,†, Wei Wang 3,†, Wanli Han 1, Gang Wang 1, Jinxin Qiao 1, Xianhui Kong 1, Li Liu 1, Aijun Si 1, Fanlin Wang 2, Xuwen Wang 1,*, Xiyan Yang 4,* and Yu Yu 1,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 29 December 2025 / Revised: 19 February 2026 / Accepted: 25 February 2026 / Published: 28 February 2026

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Below are my key concerns and suggestions:

This manuscript presents a well-designed and comprehensive study integrating UAV-based multispectral imagery with machine learning models to evaluate drought tolerance in cotton. The work is technically sound, clearly structured, and supported by a large germplasm panel (225 accessions), making it a valuable contribution to high-throughput phenotyping and digital agriculture. The experimental design, data acquisition workflow, and modeling framework are generally appropriate, and the results are convincingly presented. However, a few minor issues should be addressed to further improve clarity, reproducibility, and presentation quality.

  1. The study relies on data from a single location and a single growing season, which limits the generalizability of the proposed model. The authors should clearly acknowledge this limitation and, if possible, include additional validation (e.g., year-wise or block-wise cross-validation) to better assess model robustness and reduce the risk of overfitting.
  2. Key details regarding hyperparameter optimization, feature selection thresholds, and model reproducibility are insufficiently described. Providing explicit hyperparameter settings and justification for model choices would improve methodological rigor.
  3. Please ensure consistent use of terms such as “XGBoost” vs. “XGBoosting” and “drought stress (DS)” vs. “drought treatment” throughout the manuscript, including figures and captions.
  4. While the authors mention 5-fold cross-validation, a brief clarification on whether cross-validation was applied only during training or also repeated across random splits would improve transparency and reproducibility.
  5. Figure 5 is information-dense. Increasing font sizes, improving axis labeling consistency, and simplifying legends would enhance readability, particularly for print and mobile viewing.
  6. The discussion around the lower contribution of NDVI is appropriate, but a short clarification distinguishing saturation effects versus soil background influence would strengthen interpretation.
  7. The construction and interpretation of the comprehensive drought tolerance index (D) should be explained more clearly, particularly its biological meaning and sensitivity to individual traits. A brief comparison with commonly used drought indices would strengthen the discussion.
  8. The discussion section should be more critical, particularly regarding model transferability to other environments, soil types, and growth stages.
  9. You could probably include the related references below:

a.) Deploying a Proximal Sensing Cart to Identify Drought-Adaptive Traits in Upland Cotton for High-Throughput Phenotyping https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2018.00507/full 

b.) High-Throughput Screening of Wheat Genotypes for Drought Tolerance Using Aerial Thermal Imagery https://ieeexplore.ieee.org/abstract/document/11136381 

  10. A careful proofreading is recommended to correct minor grammatical errors and spacing issues (e.g., inconsistent spacing around units, subscripts, and symbols in equations and tables).

I am recommending minor revision, primarily focused on improving clarity and presentation suitable for publication. Addressing these points would substantially improve the scientific rigor and practical relevance of the study.

Author Response

Comments 1: The study relies on data from a single location and a single growing season, which limits the generalizability of the proposed model. The authors should clearly acknowledge this limitation and, if possible, include additional validation (e.g., year-wise or block-wise cross-validation) to better assess model robustness and reduce the risk of overfitting.

 

Response 1: Thank you for your insightful comment regarding the limitation of our study. We sincerely appreciate this valuable feedback. As suggested, we have revised the Conclusion section (lines 544–547) to explicitly acknowledge that the current model was developed based on data from a single location and a single growing season, which may constrain its generalizability.

We would also like to note that follow-up experiments are already underway to evaluate the model’s performance across diverse environments, soil types, and growth stages. These efforts aim to enhance the robustness and transferability of our approach, which we believe is essential for reliable and accurate assessment of cotton drought tolerance in practical breeding programs.

 

Comments 2: Key details regarding hyperparameter optimization, feature selection thresholds, and model reproducibility are insufficiently described. Providing explicit hyperparameter settings and justification for model choices would improve methodological rigor.

 

Response 2: Thank you for your valuable comments and constructive suggestions, which have greatly helped improve the rigor and completeness of our manuscript. We have carefully addressed the issues raised regarding hyperparameter optimization, feature selection thresholds, and model reproducibility, and the detailed modifications are as follows (lines 306–331):

To achieve precise prediction of the cotton drought tolerance index (D), a continuous phenotypic trait, we framed the problem as a supervised regression task and conducted a comprehensive evaluation of four classical machine learning regression algorithms: Linear Regression (LR), k-Nearest Neighbors (KNN), Light Gradient Boosting Machine (LGBM), and XGBoost. Model training and evaluation were performed on a carefully curated dataset, with samples randomly partitioned into training (70%) and testing (30%) sets using a fixed random seed (random_state=42) to ensure reproducibility. To optimize hyperparameters while mitigating overfitting, 5-fold cross-validation was applied exclusively to the training set (via GridSearchCV in scikit-learn v1.7.2), with the root mean square error (RMSE) minimized as the primary tuning objective. This procedure ensured no data leakage, preserving the test set as an independent benchmark for final model assessment. The hyperparameter settings for the different models are as follows:

- LR: fit_intercept=True, normalize=False;

- KNN: n_neighbors=7, metric='manhattan';

- LGBM: learning_rate=0.08, max_depth=5, n_estimators=500, subsample=0.8;

- XGBoost: learning_rate=0.1, max_depth=6, n_estimators=600, colsample_bytree=0.7, reg_alpha=0.1, reg_lambda=0.2.

The 5-fold cross-validation was stratified by drought tolerance grade (Groups I–VI) to maintain consistent grade distribution across folds, avoiding bias from unbalanced class distribution. All analyses were performed using Python 3.9.16, with key libraries and versions: scikit-learn 1.7.2, XGBoost 2.0.3, LightGBM 4.1.0, Pandas 2.1.4, NumPy 1.26.0. Through extensive comparative analysis across multiple experimental iterations, the algorithm with the highest suitability for predicting the D value was pinpointed, establishing a sturdy and reliable modeling framework for future cotton drought tolerance assessments. The detailed implementation workflow is visually presented in Figure 2C.
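As an illustration of what stratifying folds by grade means for a regression target, the following is a minimal sketch, not the authors' implementation. The function name `stratified_folds`, the grade labels, and the sample counts are hypothetical. Each grade is shuffled and dealt across the folds in turn, so every fold receives a similar share of each grade.

```python
from collections import defaultdict
import random

def stratified_folds(grades, n_splits=5, seed=42):
    """Assign each sample a fold index so every grade is spread across folds."""
    rng = random.Random(seed)
    by_grade = defaultdict(list)
    for idx, grade in enumerate(grades):
        by_grade[grade].append(idx)
    fold_of = [None] * len(grades)
    offset = 0  # rotates the starting fold so total fold sizes stay balanced
    for grade in sorted(by_grade):
        members = by_grade[grade]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            fold_of[idx] = (pos + offset) % n_splits
        offset += len(members)
    return fold_of

# Hypothetical grade labels for 30 training accessions (not the study's data).
grades = ["I"] * 5 + ["II"] * 10 + ["III"] * 10 + ["IV"] * 5
folds = stratified_folds(grades)
```

In scikit-learn, precomputed (train, validation) index pairs built from such an assignment can be passed through the `cv` argument of `GridSearchCV`, which is one way to stratify on a grade label while still fitting a regressor on the continuous D value.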

All modifications have been integrated into the corresponding sections of the manuscript. These changes effectively improve the methodological rigor and reproducibility of the study.

Thank you again for your careful review and valuable suggestions.

 

Comments 3: Please ensure consistent use of terms such as “XGBoost” vs. “XGBoosting” and “drought stress (DS)” vs. “drought treatment” throughout the manuscript, including figures and captions.

 

Response 3: Thank you for pointing out this important issue regarding terminological consistency. We have carefully revised the manuscript to ensure uniform usage of technical terms. Specifically, “XGBoosting” has been corrected to “XGBoost” on lines 427, 450, 452, and 454, and “drought treatment” has been changed to “drought stress (DS)” on line 499. These adjustments have been applied consistently throughout the main text, figures, and captions to improve clarity and readability.

 

Comments 4: While the authors mention 5-fold cross-validation, a brief clarification on whether cross-validation was applied only during training or also repeated across random splits would improve transparency and reproducibility.

 

Response 4: To optimize model performance and minimize the risk of overfitting, 5-fold cross-validation was exclusively applied to the training set for hyperparameter tuning (implemented via GridSearchCV in scikit-learn v1.7.2), with the goal of minimizing the root mean square error (RMSE) on the validation folds. This cross-validation step was restricted to the training set to avoid data leakage and ensure the test set remained an independent evaluation benchmark (lines 340–354).
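The separation described in this response can be sketched in plain Python. This is an illustrative example under stated assumptions, not the authors' code: the data are synthetic, the k-nearest-neighbour regressor is a hand-written stand-in for the tuned models, and the helper names (`knn_predict`, `cv_rmse`) are hypothetical. The point it shows is that the grid search only ever sees training indices, and the 30% test split is scored once at the end.

```python
import math
import random

def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the k nearest training targets (Manhattan distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum(abs(a - b) for a, b in zip(train_X[i], x)))
    return sum(train_y[i] for i in order[:k]) / k

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def cv_rmse(X, y, idx, k, n_splits=5):
    """Mean validation RMSE over folds built from the training indices only."""
    scores = []
    for f in range(n_splits):
        val = idx[f::n_splits]
        val_set = set(val)
        fit = [i for i in idx if i not in val_set]
        fit_X, fit_y = [X[i] for i in fit], [y[i] for i in fit]
        preds = [knn_predict(fit_X, fit_y, X[i], k) for i in val]
        scores.append(rmse([y[i] for i in val], preds))
    return sum(scores) / n_splits

# Synthetic stand-in data: 100 samples, 3 features (not the study's data).
rng = random.Random(42)
X = [[rng.random() for _ in range(3)] for _ in range(100)]
y = [2 * x[0] + x[1] + rng.gauss(0, 0.05) for x in X]

# 70/30 split with a fixed seed.
indices = list(range(len(X)))
rng.shuffle(indices)
n_train = int(0.7 * len(indices))
train_idx, test_idx = indices[:n_train], indices[n_train:]

# Hyperparameter search touches the training indices only.
grid = [1, 3, 5, 7, 9]
cv_scores = {k: cv_rmse(X, y, train_idx, k) for k in grid}
best_k = min(cv_scores, key=cv_scores.get)

# The held-out 30% is used once, after tuning is finished.
train_X = [X[i] for i in train_idx]
train_y = [y[i] for i in train_idx]
test_pred = [knn_predict(train_X, train_y, X[i], best_k) for i in test_idx]
test_rmse = rmse([y[i] for i in test_idx], test_pred)
```

Repeating the whole procedure over several random seeds, as the reviewer's question implies, would only require wrapping the split and search in a loop and reporting the spread of `test_rmse`.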

 

Comments 5: Figure 5 is information-dense. Increasing font sizes, improving axis labeling consistency, and simplifying legends would enhance readability, particularly for print and mobile viewing.

Response 5: Thank you for your helpful suggestion regarding Figure 5. In response, we have revised the figure (as noted on line 489) to improve its clarity and accessibility. Specifically, we have increased font sizes, standardized axis labels for consistency, and simplified the legend to reduce visual clutter. These adjustments are intended to enhance readability, especially for print and mobile viewing formats.

 

Comments 6: The discussion around the lower contribution of NDVI is appropriate, but a short clarification distinguishing saturation effects versus soil background influence would strengthen interpretation.

Response 6: Thank you for this constructive suggestion. We agree that a clearer distinction between saturation effects and soil background influence would improve the interpretation of NDVI’s limited contribution in our analysis. In response, we have added a brief clarification in the Discussion section (lines 524–528) to explicitly differentiate these two factors: under severe drought stress, NDVI may suffer from saturation due to reduced sensitivity to further declines in canopy greenness at low vegetation cover, while in early growth stages or sparse canopies, its performance can be confounded by soil background reflectance. This refinement helps contextualize why other vegetation indices (e.g., those less sensitive to soil or designed for stressed conditions) outperformed NDVI in our study.

 

Comments 7: The construction and interpretation of the comprehensive drought tolerance index (D) should be explained more clearly, particularly its biological meaning and sensitivity to individual traits. A brief comparison with commonly used drought indices would strengthen the discussion.

Response 7: Thank you for your insightful comment regarding the comprehensive drought tolerance index (D). We appreciate the suggestion to provide a clearer explanation of its construction, biological relevance, and sensitivity to individual traits.

In response, we have expanded the description of index D in the manuscript (lines 336–339) to clarify that it integrates multiple drought-responsive physiological and agronomic traits into a single synthetic metric using a weighted scoring approach. This index has been previously validated across various crops—including cotton—for its ability to reflect overall drought tolerance and has successfully supported the identification of associated quantitative trait loci (QTLs) or candidate genes (see newly added references: e.g., Zhang et al., Genome-wide association and differential expression analysis of salt tolerance in Gossypium hirsutum L. at the germination stage; Li et al., Comprehensive evaluation of sea-island cotton germplasm under natural composite salt stress; Wang et al., Drought tolerance evaluation of upland × sea-island cotton recombinant inbred lines during flowering and boll-setting stage).

We acknowledge that, in the current study, our primary focus is on the development and evaluation of machine learning–based phenotyping models rather than on the detailed agronomic or genetic interpretation of index D itself. Nevertheless, we agree that placing D in the context of commonly used drought indices would enrich the discussion. While such a comparative analysis falls beyond the scope of this work, we now briefly note this point as a direction for future research.

Thank you again for helping us strengthen the rigor and clarity of our methodology.
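The response describes D only as a weighted score over several traits, so the exact formula is not given here. The sketch below assumes one common formulation, in which each component is min-max scaled across accessions (a membership value) and the components are combined with weights proportional to their contribution rates. The function names, component scores, and contribution rates are all hypothetical.

```python
def membership(values):
    """Min-max scale one component across accessions to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def drought_index(components, contributions):
    """Weighted sum of membership values; weights are normalized contributions."""
    total = sum(contributions)
    weights = [c / total for c in contributions]
    scaled = [membership(col) for col in components]
    n = len(components[0])
    return [sum(w * col[i] for w, col in zip(weights, scaled)) for i in range(n)]

# Hypothetical scores of 4 accessions on 2 components (for example, principal
# components of DS/CK trait ratios); all numbers here are made up.
components = [[0.2, 0.8, 0.5, 1.1], [1.0, 0.4, 0.7, 0.1]]
contributions = [60.0, 20.0]
D = drought_index(components, contributions)
```

Under this assumed form, D lies between 0 and 1, and its sensitivity to any single trait depends on that trait's loading on the heavily weighted components, which is the kind of information the reviewer asks the authors to report.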

 

Comments 8: The discussion section should be more critical, particularly regarding model transferability to other environments, soil types, and growth stages.

Response 8: Thank you very much for raising this critical and scientifically important point. We fully acknowledge the limitations regarding model transferability, and we appreciate your suggestion to address this more explicitly in the Discussion section.

In response, we have revised the manuscript (lines 206–507) to provide a more critical discussion of the current model’s applicability. Specifically, we emphasize that our model was developed and validated using data from a single location, soil type, and specific growth stage (flowering and boll-setting), which inherently restricts its generalizability. We clearly state that environmental variability—such as differences in climate, soil texture, water-holding capacity, and crop developmental dynamics—can significantly influence canopy spectral responses and, consequently, model performance.

We also highlight that ongoing and future work is actively expanding validation across diverse agroecological settings, soil types, and additional phenological stages to assess and enhance model robustness. This iterative validation is essential for developing truly scalable and transferable high-throughput phenotyping tools for cotton breeding programs under real-world conditions.

Thank you again for prompting us to strengthen this crucial aspect of our discussion.

 

Comments 9: You could probably include the related references below:

a.) Deploying a Proximal Sensing Cart to Identify Drought-Adaptive Traits in Upland Cotton for High-Throughput Phenotyping https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2018.00507/full.

b.) High-Throughput Screening of Wheat Genotypes for Drought Tolerance Using Aerial Thermal Imagery https://ieeexplore.ieee.org/abstract/document/11136381.

Response 9: Thank you very much for your recommendation. We greatly appreciate the suggested references, which are highly relevant to our study. We have now carefully reviewed and formally included them in the reference list of the revised manuscript to strengthen the scientific context and support our methodological framework.

 

Comments 10: A careful proofreading is recommended to correct minor grammatical errors and spacing issues (e.g., inconsistent spacing around units, subscripts, and symbols in equations and tables).

Response 10: Thank you for this helpful suggestion. We sincerely appreciate the attention to detail. In response, we have conducted a thorough proofreading of the entire manuscript, with particular focus on correcting minor grammatical errors and ensuring consistent formatting. This includes standardizing spacing around units, properly formatting subscripts and superscripts in variables and chemical formulas, and verifying symbol consistency in equations and tables. We believe these revisions have significantly improved the clarity and professionalism of the text.


Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript by Zhao et al. addresses the application of multispectral imaging for detecting drought stress in cotton plants. The work compared several directly measured canopy-level phenotypic data with UAV-based multispectral imaging of 225 cotton accessions. From the statistical analysis and modeling of canopy-level phenotypic data vs. 16 vegetation indices derived from multispectral images, they identified the GNDVI, NGRVI, and NDRE indices as the most sensitive to reflecting drought stress. The work is correctly done, and the developed model/indices have a good predictive capacity under the studied conditions. The manuscript can be recommended for publication after corrections in light of the comments listed below:

 

Comments:

1, While the identified three vegetation indices indeed reflect reasonably well the consequences of the experienced drought stress, it is not demonstrated that these features are specific to water limitation. GNDVI and NGRVI reflect the amount of chlorophyll (Chl) in the canopy through the green reflectance, and NDRE is related mainly to the Chl amount through red-edge absorption. In other words, the most sensitive vegetation indices reflect drought-induced decrease of Chl. There is no doubt that this happens under prolonged drought stress; however, it can be induced by several other environmental stress factors, such as high light intensity, UV exposure, nutrient limitation, etc. Therefore, although the described method can be suitable for detecting drought stress when monitoring the same area, the changes of the described indices are not diagnostic themselves for drought in general.

2, The practical benefit of the UAV-based multispectral monitoring would be an early warning capability for the development of drought (or other stress factors). However, from the description of the work it is not clear when the UAV-based image collections were done relative to the start or end of the drought stress treatment. Therefore, the early warning potential of the changes in the sensitive indices is not clear. The timing of the UAV flights, relative to the timing of the drought stress treatment, should be specified.

3, The statement in the text, which is taken from the remote sensing literature, that the chlorophyll amount reflects photosynthetic capacity, is not generally true. The vast majority of Chls are located in the light-harvesting antenna complexes, whose size can significantly change in response to light conditions without affecting the overall photosynthetic capacity; it is actually a regulatory mechanism of light acclimation. For example, the same plant species can have much more chlorophyll under shade than under light-exposed conditions without a significant difference in their photosynthetic capacity. This does not affect the conclusions of the work, just a note that Chl amount is not a general proxy for photosynthetic activity.

4, The definition of the NGRVI index, which appears to be the most sensitive vegetation index, is not listed in section 2.5.

5, The definition/explanation of IncMSE % is not provided in the text. It is not trivial for readers who are non-experts in statistical modelling.

Author Response

Comments 1: While the identified three vegetation indices indeed reflect reasonably well the consequences of the experienced drought stress, it is not demonstrated that these features are specific to water limitation. GNDVI and NGRVI reflect the amount of chlorophyll (Chl) in the canopy through the green reflectance, and NDRE is related mainly to the Chl amount through red-edge absorption. In other words, the most sensitive vegetation indices reflect drought-induced decrease of Chl. There is no doubt that this happens under prolonged drought stress; however, it can be induced by several other environmental stress factors, such as high light intensity, UV exposure, nutrient limitation, etc. Therefore, although the described method can be suitable for detecting drought stress when monitoring the same area, the changes of the described indices are not diagnostic themselves for drought in general.

Response 1: Thank you very much for raising this important and insightful point regarding the specificity of the selected vegetation indices (VIs) to drought stress.

We fully agree that GNDVI, NGRVI, and NDRE are primarily sensitive to canopy chlorophyll content, which can be influenced not only by water deficit but also by other abiotic stressors such as high light intensity, UV radiation, or nutrient deficiencies. As you correctly note, these indices are not inherently diagnostic of drought per se, but rather reflect physiological changes, such as chlorophyll degradation, that commonly occur under prolonged water stress.

In our study, however, the experimental design was carefully controlled to isolate the effect of water limitation. Specifically, the well-watered control (CK) and drought-stressed (DS) treatments were established in the same field block to minimize spatial variability in soil properties and microclimate. Soil moisture was rigorously monitored and maintained at distinct levels between CK and DS throughout the flowering and boll-setting stage (Section 2.1, revised), and significant differences in volumetric water content were confirmed between treatments. Furthermore, we conducted t-tests comparing 15 key agronomic and physiological traits between CK and DS (Table 2), all of which showed statistically significant differences (p < 0.01), strongly supporting that the observed canopy-level spectral changes were driven by imposed drought stress rather than confounding factors.

That said, we acknowledge that in uncontrolled or heterogeneous field conditions—where multiple stresses may co-occur—the interpretation of these VIs as drought-specific indicators would require additional contextual information (e.g., soil moisture data, weather records, or complementary stress markers). We have now clarified this nuance in the Discussion, emphasizing that while our model is effective for detecting drought-induced canopy stress under controlled experimental conditions, its application in broader agricultural settings should account for potential non-drought drivers of chlorophyll variation.

We appreciate your comment, which has helped us better articulate the scope and limitations of our approach.

 

Comments 2: The practical benefit of the UAV-based multispectral monitoring would be an early warning capability for the development of drought (or other stress factors). However, from the description of the work it is not clear when the UAV-based image collections were done relative to the start or end of the drought stress treatment. Therefore, the early warning potential of the changes in the sensitive indices is not clear. The timing of the UAV flights, relative to the timing of the drought stress treatment, should be specified.

Response 2: Thank you for this important comment regarding the timing of UAV data acquisition relative to the drought stress treatment. We agree that clarifying this temporal relationship is essential for evaluating the early warning potential of the proposed approach.

In response, we have revised the manuscript (lines 193–206) to explicitly specify the experimental timeline: during the flowering and boll-setting stage, the well-watered control (CK) received regular irrigation every 10 days (10 hours per event), whereas the drought-stressed (DS) treatment underwent two defined periods of water deficit. The soil moisture content of the CK and DS treatments, measured using the five-point sampling method in the 0–60 cm soil layer, was 38.65% and 23.22%, respectively. Critically, UAV-based multispectral data collection was conducted three days before the end of the water-deficit period, capturing canopy responses while plants were still under active drought stress, yet before irreversible damage occurred.

This timing demonstrates that the observed spectral changes in GNDVI, NGRVI, and NDRE reflect in-season, pre-symptomatic or early-stage physiological responses to water limitation, rather than post-stress recovery effects. While our current design confirms the sensitivity of these indices to ongoing drought, we acknowledge that true “early warning” capability—i.e., prediction prior to significant soil moisture depletion—would require more frequent monitoring at earlier stages of stress development. We have added a brief note on this point in the Discussion as a direction for future operational deployment.

Thank you again for prompting us to clarify this key aspect of our methodology.

 

Comments 3: The statement in the text, which is taken from the remote sensing literature, that the chlorophyll amount reflects photosynthetic capacity, is not generally true. The vast majority of Chls are located in the light-harvesting antenna complexes, whose size can significantly change in response to light conditions without affecting the overall photosynthetic capacity; it is actually a regulatory mechanism of light acclimation. For example, the same plant species can have much more chlorophyll under shade than under light-exposed conditions without a significant difference in their photosynthetic capacity. This does not affect the conclusions of the work, just a note that Chl amount is not a general proxy for photosynthetic activity.

 

Response 3: Thank you for this scientifically precise and valuable clarification regarding the relationship between chlorophyll content and photosynthetic capacity. We sincerely appreciate your insight.

In response, we have revised the relevant statement in the manuscript (line 111) to avoid overgeneralizing the role of chlorophyll as a direct proxy for photosynthetic activity. The text now acknowledges that while chlorophyll is essential for light capture, its concentration—particularly in the light-harvesting antenna complexes—can vary significantly in response to environmental conditions such as light intensity, often as part of a photoprotective or acclimatory strategy, without necessarily altering the plant’s maximum photosynthetic capacity.

We agree that this distinction does not undermine our core findings, since our focus is on using vegetation indices as indicators of stress-induced physiological changes rather than direct estimators of photosynthesis, but it is important to maintain conceptual accuracy in interpreting spectral signals. Thank you again for helping us improve the scientific rigor of our discussion.

Comments 4: The definition of the NGRVI index, which appears to be the most sensitive vegetation index, is not listed in section 2.5.

Response 4: Thank you for pointing out this omission. We appreciate your careful reading of the manuscript.

In response, we have added the definition of the NGRVI (Normalized Green Red Vegetation Index) to Table 2 (line 277), as referenced in Section 2.5. This addition ensures that all vegetation indices used in the analysis are properly defined and readily accessible to readers. Thank you again for your helpful comment.
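For readers without access to the revised table, the indices discussed in this exchange can be sketched as normalized differences of band reflectances. NDVI, GNDVI, and NDRE follow their standard definitions. The form used for NGRVI, (green − red) / (green + red), is an assumption based on the index name and should be checked against the manuscript. The function names and reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), the form shared by many vegetation indices."""
    return (a - b) / (a + b)

def vegetation_indices(green, red, red_edge, nir):
    """Compute per-plot indices from mean band reflectances (0 to 1 scale)."""
    return {
        "NDVI": normalized_difference(nir, red),
        "GNDVI": normalized_difference(nir, green),
        "NDRE": normalized_difference(nir, red_edge),
        # Assumed form for NGRVI; verify against the manuscript's definition.
        "NGRVI": normalized_difference(green, red),
    }

# Made-up reflectances for one plot.
vi = vegetation_indices(green=0.10, red=0.05, red_edge=0.25, nir=0.45)
```

All four indices are bounded between −1 and 1, which makes them directly comparable as model inputs without further scaling.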

 

Comments 5: The definition/explanation of IncMSE % is not provided in the text. It is not trivial for readers who are non-experts in statistical modelling.

Response 5: Thank you for this important observation. We agree that the metric IncMSE % (Increase in Mean Squared Error percentage) should be clearly defined to ensure accessibility for readers who may not be familiar with variable importance measures in random forest models.

In response, we have added a concise explanation in the revised manuscript (Section 2.5, lines 265–276):

“Variable importance was assessed using the IncMSE (%) metric from the random forest algorithm, which quantifies the percentage increase in model prediction error (mean squared error, MSE) when the values of a given predictor variable are randomly permuted across out-of-bag samples. A higher IncMSE % indicates greater contribution of that variable to model performance.”

This clarification helps non-specialist readers understand both the conceptual basis and practical interpretation of IncMSE % in the context of our analysis. Thank you for prompting us to improve the transparency and inclusivity of our methodology description.
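The permutation idea behind this definition can be illustrated with a small sketch. It is a simplification under stated assumptions: random forest implementations compute the measure tree by tree on out-of-bag samples and may rescale the result, whereas this example uses one fixed stand-in model and one toy dataset. The function names and data are hypothetical.

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def percent_increase_mse(predict, X, y, column, seed=0):
    """Percentage rise in MSE after shuffling one predictor column."""
    base = mse(y, [predict(row) for row in X])
    shuffled = [row[column] for row in X]
    random.Random(seed).shuffle(shuffled)
    permuted = [row[:column] + [v] + row[column + 1:]
                for row, v in zip(X, shuffled)]
    return 100.0 * (mse(y, [predict(row) for row in permuted]) - base) / base

# Toy data: the target depends on column 0 only; column 1 is pure noise.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]

def predict(row):
    # Stand-in for a fitted model (hypothetical); it ignores column 1.
    return 3.0 * row[0]

inc_col0 = percent_increase_mse(predict, X, y, column=0)
inc_col1 = percent_increase_mse(predict, X, y, column=1)
```

Shuffling the informative column breaks its link to the target and inflates the error, while shuffling the unused column leaves the predictions unchanged, so its increase is zero.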


Reviewer 3 Report

Comments and Suggestions for Authors

This manuscript presents a UAV-based high-throughput framework for evaluating drought tolerance in upland cotton by integrating multispectral imagery and machine-learning models. The topic is relevant to Agronomy, and the use of UAV phenotyping across a large number of accessions is a clear strength. However, several important issues should be addressed before the manuscript can be considered for publication.

1- Definition of drought treatment: The drought stress (DS) treatment is insufficiently defined. The manuscript describes “two periods of water deficit” but does not quantify drought severity, irrigation amounts, duration, or soil/plant water status. Without this information, reproducibility and interpretation of drought tolerance are weakened. Please clearly report timing, duration, irrigation levels for CK and DS, and any supporting drought intensity measurements. 

2- Experimental design and replication: The study reports a CRBD with two replicates for each treatment. Given the large number of accessions, the manuscript should more clearly describe the blocking structure, randomization, and how spatial heterogeneity was controlled. The adequacy of replication for genotype ranking under field drought should be justified.

3- Construction of the drought tolerance index (D): The construction of the target variable (D) from drought resistance coefficients and PCA is central to the study but not sufficiently explained. Please clarify data standardization, the rationale for PC selection, and how D is computed at the accession level. In addition, explicitly describe which UAV imagery (CK, DS, or both) is used as model input and explain why this does not introduce circularity or label leakage. 

4- Model terminology and evaluation clarity: Logistic regression is listed as a regression model, which is technically incorrect unless classification was performed. Please correct terminology if linear regression was used. Additionally, Pearson correlation (R) and coefficient of determination (R²) should be clearly distinguished and reported consistently.

5- Feature selection and modeling procedure: The feature selection process based on Random Forest importance (%IncMSE) requires clearer description, including how importance was computed, whether it was restricted to training data, and how thresholds were chosen. Claims related to predicting future drought scenarios should be moderated unless supported by a temporal forecasting design.

6- Statistical analysis of trait differences: The manuscript reports significance levels for multiple traits, but the statistical model is not clearly described. Please specify the statistical tests used, the treatment of genotype and environment effects, and whether multiple trait tests were handled in a controlled way (e.g., clarify any adjustment strategy, if used).
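The distinction raised in point 4 between Pearson correlation (R) and the coefficient of determination (R²) can be made concrete with a short sketch. The numbers are made up and the function names are hypothetical. Predictions that track the observations perfectly but carry a constant bias have R = 1, yet R² computed from residuals is negative.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.5, 0.7, 0.9, 1.1]  # perfectly correlated, biased by +0.3
r = pearson_r(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

Because the two measures can diverge this sharply on test data, squaring a reported R does not in general recover R².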

Addressing these points will substantially improve the clarity, rigor, and reproducibility of the study.

Comments on the Quality of English Language

The English is generally understandable, but several sections contain awkward phrasing and technical terminology inaccuracies. Language polishing is recommended to improve clarity and precision. 

Author Response

Comments 1: Definition of drought treatment: The drought stress (DS) treatment is insufficiently defined. The manuscript describes “two periods of water deficit” but does not quantify drought severity, irrigation amounts, duration, or soil/plant water status. Without this information, reproducibility and interpretation of drought tolerance are weakened. Please clearly report timing, duration, irrigation levels for CK and DS, and any supporting drought intensity measurements.

Response 1: Thank you very much for this critical and constructive comment. We agree that a precise and quantitative description of the drought treatment is essential for both reproducibility and meaningful interpretation of drought tolerance.

In response, we have substantially revised the manuscript (Section 2.1, lines 189–206) to provide a clear and detailed account of the drought stress protocol. The experiment was conducted using a randomized complete block design with subsurface drip irrigation under plastic mulch. The well-watered control (CK) received regular irrigation every 10 days, with each event delivering 36 m³ of water per mu (540 m³ ha⁻¹). In contrast, the drought-stressed (DS) treatment underwent two consecutive water-deficit periods during the flowering and boll-setting stage, totaling 20 days without irrigation.

Crucially, soil moisture content (%) was measured using the five-point sampling method three days before the end of the drought period, yielding average values of 38.65% for CK and 23.22% for DS and confirming a significant moisture gradient between treatments. High-throughput UAV-based phenotyping was also conducted at this time point to capture canopy responses under active drought stress.

These additions now fully specify the timing, duration, irrigation regime, and quantitative soil moisture data necessary to assess drought severity and enable replication. We appreciate your suggestion, which has greatly strengthened the methodological rigor of our study.
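The per-mu and per-hectare irrigation figures quoted above can be cross-checked with a short conversion. This is only a unit check, assuming the standard equivalence of 15 mu per hectare; the function name is illustrative.

```python
# Unit check for the irrigation rate quoted in Response 1.
MU_PER_HECTARE = 15  # 1 mu = 1/15 ha (about 666.7 m^2)


def per_mu_to_per_hectare(volume_m3_per_mu):
    """Convert an irrigation volume from m^3 per mu to m^3 per hectare."""
    return volume_m3_per_mu * MU_PER_HECTARE


rate_per_event_ha = per_mu_to_per_hectare(36)
print(rate_per_event_ha)  # 540
```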

Comments 2: Experimental design and replication: The study reports a CRBD with two replicates for each treatment. Given the large number of accessions, the manuscript should more clearly describe the blocking structure, randomization, and how spatial heterogeneity was controlled. The adequacy of replication for genotype ranking under field drought should be justified.

Response 2: Thank you very much for raising this important point regarding experimental design and replication. We appreciate your attention to the rigor required for reliable genotype evaluation under field drought conditions.

In response, we have clarified the experimental setup in Section 2.1. The trial was implemented as a randomized complete block design with two biological replicates per treatment, where each block represented a spatially contiguous unit to account for potential field heterogeneity (e.g., soil texture or moisture gradients). Within each block, all cotton accessions were randomly assigned to plots under either well-watered (CK) or drought-stressed (DS) conditions. This layout—combined with uniform agronomic management and subsurface drip irrigation under plastic mulch—helped minimize uncontrolled spatial variability.
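As a rough illustration of the layout described above, the following sketch draws an independent random plot order for every block. It assumes that CK and DS occupy separate irrigation areas, each containing two complete blocks; the response does not spell this out, and the accession IDs, seed, and function name are placeholders.

```python
import random


def rcbd_layout(accessions, treatments=("CK", "DS"), n_blocks=2, seed=0):
    """Return {(treatment, block): [accessions in plot order]}.

    Every accession appears exactly once per block, and the plot order
    is shuffled independently within each block.
    """
    rng = random.Random(seed)
    layout = {}
    for treatment in treatments:
        for block in range(1, n_blocks + 1):
            order = list(accessions)
            rng.shuffle(order)  # randomization is restricted to the block
            layout[(treatment, block)] = order
    return layout


# 225 placeholder accession IDs (hypothetical names).
accessions = [f"ACC{i:03d}" for i in range(1, 226)]
layout = rcbd_layout(accessions)
```

Because each block holds a complete set of accessions, block-to-block differences in soil can be separated from genotype differences in the analysis.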

We acknowledge that two replicates are at the lower end of typical recommendations for multi-genotype field trials. However, given the large number of accessions and the logistical constraints of imposing precise, synchronized drought stress across a large area, we prioritized plot uniformity, strict irrigation control (total water control volume: 72 m³/mu over the season; DS received no irrigation during two 10-day deficit periods totaling 20 days), and high-frequency phenotyping to enhance data reliability. Moreover, the significant and consistent separation between CK and DS treatments across 15 physiological and yield-related traits supports the effectiveness of our design in capturing drought responses.

That said, we agree that increasing replication would further improve the precision of genotype rankings, and we now explicitly note this as a consideration for future large-scale validation studies.

Thank you again for prompting us to better articulate the rationale and limitations of our experimental design.

Comments 3: Construction of the drought tolerance index (D): The construction of the target variable (D) from drought resistance coefficients and PCA is central to the study but not sufficiently explained. Please clarify data standardization, the rationale for PC selection, and how D is computed at the accession level. In addition, explicitly describe which UAV imagery (CK, DS, or both) is used as model input and explain why this does not introduce circularity or label leakage.

Response 3: Thank you for this thoughtful and technically important comment.

We have revised the manuscript to provide a clearer and more detailed description of the PCA procedure and the calculation of the comprehensive drought tolerance index (D). Specifically, we now explicitly outline the steps of data standardization, principal component selection (based on eigenvalues > 1 and variance explained), and the weighted integration of retained PCs into the final D score at the individual accession level.

In addition, we have added a supporting reference—Genome-wide association and differential expression analysis of salt tolerance in Gossypium hirsutum L. at the germination stage—which employs a similar PCA-based approach for constructing composite stress tolerance indices in cotton, thereby providing methodological validation for our framework.

Regarding model input and potential leakage:  

- The input features for the machine learning models (XGBoost, RF, etc.) are vegetation indices (VIs) extracted exclusively from UAV imagery acquired under the DS treatment (i.e., during active drought stress).  

- The target variable D is derived solely from ground-measured traits (not from UAV data) and reflects an integrated, post-hoc assessment of drought tolerance based on performance differences between CK and DS.  

- Critically, training and testing datasets were partitioned at the accession level, ensuring no overlap between samples used to compute D and those used for model validation. Thus, there is no circularity or label leakage, as the model learns to predict D from independent remote sensing observations.
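A minimal sketch of an accession-level partition is shown below. The 20% test fraction, the seed, and the accession IDs are assumptions for illustration only, since the response does not state the split ratio. The point is that all records of a given accession fall on one side of the split.

```python
import random


def split_by_accession(accession_ids, test_fraction=0.2, seed=42):
    """Split unique accession IDs into disjoint train and test sets."""
    unique_ids = sorted(set(accession_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_fraction))
    return set(unique_ids[n_test:]), set(unique_ids[:n_test])


# Two plot-level records per accession (hypothetical IDs and replicates).
records = [(f"ACC{i:03d}", rep) for i in range(1, 226) for rep in (1, 2)]
train_ids, test_ids = split_by_accession([acc for acc, _ in records])

# Every record follows its accession, so no accession straddles the split.
train = [r for r in records if r[0] in train_ids]
test = [r for r in records if r[0] in test_ids]
```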

We hope these clarifications enhance the transparency and reproducibility of our phenotyping pipeline. Thank you again for your insightful feedback.

Comments 4: Model terminology and evaluation clarity: Logistic regression is listed as a regression model, which is technically incorrect unless classification was performed. Please correct terminology if linear regression was used. Additionally, Pearson correlation (R) and coefficient of determination (R²) should be clearly distinguished and reported consistently.

Response 4: Thank you for your insightful comment regarding model terminology and evaluation clarity. We sincerely appreciate the opportunity to clarify and have revised the manuscript accordingly. Below is our detailed response:

In this study, we developed and compared four regression models: Linear Regression (LR), k-Nearest Neighbors (KNN), Light Gradient Boosting Machine (LGBM), and XGBoost (Extreme Gradient Boosting). We would like to explicitly clarify that logistic regression was not used in our analysis. The mention of “logistic regression” in the original manuscript was an inadvertent typographical error. Since our task involves predicting a continuous outcome variable, all models employed are regression models, not classification models. This error has been corrected throughout the revised manuscript.

Regarding model evaluation metrics, we adopted distinct but complementary approaches depending on the analytical context:

Overall Model Performance Evaluation:

For assessing and comparing the predictive performance of the four models, we consistently used the coefficient of determination (R²) as the primary metric. R² quantifies the proportion of variance in the target variable explained by the model and is a standard, interpretable measure for regression tasks.
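The distinction the reviewer raised can be illustrated with a small sketch on made-up values. Pearson's r measures only linear association, whereas R² computed from residuals also penalizes bias, so the two can diverge sharply for the same predictions.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


observed = [0.2, 0.4, 0.6, 0.8]            # made-up D values
biased = [x + 0.3 for x in observed]       # perfectly correlated, constant offset

r = pearson_r(observed, biased)            # close to 1.0
r2 = r_squared(observed, biased)           # about -0.8
```

A constant offset leaves r near 1 while R² turns negative, which is why the two metrics should not be reported interchangeably.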

We greatly appreciate your careful review and constructive feedback, which has significantly enhanced the methodological rigor and clarity of our manuscript.

 

Comments 5: Feature selection and modeling procedure: The feature selection process based on Random Forest importance (%IncMSE) requires clearer description, including how importance was computed, whether it was restricted to training data, and how thresholds were chosen. Claims related to predicting future drought scenarios should be moderated unless supported by a temporal forecasting design.

Response 5: Thank you for your insightful comment on feature selection and modeling procedure. We fully acknowledge that a detailed description of the %IncMSE-based feature selection process is essential for methodological reproducibility, and that the claim about future drought scenario prediction needs to be moderated to match the actual experimental design.

To address this, we have made the following revisions to the manuscript:

For the feature selection process, we have added a detailed description in Section 2.5, including: the specific calculation method of %IncMSE (i.e., permuting predictor variables in out-of-bag samples of the training set only to compute the percentage increase in MSE); the confirmation that %IncMSE was calculated exclusively on the training dataset to avoid data leakage; and the specific selection criterion for the threshold (i.e., selecting vegetation indices with %IncMSE values in the top 25% and combining this with the correlation analysis against the D value to determine the final optimal feature set).
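The permutation idea described above can be sketched in a few lines. This is not a random forest and not the out-of-bag calculation itself: a fixed stand-in predictor and synthetic training rows are used, the index names are placeholders, and the correlation step with D is omitted. It shows only how shuffling one feature raises the MSE and how a top-25% cut could then be applied.

```python
import math
import random


def mse(predict, rows, targets):
    """Mean squared error of `predict` over the given rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def percent_increase_in_mse(predict, rows, targets, feature, n_repeats=20, seed=0):
    """Average percentage increase in MSE after shuffling one feature."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    total = 0.0
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        total += 100.0 * (mse(predict, permuted, targets) - baseline) / baseline
    return total / n_repeats


def top_quartile(importance):
    """Names of the features ranked in the top 25% by importance."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    keep = max(1, math.ceil(0.25 * len(ranked)))
    return ranked[:keep]


# Synthetic training rows with placeholder index names (not the study's indices).
rng = random.Random(1)
features = ["VI_a", "VI_b", "VI_c", "VI_d"]
train_rows = [{name: rng.random() for name in features} for _ in range(40)]
train_targets = [2.0 * r["VI_a"] + rng.gauss(0.0, 0.1) for r in train_rows]


def stand_in_model(row):
    """Stand-in for a fitted model; it only uses VI_a."""
    return 2.0 * row["VI_a"]


# Importance is computed on training rows only, mirroring the leakage safeguard.
importance = {
    name: percent_increase_in_mse(stand_in_model, train_rows, train_targets, name)
    for name in features
}
selected = top_quartile(importance)
```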

For the claim about drought prediction, we have moderated the relevant statements in the Results and Discussion sections, revising the overstated expression of "predicting future drought scenarios" to a more prudent description that aligns with our experimental design—"predicting cotton drought stress status at the flowering and boll-setting stage based on UAV multispectral data"—and removed any irrelevant claims about future temporal forecasting that lack experimental support.

We have also conducted a full check of the entire modeling process description to ensure the logical consistency and methodological transparency of the text. Thank you again for your constructive suggestion, which has significantly improved the rigor and accuracy of our research presentation.

 

Comments 6: Statistical analysis of trait differences: The manuscript reports significance levels for multiple traits, but the statistical model is not clearly described. Please specify the statistical tests used, the treatment of genotype and environment effects, and whether multiple trait tests were handled in a controlled way (e.g., clarify any adjustment strategy, if used).

Response 6: Thank you very much for this helpful suggestion. We appreciate your emphasis on statistical transparency, especially to ensure accessibility for readers who may be less familiar with analytical methods.

In response, we have clarified the statistical approach in the revised manuscript (Section 3.1). Specifically, to assess differences in each of the 15 measured traits between the well-watered control (CK) and drought-stressed (DS) treatments, we performed independent two-sample t-tests (assuming unequal variances where appropriate, as verified by Levene’s test). These tests were conducted at the accession-aggregated level, comparing the mean trait values under CK versus DS across all genotypes.
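For reference, the unequal-variance (Welch) form of the two-sample t-test mentioned above reduces to the following calculation. The trait values are made up, and the sketch stops at the test statistic and degrees of freedom rather than computing a p-value.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


ck = [10.0, 12.0, 11.0, 13.0]   # made-up trait values under CK
ds = [7.0, 9.0, 8.0, 8.0]       # made-up trait values under DS
t_stat, dof = welch_t(ck, ds)
```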

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript has been substantially improved and now provides a clear and coherent framework for UAV-based drought tolerance evaluation in cotton. The methodology, model description, and interpretation of results are adequately presented, and the conclusions are supported by the data. I recommend acceptance after the following minor clarifications:

1- Soil Moisture Measurement Method: please briefly clarify whether soil moisture (38.65% CK vs 23.22% DS, 0–60 cm) was measured gravimetrically or volumetrically, and indicate the measurement method or instrument used.

2- Statistical Test Description for Table 3: please specify the exact statistical test applied (e.g., independent or paired t-test) and clarify the unit of replication used in the analysis.

Author Response

We sincerely appreciate the reviewers’ valuable comments and suggestions, which have helped us further improve the quality and rigor of our manuscript. We have carefully addressed all the minor clarifications proposed, and the detailed responses are as follows:

  1. Soil Moisture Measurement Method: please briefly clarify whether soil moisture (38.65% CK vs 23.22% DS, 0–60 cm) was measured gravimetrically or volumetrically, and indicate the measurement method or instrument used.

Response to Comment 1: Thank you so much for your kind words. The soil moisture content (38.65% for CK vs 23.22% for DS, 0–60 cm soil layer) was measured using the gravimetric method. Specifically, soil samples were collected from the 0–60 cm layer, with three replicates per measurement. The detailed steps were as follows: first, an aluminum box was dried in an oven at 105 °C ± 5 °C, and its weight after drying was recorded as m0. Then, approximately 10 g of fresh soil was placed into the aluminum box, and the total weight of the box and fresh soil was recorded as m1. Subsequently, the box with soil was returned to the oven at 105 °C ± 5 °C and dried to constant weight; the total weight at this point was recorded as m2. The soil moisture content (%) was calculated according to the following formula: Moisture = [(m1 - m2)/(m1 - m0)] × 100.
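The formula above translates directly into a small function. The example weights are made up. Note that, as written, the denominator is the fresh soil mass (m1 - m0), so the result is expressed on a wet-weight basis; a dry-weight convention would divide by m2 - m0 instead.

```python
def moisture_percent(m0, m1, m2):
    """Soil moisture (%) from three weighings, following the stated formula.

    m0: dried empty box, m1: box + fresh soil, m2: box + oven-dried soil.
    All masses must be in the same unit (e.g. grams).
    """
    fresh_soil = m1 - m0
    if fresh_soil <= 0:
        raise ValueError("m1 must exceed m0 (no soil in the box)")
    water_lost = m1 - m2
    return water_lost / fresh_soil * 100.0


# Made-up weights: 10 g of fresh soil losing 2.5 g of water on drying.
example = moisture_percent(20.0, 30.0, 27.5)
print(example)  # 25.0
```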

 

  2. Statistical Test Description for Table 3: please specify the exact statistical test applied (e.g., independent or paired t-test) and clarify the unit of replication used in the analysis.

 

Response to Comment 2: Thank you very much for this critical and constructive comment. For the statistical analysis of the data presented in Table 3, a paired-samples t-test was applied to test for significant differences between the CK (control) and DS (drought stress) groups. The unit of replication was the biological replicate, with three independent experimental plots assigned to each treatment (CK and DS); all measurements within each plot were averaged to form one replicate for statistical testing. The significance level was set at P < 0.05 for all statistical analyses.
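A paired-samples t-test operates on the within-pair differences, as the sketch below shows. The plot means are made up, and how CK and DS plots were paired is an assumption, since the response does not describe it. Only the statistic and degrees of freedom are computed.

```python
import math
from statistics import mean, stdev


def paired_t(ck_values, ds_values):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(ck_values) != len(ds_values):
        raise ValueError("paired samples must have equal length")
    diffs = [c - d for c, d in zip(ck_values, ds_values)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


ck_plots = [10.0, 12.0, 11.0]   # made-up plot means, one per replicate
ds_plots = [7.0, 10.0, 8.0]     # made-up plot means, paired by replicate
t_stat, dof = paired_t(ck_plots, ds_plots)
```

With three plots per treatment the test has only two degrees of freedom, so large differences are needed to reach P < 0.05.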

All modifications have been made in the revised manuscript accordingly. We would like to thank the reviewers again for their careful review and constructive comments.

Author Response File: Author Response.pdf
