Using Optimized Three-Band Spectral Indices and a Machine Learning Model to Assess Squash Characteristics under Moisture and Potassium Deficiency Stress
Round 1
Reviewer 1 Report
The topic is relevant and interesting.
However, the report of the methods and results should be improved.
Here are my questions (Q) and comments (C):
Q1: Md, Mln, and Ms were used in three different layouts. Did you tune the model to obtain the optimal parameter vector(s)? If yes, how? If not, why did you use these parameters?
Q2: As far as I could understand, SY, Chlm, WUE, and KUE were measured as related variables. Why did you not use one- or two-way MANOVA?
Q3.: Please, report the optimal parameters together with feature importance values.
C1.: Some typing errors should be found and corrected (unnecessary spaces, hours denoted by hr instead of h, etc.).
C2.: Tables 3 and 8 should be moved to the supplement.
C3.: Fig. 2: The colour range of the R2 values should be unified so that the four plots can be compared.
C4.: The formulas of Chlm, WUE, and KUE should be moved before Fig. 2.
C5.: Please clarify when you used one-way and when two-way ANOVA. In Statistical analysis, please provide the factor(s), factor levels, and dependent variables.
C6.: Please, report the methods and the outcomes of the homogeneity of variance assumption tests. If homogeneity of variances is violated, Duncan post hoc test is not appropriate. In this case, use Games-Howell’s post hoc test.
C7.: Statistical analysis should not contain results (l 298: “As a result…”).
C8.: Please clarify when you used simple and when multivariate linear regression (dependent and explanatory variable(s)). Also, report the normality check of the residual terms. Verify the assumption of no multicollinearity.
C9.: In Results, not only p values, but also test values and degrees of freedom are required to be given for all factor and interaction effects.
C10.: In Table 4, and Table 5, please provide us with standard deviations. Instead of “Effect of …”, or “The influence of…”, start the caption of Table 4 with “Means and standard deviations of…” Complete the caption with all the necessary information in order to make the table understandable in itself.
C11.: Delete the spaces before the decimal dot throughout the MS.
C12.: Under Table 6, please add “at p<0.05” (if that was the level you decided on).
C13.: The PCA results are scarce: please give the explained variance rate and the rotated component coefficients (loadings). Circle the relevant red dots with a high watering regime that are close to each other. Explain the relation between the loadings of the first and second components and the position of the dots in the plots.
C14.: Please, report the DT model feature importance values, model accuracy and OOB error rate.
C15.: In Table 7, I can see that the validation R2 values are notably lower than in the training set. How can you prove that this is not a sign of overfitting in the training set? Can you improve this rate by tuning the models again?
Author Response
Response to Reviewer 1
The topic is relevant and interesting.
However, the report of the methods and results should be improved.
Here are my questions (Q) and comments (C):
We greatly appreciate your critical observations as well as your constructive and helpful comments. We hope that we could address your questions/comments by the explanations and revisions made in the manuscript. We believe that the manuscript is substantially improved after making the suggested revisions.
Q1: Md, Mln, and Ms were used in three different layouts. Did you tune the model to obtain the optimal parameter vector(s)? If yes, how? If not, why did you use these parameters?
Response: Thank you for your guidance and support for our manuscript. Maximum depth (Md), maximum leaf nodes (Mln), and minimum samples per leaf (Ms) were considered during training because they are frequently used in research [73]. We also added information on how the best parameters were identified in lines 266-275.
[73] Galal, H.; Elsayed, S.; Elsherbiny, O.; Allam, A.; Farouk, M. Using RGB Imaging, Optimized Three-Band Spectral Indices, and a Decision Tree Model to Assess Orange Fruit Quality. Agriculture 2022, 12, 1558.
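The tuning question raised in Q1 can be illustrated with a small grid search over the three hyperparameters named in the response. This is only a minimal sketch under stated assumptions: the data are synthetic, the grid values are hypothetical, and cross-validated grid search with scikit-learn's `GridSearchCV` is one common choice, not necessarily the procedure described in lines 266-275 of the manuscript.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for spectral features (X) and a measured trait (y).
rng = np.random.default_rng(0)
X = rng.uniform(size=(90, 6))
y = 3.0 * X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=90)

# Hypothetical grid over Md, Mln, and Ms.
param_grid = {
    "max_depth": [2, 4, 6],          # Md
    "max_leaf_nodes": [4, 8, 16],    # Mln
    "min_samples_leaf": [1, 3, 5],   # Ms
}

# 5-fold cross-validated search that minimizes RMSE.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

best_params = search.best_params_                            # optimal parameter vector
importances = search.best_estimator_.feature_importances_   # one value per feature
print(best_params)
print(importances.round(3))
```

Reporting `best_params` together with `importances` gives the reader both items requested in Q1 and Q3 in one place.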
Q2: As far as I could understand, SY, Chlm, WUE, and KUE were measured as related variables. Why did you not use one- or two-way MANOVA?
Response: MANOVA is an extension of ANOVA that measures the effect of independent categorical variables on several continuous dependent variables. It is used to compare multivariate sample means. MANOVA is more complex, and it is a non-parametric test. In our case, ANOVA was more appropriate: MANOVA offers no advantage here and only complicates the tests, and its outcome is sometimes confusing.
Q3.: Please, report the optimal parameters together with feature importance values.
Response: Thank you for your valuable suggestions. Please see Tables 7 and 8. Table 7 reports the results of the decision tree model based on different features extracted from the hyperspectral data, including the best parameters found during training. The most significant spectral features are listed in Table 8; because there are many spectral features, we created Table 8 to present them separately.
C1.: Some typing errors should be found and corrected (unnecessary spaces, hours denoted by hr instead of h, etc.).
Response: Thank you for your valuable suggestions. They were corrected.
C2.: Tables 3 and 8 should be moved to the supplement.
Response: Thank you for your valuable suggestions. We moved Table 8 to the supplement, but we kept Table 3 in the main text because it introduces all the information about the spectral indices.
C3.: Fig. 2: The colour range of the R2 values should be unified so that the four plots can be compared.
Response: We are sorry that this figure was not explained well. We replaced it with a figure that uses a uniform color bar across the four plots to make comparison easier.
C4.: The formulas of Chlm, WUE, and KUE should be moved before Fig. 2.
Response: Many thanks for this comment. The formulas of Chlm, WUE, and KUE were moved before Fig. 2.
C5.: Please clarify when you used one-way and when two-way ANOVA. In Statistical analysis, please provide the factor(s), factor levels, and dependent variables.
Response: We are sorry for this error. One-way ANOVA was not used in this study. We used two factors (irrigation regime and potassium fertilizer), with three levels each, in a split-plot design. Section 2.11 (Statistical analysis) was corrected.
C6.: Please, report the methods and the outcomes of the homogeneity of variance assumption tests. If homogeneity of variances is violated, Duncan post hoc test is not appropriate. In this case, use Games-Howell’s post hoc test.
Response: Many thanks for this comment. The combined analysis of variance across the two seasons was carried out after the homogeneity test. We added the details of the statistical analysis in Tables S1, S2a, S2b, and S2c of the supplementary materials file to make this clear to the reader.
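The response does not name the homogeneity test that was used. As a hedged illustration of what C6 asks for, the sketch below runs Levene's test with `scipy.stats.levene` on synthetic yield values for three hypothetical irrigation levels. The group means, sample sizes, and the 0.05 threshold are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Synthetic seed-yield values for three hypothetical irrigation levels.
rng = np.random.default_rng(1)
groups = [rng.normal(loc=mean, scale=1.0, size=9) for mean in (10.0, 12.0, 14.0)]

# Levene's test (median-centred variant) for equality of variances.
stat, p_value = stats.levene(*groups, center="median")

# Under the assumed 0.05 level, p > 0.05 gives no evidence against
# homogeneity, so a pooled-variance post hoc test would be defensible.
homogeneous = p_value > 0.05
print(f"W = {stat:.3f}, p = {p_value:.3f}, homogeneous = {homogeneous}")
```

If the test rejected homogeneity, the reviewer's suggestion of a Games-Howell post hoc test would apply instead of Duncan's test.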
C7.: Statistical analysis should not contain results (l 298: “As a result…”).
Response: Many thanks for this comment. It was removed.
C8.: Please clarify when you used simple and when multivariate linear regression (dependent and explanatory variable(s)). Also, report the normality check of the residual terms. Verify the assumption of no multicollinearity.
Response: We are sorry for this error. Only simple regressions were used to quantify the association between the SRIs and the assessed attributes of squash. Multivariate linear regression was not used. This was corrected in Section 2.11 (Statistical analysis).
C9.: In Results, not only p values, but also test values and degrees of freedom are required to be given for all factor and interaction effects.
Response: Many thanks for this comment. We added ANOVA tables to the supplementary materials file. They report the analysis of variance (degrees of freedom (df), F-values, and significance levels) for the effects of year, irrigation level, potassium level, and their interactions on seed yield (SY), chlorophyll meter (Chlm), water use efficiency (WUE), potassium use efficiency (KUE), and the spectral indices of squash.
C10.: In Table 4, and Table 5, please provide us with standard deviations. Instead of “Effect of …”, or “The influence of…”, start the caption of Table 4 with “Means and standard deviations of…” Complete the caption with all the necessary information in order to make the table understandable in itself.
Response: Many thanks for this comment. Standard deviations were added to Tables 4 and 5, and the captions of both tables were modified.
C11.: Delete the spaces before the decimal dot throughout the MS.
Response: Many thanks for this comment. It was modified.
C12.: Under Table 6, please add “at p<0.05” (if that was the level you decided on).
Response: Many thanks for this comment. It was added.
C13.: The PCA results are scarce: please give the explained variance rate and the rotated component coefficients (loadings). Circle the relevant red dots with a high watering regime that are close to each other. Explain the relation between the loadings of the first and second components and the position of the dots in the plots.
Response: Thank you for your comments. To address this comment, we added key information in lines 463-468. New figures 3a and 4b were also added.
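The quantities requested in C13 (explained variance and loadings) can be obtained as in the minimal sketch below. It assumes z-score standardization before PCA and uses synthetic data. The loadings shown are unrotated, so a rotation such as varimax would need an extra step that is not covered here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 54 samples of 4 correlated spectral indices.
rng = np.random.default_rng(2)
latent = rng.normal(size=(54, 1))
X = np.hstack([latent + rng.normal(scale=0.3, size=(54, 1)) for _ in range(4)])

# Standardize each variable to zero mean and unit variance, then fit PCA.
Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(Z)

explained = pca.explained_variance_ratio_                       # share of variance per component
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)  # unrotated loadings
scores = pca.transform(Z)                                       # sample coordinates in the PC1-PC2 plane

print(explained.round(3))
print(loadings.round(3))
```

Plotting `scores` and overlaying the rows of `loadings` as arrows is one way to relate the position of the dots to the variables that drive each component.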
C14.: Please, report the DT model feature importance values, model accuracy and OOB error rate.
Response: Thank you for your comments. Table 7 summarizes model accuracy after training and testing, and Table 8 lists the most relevant spectral features ranked by feature importance. We have followed your suggestions to present the paper clearly. The out-of-bag (OOB) error rate can be calculated for some models such as random forest, but the scikit-learn decision tree model does not provide an attribute for it. When we requested it from the decision tree model after training, we got the message “AttributeError: 'DecisionTreeRegressor' object has no attribute 'oob_score_'”. We also searched for a way to calculate it, but found that it is available only for specific models such as random forest and not for a single decision tree.
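The point about the missing OOB attribute can be reproduced, along with two possible substitutes, in the sketch below. It uses synthetic data and assumed hyperparameters. Cross-validation and bagged trees are suggested here as alternatives and are not methods reported in the manuscript.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for spectral features and a squash trait.
rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 5))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=120)

# A single tree is fitted on all samples, so it has no out-of-bag set.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
has_oob = hasattr(tree, "oob_score_")  # False for a single decision tree

# Substitute 1: k-fold cross-validated R2 for the single tree.
cv_r2 = cross_val_score(
    DecisionTreeRegressor(max_depth=4, random_state=0), X, y, cv=5, scoring="r2"
)

# Substitute 2: bagging the same tree on bootstrap samples yields an OOB R2.
bag = BaggingRegressor(
    DecisionTreeRegressor(max_depth=4),
    n_estimators=100,
    oob_score=True,
    random_state=0,
).fit(X, y)
oob_r2 = bag.oob_score_

print(has_oob, cv_r2.mean().round(3), round(oob_r2, 3))
```

The OOB score of the bagged ensemble describes the ensemble, not the single tree, so cross-validation is the closer analogue when only one decision tree is reported.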
C15.: In Table 7, I can see that the validation R2 values are notably lower than in the training set. How can you prove that this is not a sign of overfitting in the training set? Can you improve this rate by tuning the models again?
Response: Thank you; your feedback helps us improve our manuscript. To select the best-performing variables, the DT model was applied to different spectral features, namely 3D spectral indices (3D-SRIs), DT-based bands (DT-b), and the aggregate of all spectral features (ASF), as shown in Table 7 (lines 562-564). Some features performed poorly while others performed well, and we chose the high-level features that performed well in both training and validation.
This study already has a unique framework based on a decision tree model and hyperparameter optimization. Furthermore, to the best of our knowledge, no prior study has used this decision tree technique to accurately estimate and monitor squash properties under moisture and potassium deficit stress. In the future, we can apply other machine learning algorithms to extend this study.
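The overfitting concern in C15 is usually examined by comparing training and validation R2 for the same model. The sketch below does this on synthetic data for an unconstrained tree and for a constrained one. The split ratio and hyperparameter values are assumptions for illustration, not the settings behind Table 7.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: one informative feature plus noise.
rng = np.random.default_rng(3)
X = rng.uniform(size=(150, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.3, size=150)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)


def train_and_validate(**params):
    """Fit a tree with the given hyperparameters; return (train R2, validation R2)."""
    model = DecisionTreeRegressor(random_state=0, **params).fit(X_tr, y_tr)
    return (
        r2_score(y_tr, model.predict(X_tr)),
        r2_score(y_va, model.predict(X_va)),
    )


deep = train_and_validate()                                     # unconstrained tree
shallow = train_and_validate(max_depth=3, min_samples_leaf=5)   # constrained tree

print("deep    train/val R2:", deep)
print("shallow train/val R2:", shallow)
```

A large gap between the two values for a given model is the usual symptom of overfitting, and constraining depth or leaf size is one way to see whether retuning narrows it.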
Author Response File: Author Response.docx
Reviewer 2 Report
The authors indicate in the "Statistical analysis" section that they used analysis of variance with one and two factors. However, nothing is mentioned about fulfilling the requirements inherent to these methods. Therefore, this information must be added.
Regarding the 2-way analysis of variance, it must be indicated whether it is with or without replicates.
Authors should write the name of the method in full when referring to it for the first time, and not the abbreviation (as they have for principal component analysis). On the other hand, it must be indicated which standardization was made to the data, prior to this multivariate analysis.
Still in this section, information about the software used should never be the first information, but something that should come at the end. On the other hand, there is no information about the significance level used, which is mandatory in any scientific work.
What is the difference between uppercase and lowercase letters in table 4?
The authors state that: “In both spring and fall seasons, both stressors (moisture and potassium fertilization) had a significantly impact on squash seed yield (p < 0.005). In both investigated seasons, the interaction between moisture and potassium demonstrated significant effects on total squash seed yield.”. However, how is it possible to identify these results in that table? For example, where is the reflected interaction? For example, where is the p-value obtained?
In the results, when “The impacts of varying moisture and potassium rates on squash seed yield were quantified using the analysis of variance (ANOVA) as summarised in Table 4”, are the results referring to the ANOVA with one or two factors? It is not clear in the text, nor in the legend of the table.
Why are standard deviation values not displayed? What is the dimension of the replicas?
When the authors indicate: “According to the findings, all SRIs except for those with NWI-1, NWI-3, and NWI-4 demonstrated statistically significant differences among the three irrigation rates and potassium for squash. There were significant differences in newly three-band and published SRI values at various potassium fertilizer and irrigation levels, which may have been caused by wide variations in measured parameter values.”, where can we check these results? This indication is missing.
Authors cannot write results with different formatting. In the text, there is both "p<..." and "P≤..."
When referring: “The impacts of varying moisture and potassium rates on squash seed yield were quantified using the analysis of variance (ANOVA) as summarised in Table 4. In both spring and fall seasons, both stressors (moisture and potassium fertilization) had a significantly impact on squash seed yield (p < 0.005).”, refer to the p-value between parentheses, but when they refer: “According to the findings, all SRIs except for those with NWI-1, NWI-3, and NWI-4 demonstrated statistically significant differences among the three irrigation rates and potassium for squash. There were significant differences in newly three-band and published SRI values at various potassium fertilizer and irrigation levels, which may have been caused by wide variations in measured parameter values.”, there is no mention of the p-value. The text must be a uniform document!
In tables 4, 5 and 6 the p-values must be included.
The authors write that: “The figures demonstrate that the PCA has the ability to spot dissimilarities between non-stressed plants and those suffering from moisture and potassium deficiency stress.”. However, it is necessary to consider that the figures are not methods, but a way of presenting results, so this sentence should be revised.
In the principal component analysis, it is necessary to include the values of the explained variance, as well as the justification for interpreting only the first factorial plan.
The grammar and sentence construction of the entire document must be reviewed in detail.
Author Response
Response to Reviewer 2
The authors indicate in the "Statistical analysis" section that they used analysis of variance with one and two factors. However, nothing is mentioned about fulfilling the requirements inherent to these methods. Therefore, this information must be added.
Response: We are sorry for the error. One-way ANOVA was not used in this study. We used two factors (irrigation regime and potassium fertilizer), with three levels each, in a split-plot design with three replicates. Section 2.11 (Statistical analysis) was corrected.
- Regarding the 2-way analysis of variance, it must be indicated whether it is with or without replicates.
Response: Many thanks for this comment. We used two factors (irrigation regime and potassium fertilizer), with three levels each, in a split-plot design with three replicates.
- Authors should write the name of the method in full when referring to it for the first time, and not the abbreviation (as they have for principal component analysis). On the other hand, it must be indicated which standardization was made to the data, prior to this multivariate analysis.
Response: Thank you very much for your positive comments. The name of the method is now written in full at first mention. We also added information about standardization in lines 279-283 under Section 2.9 (Datasets and Software for Data Analysis).
- Still in this section, information about the software used should never be the first information, but something that should come at the end. On the other hand, there is no information about the significance level used, which is mandatory in any scientific work.
Response: Thank you for your valuable suggestions, and we are sorry for the error. The statistical analysis section was thoroughly revised, and all of the reviewer's comments were addressed.
- What is the difference between uppercase and lowercase letters in table 4?
Response: Thank you for your valuable suggestions. Uppercase letters indicate significant differences between the mean values of the irrigation regime levels, and lowercase letters indicate significant differences between the mean values of the potassium fertilizer levels. This information was added under Table 4.
- The authors state that: “In both spring and fall seasons, both stressors (moisture and potassium fertilization) had a significantly impact on squash seed yield (p < 0.005). In both investigated seasons, the interaction between moisture and potassium demonstrated significant effects on total squash seed yield.”. However, how is it possible to identify these results in that table? For example, where is the reflected interaction? For example, where is the p-value obtained?
Response: Many thanks for this comment. We added Table S1 (ANOVA) to the supplementary materials file. It reports the analysis of variance (degrees of freedom (df), F-values, and significance levels) for the effects of year, irrigation level, potassium level, and their interactions on seed yield (SY), chlorophyll meter (Chlm), water use efficiency (WUE), and potassium use efficiency (KUE) of squash. Table 4 was also improved by adding different letters to indicate significant differences at P ≤ 0.05 between treatments and for the interaction.
- In the results, when “The impacts of varying moisture and potassium rates on squash seed yield were quantified using the analysis of variance (ANOVA) as summarised in Table 4”, are the results referring to the ANOVA with one or two factors? It is not clear in the text, nor in the legend of the table.
Response: We are sorry for the error. One-way ANOVA was not used in this study. We used two factors (irrigation regime and potassium fertilizer), with three levels each, in a split-plot design. Section 2.11 (Statistical analysis) was corrected.
- Why are standard deviation values not displayed? What is the dimension of the replicas?
Response: Many thanks for this comment. Standard deviations were added to Tables 4 and 5. Three replicates of each treatment were used.
- When the authors indicate: “According to the findings, all SRIs except for those with NWI-1, NWI-3, and NWI-4 demonstrated statistically significant differences among the three irrigation rates and potassium for squash. There were significant differences in newly three-band and published SRI values at various potassium fertilizer and irrigation levels, which may have been caused by wide variations in measured parameter values.”, where can we check these results? This indication is missing.
Response: Many thanks for this comment. The significant differences in the new three-band and published SRI values at various potassium fertilizer and irrigation levels can be checked in Table 5 and Tables S2a, S2b, and S2c, and the wide variations in measured parameter values can be checked in Table 4. We added Tables S2a, S2b, and S2c (ANOVA) to the supplementary materials file. They report the analysis of variance (degrees of freedom (df), F-values, and significance levels) for the effects of year, irrigation level, potassium level, and their interactions on the spectral indices of squash.
- Authors cannot write results with different formatting. In the text, there is both "p<..." and "P≤..."
Response: Many thanks for this comment. It was fixed in the text.
- When referring: “The impacts of varying moisture and potassium rates on squash seed yield were quantified using the analysis of variance (ANOVA) as summarised in Table 4. In both spring and fall seasons, both stressors (moisture and potassium fertilization) had a significantly impact on squash seed yield (p < 0.005).”, refer to the p-value between parentheses, but when they refer: “According to the findings, all SRIs except for those with NWI-1, NWI-3, and NWI-4 demonstrated statistically significant differences among the three irrigation rates and potassium for squash. There were significant differences in newly three-band and published SRI values at various potassium fertilizer and irrigation levels, which may have been caused by wide variations in measured parameter values.”, there is no mention of the p-value. The text must be a uniform document! In tables 4, 5 and 6 the p-values must be included.
Response: Many thanks for this comment. The text was made uniform, and p-values were included under Tables 4, 5, and 6.
- The authors write that: “The figures demonstrate that the PCA has the ability to spot dissimilarities between non-stressed plants and those suffering from moisture and potassium deficiency stress.”. However, it is necessary to consider that the figures are not methods, but a way of presenting results, so this sentence should be revised.
Response: Thank you for your comments. The sentence was revised.
- In the principal component analysis, it is necessary to include the values of the explained variance, as well as the justification for interpreting only the first factorial plan.
Response: Thank you for your comments. We added important information in lines 463-468 to address this comment. New figures 3a and 4b were also added.
- The grammar and sentence construction of the entire document must be reviewed in detail.
Response: Many thanks for this comment. The grammar and sentence construction were corrected.
Author Response File: Author Response.docx
Reviewer 3 Report
The paper “Using Optimized Three-Band Spectral Indices, and machine learning Model to Assess Squash Characteristics under Moisture and Potassium Deficiency Stress” by Mohamed A. Sharaf-Eldin et al. is devoted to estimating the effects of irrigation treatments and potassium fertilization on some traits of squash: potassium use efficiency, water use efficiency, chlorophyll meter, and seed yield.
Apparently, the authors managed to collect a large amount of data and obtained very interesting material regarding the dependence of chlorophyll meter on irrigation and the relation between potassium use efficiency and water use efficiency.
However, the presentation of the results in the article is unsatisfactory. Tables 4 and 5 are hard to understand. Why Table 6 is given is not at all clear. What is the point of presenting results showing that a method doesn't work? Only the results that gave a satisfactory value of R2, and the conclusions, should be kept. Maybe it is better to replace tables with graphs for clarity.
What Figures 3 and 4 reflect is not clear at all
It is also not clear what the authors achieved by using the DT model. What exactly was “better predicted” with its help?
In my opinion, the object of research is of interest and corresponds to the theme of the Horticulture journal, but the results of the article should be substantially revised and presented in a form that can be analyzed by the reader.
Author Response
Response to Reviewer 3
The paper “Using Optimized Three-Band Spectral Indices, and machine learning Model to Assess Squash Characteristics under Moisture and Potassium Deficiency Stress” by Mohamed A. Sharaf-Eldin et al. is devoted to estimating the effects of irrigation treatments and potassium fertilization on some traits of squash: potassium use efficiency, water use efficiency, chlorophyll meter, and seed yield. Apparently, the authors managed to collect a large amount of data and obtained very interesting material regarding the dependence of chlorophyll meter on irrigation and the relation between potassium use efficiency and water use efficiency.
We greatly appreciate your critical observations as well as your constructive and helpful comments. We hope that we could address your questions/comments by the explanations and revisions made in the manuscript. We believe that the manuscript is substantially improved after making the suggested revisions.
- However, the presentation of the results in the article is unsatisfactory. Tables 4 and 5 are hard to understand.
Response: Many thanks for this comment. In general, the presentation of the results was modified, and Tables 5 and 6 were improved, from their titles through to the data presented. We also added the ANOVA tables, which include all data, to the supplementary materials file to make this clear to readers.
- Why Table 6 is given is not at all clear. What is the point of presenting results showing that a method doesn't work? Only the results that gave a satisfactory value of R2, and the conclusions, should be kept. Maybe it is better to replace tables with graphs for clarity.
Response: Thank you for your valuable suggestions. Several indices were presented in order to compare the published indices with the new three-band indices for estimating the four measured parameters of squash in each season and across both seasons. Each measured parameter is related to some of the indices presented in Table 6, which is why several indices were shown. In this work, we assessed how several SRIs, which combine different bands from the VIS, red-edge, and NIR regions of the spectrum, responded to various irrigation schedules and potassium fertilization rates. Changes in leaf and plant properties such as internal leaf structure, leaf pigments, and biomass are connected to the indirect impacts and strongly influence the spectral signature in the visible and near-infrared ranges. This information is given in lines 425-434 of the manuscript. In any case, only the best indices are discussed in the text of the manuscript.
One of the reviewers recommended adding the p-value to Table 6. We had already drawn the figure, but it may be better to present the data in a table so that the numbers are clear to readers.
- What Figures 3 and 4 reflect is not clear at all
Response: Thank you for your valuable suggestions. More information was added under Section 3.4 (Differentiating Moisture and Potassium Deficiency Spectrally) to make this clear to readers. Figure 3b and Figure 4b were also added to address this comment.
- It is also not clear what the authors achieved by using the DT model. What exactly was “better predicted” with its help?
Response: Thank you for your comment. Lines 553-562 explain why a DT model based on spectral data was used rather than an individual spectral index. Lines 586-591 explain the outcomes of the proposed model: “The model's prediction accuracy is affected by the value and number of features. Table 8 shows a variety of choices for merging features and models that have the greatest influence on the prediction of quality attributes in squash crops. This table explains that there are unique characteristics of training models that have the lowest RMSEV value and perform well in predicting. Depending on the model used, the RMSEV value dropped with these specified features.”
Lines 285-288 also help readers understand how the proposed model was improved during training.
- In my opinion, the object of research is of interest and corresponds to the theme of the Horticulture journal, but the results of the article should be substantially revised and presented in a form that can be analyzed by the reader.
Response: Thank you for your comment. The results of the manuscript were revised.
Author Response File: Author Response.docx
Round 2
Reviewer 2 Report
The responses were all performed in accordance with what was requested. I only suggest a detailed review with regard to sentence and grammatical construction.