Article
Peer-Review Record

Machine Learning in the Analysis of Multispectral Reads in Maize Canopies Responding to Increased Temperatures and Water Deficit

Remote Sens. 2022, 14(11), 2596; https://doi.org/10.3390/rs14112596
by Josip Spišić 1, Domagoj Šimić 2, Josip Balen 1, Antun Jambrović 2 and Vlatko Galić 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 29 April 2022 / Revised: 19 May 2022 / Accepted: 26 May 2022 / Published: 28 May 2022
(This article belongs to the Special Issue Remote Sensing of Vegetation Function and Traits)

Round 1

Reviewer 1 Report

This study used multispectral information to drive different machine learning models to predict the occurrence of leaf rolling for maize. The authors showed the promise of using multispectral sensing in detecting drought stress in plants. The paper reads well and the results were shown in a simple and straightforward way. Some sections are not clear and need further details, especially material and methods, and discussion. See below for my suggestions.

L13 multispectral arrays?

L29-30 It reads like human population growth results in climate change, references are needed in this sentence.

L41-67 This paragraph is too long and the major points that you want to show are very hard to follow. Maybe separate it into two parts, especially for the introduction of leaf rolling in plants.

L79-94 it’s very hard to follow these two paragraphs. Can you specify some examples of the use of ML for detecting drought stress? Can you please show the pros and cons of using ML in predicting stress types? Did you do dimension reduction in this study?

Table 1 what do you mean by “FAO maturity”? What unit is used for “FAO maturity”? The unit for “Anthesis interval” is missing.

Figure 1 Is that daily mean temperature or daily diurnal temperature? The unit of rainfall is mm per day?

L174 …are available as shown in Figure S1.

L234 both?

Figure 4 How many data points were used in this figure? 15? Suggest using scatter with straight lines and marker plots to show these different VIs. What is A.U.?

Table 3 I strongly suggest repeating the random subset data process 5 or 10 times to show the average of each statistical index (sd can be shown in bracket). How did you randomly select 15% data for validation? Did you set a random seed?

L334 delete able

L338-346 these should belong to the Introduction section

L372 VIs

L394 SLP

L409 in studies or in a study?

L408-411 not clear, please rewrite this sentence

Discussion: this section is too general and the depth of discussion is not enough. What is the limitation of your study? What are the implications of your results for model development? Can some satellite data be used as inputs of your model as proxy variables of VIs used here? Have you considered increasing the interpretation of these ML models? I like to use decision tree models like the random forest which can provide the relative importance of each predictor in determining the target variable. It will help us understand how predictor variables affect the response variable.

L428 how do you define “a substantial variability”?

Author Response

Dear reviewer, thank you for your time and effort in review of our manuscript. Please find our responses to your comments/concerns below.

This study used multispectral information to drive different machine learning models to predict the occurrence of leaf rolling for maize. The authors showed the promise of using multispectral sensing in detecting drought stress in plants. The paper reads well and the results were shown in a simple and straightforward way. Some sections are not clear and need further details, especially material and methods, and discussion. See below for my suggestions.

L13 multispectral arrays?

Only a single array of 3x2 size was used.

L29-30 It reads like human population growth results in climate change, references are needed in this sentence.

“Changing climate” was removed; this was a spurious connection.

L41-67 This paragraph is too long and the major points that you want to show are very hard to follow. Maybe separate it into two parts, especially for the introduction of leaf rolling in plants.

L79-94 it’s very hard to follow these two paragraphs. Can you specify some examples of the use of ML for detecting drought stress? Can you please show the pros and cons of using ML in predicting stress types? Did you do dimension reduction in this study?

The paragraphs have been rearranged for better readability. Also, the information not relevant to the manuscript text was removed.

Table 1 what do you mean by “FAO maturity”? What unit is used for “FAO maturity”? The unit for “Anthesis interval” is missing.

Both pieces of information were added, along with a reference to Dwyer et al. 1999 for FAO maturity classifications.

Figure 1 Is that daily mean temperature or daily diurnal temperature? The unit of rainfall is mm per day?

Temperatures shown are diurnal, and the rainfall is in mm/m2 per day. “Per day” was added in the new version.

L174 …are available as shown in Figure S1.

The reference to the Supplementary figure S1 was added.

L234 both?

Changed to both

Figure 4 How many data points were used in this figure? 15? Suggest using scatter with straight lines and marker plots to show these different VIs. What is A.U.?

We added the reference to Table 2 to the caption, showing standard deviations of the VIs. Figure 4 was created using the 1631 LR- data points and 549 LR+ data points over 15 VIs. We created a scatter plot with 8235 data points for LR+ and 24465 data points for LR-, but it looks very busy, so we decided to keep the figure in its present form. We also tried to plot the standard errors, but the values are very low due to the large sample sizes, making the error bars barely visible. Standard deviations for the VIs over LR are shown in Table 2, and deviation metrics are not applicable to the difference, as it is computed between mean values.
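The arithmetic behind this response can be checked with a short sketch. The record counts and the number of VIs come from the response above; the standard deviation value is hypothetical and only illustrates why a standard error, which shrinks with the square root of the sample size, gives error bars that are hard to see.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for n observations with standard deviation sd."""
    return sd / math.sqrt(n)

# Counts stated in the author response.
N_LR_NEG = 1631   # records without leaf rolling (LR-)
N_LR_POS = 549    # records with leaf rolling (LR+)
N_VI = 15         # vegetation indices per record

# Hypothetical standard deviation of one VI; NOT a value from the manuscript.
HYPOTHETICAL_SD = 0.10

points_neg = N_LR_NEG * N_VI   # points in a full LR- scatter plot
points_pos = N_LR_POS * N_VI   # points in a full LR+ scatter plot
se_neg = standard_error(HYPOTHETICAL_SD, N_LR_NEG)
se_pos = standard_error(HYPOTHETICAL_SD, N_LR_POS)

print(points_neg, points_pos)
print(f"SE (LR-): {se_neg:.4f}  SE (LR+): {se_pos:.4f}")
```

With any standard deviation of this order, the standard error is tens of times smaller than the standard deviation itself, which is consistent with the authors' remark about barely visible error bars.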

Table 3 I strongly suggest repeating the random subset data process 5 or 10 times to show the average of each statistical index (sd can be shown in bracket). How did you randomly select 15% data for validation? Did you set a random seed?

The aim was to robustly assess the model generalization abilities. Thus, three validation procedures were carried out. Since our dataset with >2000 records allowed us to take out a random sample of 15% of the records and still retain a sufficient sample size (1853 samples) for cross-validation, we decided to test the models in three different testing pipelines. Also, the full Python notebooks are available upon request to ensure reproducibility of the results.

First, a random subset of 15% of the dataset (327 measurements) was taken with the random seed set to 109. The information on the random seed number was added to the Material and Methods section. This set was termed the Random subset.

  • Related to this, the models were tested in cross-validation, where 1/5th of the dataset (in line with your suggestion) was held out in each fold, and the procedure was repeated five times with the scikit-learn pipeline for stratified K-fold cross-validation, meaning that the balance of the classes was retained in each fold. For this validation we report standard deviations as suggested (in brackets).
  • The Random subset was later used for external model validation after the model had been fit using the scikit-learn pipeline for stratified K-fold cross-validation. In this form of external validation, the dataset used to validate the model is different from the dataset used to train it (K-fold cross-validation). It thus represents the model generalization ability and does not require a standard deviation, as the model is validated only once in this procedure.
  • To further assess the model generalization ability, another validation process was carried out with an external dataset that was unrelated to the training and Random subset datasets.

This type of validation pipeline makes it possible to assess the model potential (cross-validation), type I error (random subset) and the generalization ability (external validation).
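The first two steps of this procedure can be sketched as follows. This is a minimal standard-library illustration, not the authors' notebook: they report using scikit-learn, whereas the helper functions `holdout_split` and `stratified_folds` here are hypothetical stand-ins, and a seed of 109 will not reproduce the authors' actual split outside their own pipeline. Only the class counts, the 15% fraction, the five folds and the seed value are taken from the responses above.

```python
import random
from collections import Counter

SEED = 109                              # seed value reported by the authors
labels = [0] * 1631 + [1] * 549         # 0 = LR-, 1 = LR+ (counts from the response)

def holdout_split(n, fraction, seed):
    """Shuffle indices 0..n-1 and set aside a fraction as the Random subset."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * fraction)
    return idx[n_test:], idx[:n_test]   # (training indices, held-out indices)

def stratified_folds(indices, labels, k, seed):
    """Deal each class round-robin into k folds so class balance is retained."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

train_idx, subset_idx = holdout_split(len(labels), 0.15, SEED)
folds = stratified_folds(train_idx, labels, 5, SEED)

print(len(subset_idx), len(train_idx))
for fold in folds:
    print(Counter(labels[i] for i in fold))
```

Each fold would then serve once as the validation part while the other four are used for fitting, and the held-out Random subset is scored only after cross-validation is complete.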

L334 delete able

Deleted able

L338-346 these should belong to the Introduction section

The mentioned lines explain the data acquisition and our sensor that was not introduced until material and methods section, so moving these lines to the introduction might render the introduction section hard to follow.

L372 VIs

Changed to VIs

L394 SLP

Changed

L409 in studies or in a study?

There are two referenced studies by the same principal author [77, 78].

L408-411 not clear, please rewrite this sentence

The paragraph was rewritten

Discussion: this section is too general and the depth of discussion is not enough. What is the limitation of your study? What are the implications of your results for model development? Can some satellite data be used as inputs of your model as proxy variables of VIs used here? Have you considered increasing the interpretation of these ML models? I like to use decision tree models like the random forest which can provide the relative importance of each predictor in determining the target variable. It will help us understand how predictor variables affect the response variable.

We added another paragraph discussing the main limitations of our study and perspectives for improvement. The use of satellite reads was discussed in the Introduction as one possible sensing strategy that was not employed in our study. Our study aimed to demonstrate the usability of a low-cost multispectral array for proximal remote sensing (a 3x2 photodiode array) responsive to wavelengths in the red and near-infrared spectra (610, 680, 730, 760, 810 and 860 nm) with 20 nm full width at half maximum. Although our reads are possibly compatible with some satellite reads, finding and testing the compatible data would require a lot of effort.
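For readers unfamiliar with how vegetation indices are derived from band reads, the following is a minimal sketch and not the authors' pipeline. It assumes each index is a normalized difference of two bands, which is an illustrative choice only; the definitions of the 15 VIs are given in the manuscript, not in this response. The band wavelengths are those listed above, while the reflectance values and function names are hypothetical. Six bands yield 15 unordered band pairs; whether that coincides with the authors' 15 VIs is not stated here.

```python
from itertools import combinations

# Band centres (nm) of the 3x2 photodiode array, as listed in the response.
BANDS_NM = (610, 680, 730, 760, 810, 860)

def normalized_difference(a, b):
    """Return (a - b) / (a + b), the generic form behind indices such as NDVI."""
    total = a + b
    if total == 0:
        raise ValueError("band sum is zero; index undefined")
    return (a - b) / total

def pairwise_indices(reads):
    """Normalized difference for every unordered band pair (longer minus shorter)."""
    return {
        (short, long): normalized_difference(reads[long], reads[short])
        for short, long in combinations(sorted(reads), 2)
    }

# Hypothetical reflectance reads for one canopy measurement (not real data).
example_reads = dict(zip(BANDS_NM, (0.08, 0.06, 0.25, 0.40, 0.45, 0.47)))
indices = pairwise_indices(example_reads)

for pair, value in indices.items():
    print(pair, round(value, 3))
```

The pair (680, 810) corresponds to the familiar red and near-infrared combination used in NDVI-type indices.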

Decision trees are a great ML solution for many tasks, and they indeed show good performance and interpretability, although these are comparable to those of NNs (see the nice review by Prof. Peter Roßbach, https://blog.frankfurt-school.de/wp-content/uploads/2018/10/Neural-Networks-vs-Random-Forests.pdf). However, there are tens of great ML pipelines available, and at our discretion we chose the four most likely to be deployed on the devices (Raspberry Pis) that will be mounted in the fields in further steps of sensing node development.

L428 how do you define “a substantial variability”?

We removed “substantial” from the sentence

Reviewer 2 Report

Remarks on manuscript: Machine learning in analysis of multispectral reads in maize canopies responding to increased temperatures and water deficit submitted to the Remote Sensing.  Whole manuscript is well written, results are clearly presented. Novelty of work is highlighted.  I have only few remarks:

  1. Why authors didn't use LDA (linear discriminant analysis) instead of PCA?
  2. The authors worked with averages but did not say anything about the nature of the data distribution. Normality tests should be provided for pairwise comparisons.
  3. Because VI are not continuous, it is advisable to present Figure 4 in the form of a histogram.
  4. Authors may consider: DOI: 10.1016/j.rsase.2021.100679
  5. Authors should avoid to use term, our results, we did, etc. and begin of sentences with symbols or abbreviations. References should be carefully checked and format as per journal rule.

Author Response

Reviewer 1:

Dear reviewer, thank you for your time and effort reviewing our manuscript. We hope you will find our responses and changes to manuscript text necessary and of sufficient quality.

Remarks on manuscript: Machine learning in analysis of multispectral reads in maize canopies responding to increased temperatures and water deficit submitted to the Remote Sensing.  Whole manuscript is well written, results are clearly presented. Novelty of work is highlighted.  I have only few remarks:

  1. Why authors didn't use LDA (linear discriminant analysis) instead of PCA?

Thank you for this constructive remark. We considered both PCA and LDA in a preliminary exploratory analysis; however, PCA was chosen due to some favorable properties. Namely, it offers an unsupervised summarization of the data without considering the data labels, thus informing the reader about the data dispersion (in terms of total variance) and about the variable contributions to the 2D latent structure hyperplane. In contrast, LDA considers the labels and provides essentially the same information as the assessed ML algorithms, leaving the reader uninformed about how the data “breathe”. Thus, to prevent redundancy, we chose to display the results of PCA.

  2. The authors worked with averages but did not say anything about the nature of the data distribution. Normality tests should be provided for pairwise comparisons.

We agree that the distributions were not exhaustively assessed as part of this manuscript. According to the results of Arnastauskaite et al. (2021, https://doi.org/10.3390/math9070788), with sample sizes >1000, normality tests reflect only the power of the statistical test, meaning that the slightest deviations from the assumed distributions will lead to the rejection of the null hypothesis. We thus only visually assessed the results for normality. The distributions of the data were not included in the manuscript, as all assessed ML algorithms work agnostically of the input data distribution, relying on high-dimensional feature space. A line covering the visual assessment was added to the Material and Methods section.

  3. Because VI are not continuous, it is advisable to present Figure 4 in the form of a histogram.

There are 15 features over two groups plus the difference. Are you suggesting that we draw 45 histograms, or 15 histograms with three groups each? Since the aim of Figure 4 is not to assess the feature distributions but to show the increasing difference between reads in the LR- and LR+ groups when approaching the near-IR spectrum, we believe that Figure 4 is suitably informative.

  4. Authors may consider: DOI: 10.1016/j.rsase.2021.100679

The suggested reference was added to the main text as reference no. 34.

  5. Authors should avoid to use term, our results, we did, etc. and begin of sentences with symbols or abbreviations. References should be carefully checked and format as per journal rule.

We checked the manuscript for sentences starting with abbreviations or symbols and wrote out full terms.

Reviewer 3 Report

The authors apply machine learning operations to image analysis of agricultural production conditions, and compare and verify multiple models in this manuscript. However, most of the examples of image analysis applied to agricultural production use large-scale production bases. What are the limits for the fragmented and small-scale planting bases? It would be better if the authors could explore in more depth the possible effects of climatic and topographical factors. Overall, I find this manuscript worthy of publication.

Author Response

Dear reviewer, thank you for your time and effort reviewing our manuscript. We strongly agree that the generalization ability of these models should increase given multiple years and topographic scenarios. As a first step in analyzing the usefulness of our sensing node, we carried out the experiments in multiple trials, all of which, however, had similar topographic attributes. Further research should expand the testing framework in both space and time.

The authors apply machine learning operations to image analysis of agricultural production conditions, and compare and verify multiple models in this manuscript. However, most of the examples of image analysis applied to agricultural production use large-scale production bases. What are the limits for the fragmented and small-scale planting bases? It would be better if the authors could explore in more depth the possible effects of climatic and topographical factors. Overall, I find this manuscript worthy of publication.

Reviewer 4 Report

The research is exciting and significant. The knowledge that goes toward using ML algorithms in combination with affordable remote sensing devices to accurately monitor plant parameters and drought phenomena is fundamental in the current conditions of climate change and the need to conserve natural resources. Such solutions are at the beginning of development and it is important to continue with this research.

My suggestions:
1. Although there are, somehow the data related to weather conditions during the research are insufficiently highlighted. I think it is important to see the connection between the production conditions and the parameter of leaf rolling. It would be interesting to have evapotranspiration data also. 

2. On the part of lines 115 - 116, the application of irrigation is stated. I think it is necessary to state the method of irrigation, as well as the amount of water given. Since these parameters are directly related to drought.

Author Response

Dear reviewer, thank you for your time and effort reviewing our manuscript. We hope you will find our responses to your comments/concerns of sufficient quality.

The research is exciting and significant. The knowledge that goes toward using ML algorithms in combination with affordable remote sensing devices to accurately monitor plant parameters and drought phenomena is fundamental in the current conditions of climate change and the need to conserve natural resources. Such solutions are at the beginning of development and it is important to continue with this research.

My suggestions:
1. Although there are, somehow the data related to weather conditions during the research are insufficiently highlighted. I think it is important to see the connection between the production conditions and the parameter of leaf rolling. It would be interesting to have evapotranspiration data also. 

The solar influx sensor on our reference weather station broke just prior to the sowing season. Because the sensor was not repaired in time for the critical stages of vegetation, we did not have all the data needed to calculate ETo according to the Penman-Monteith model. We thus presented the results on VPD (Figure 1a) as a rule-of-thumb approximation of evapotranspiration during the sampling.
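For context, vapour pressure deficit needs only air temperature and relative humidity, which is why it remains available when the solar radiation input required for Penman-Monteith ETo is missing. The sketch below uses the saturation vapour pressure equation from the FAO-56 guidelines; the response does not state which formulation the authors used, so this choice, the function names and the example inputs are assumptions.

```python
import math

def saturation_vapour_pressure_kpa(temp_c):
    """Saturation vapour pressure (kPa) at air temperature temp_c (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))

def vapour_pressure_deficit_kpa(temp_c, rel_humidity_pct):
    """VPD (kPa): saturation vapour pressure minus actual vapour pressure."""
    if not 0.0 <= rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be between 0 and 100 %")
    es = saturation_vapour_pressure_kpa(temp_c)
    return es * (1.0 - rel_humidity_pct / 100.0)

# Hypothetical weather readings, not data from the study.
for temp_c, rh in [(25.0, 50.0), (35.0, 30.0)]:
    print(temp_c, rh, round(vapour_pressure_deficit_kpa(temp_c, rh), 2))
```

Higher temperature and lower humidity both raise VPD, which is the sense in which it can stand in as a rough indicator of atmospheric evaporative demand.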

  2. On the part of lines 115 - 116, the application of irrigation is stated. I think it is necessary to state the method of irrigation, as well as the amount of water given. Since these parameters are directly related to drought.

Information on the amount of water per irrigation and the method of irrigation was added to the Material and Methods section.
