Article
Peer-Review Record

Using Unmanned Aerial Vehicles and Multispectral Sensors to Model Forage Yield for Grasses of Semiarid Landscapes

Grasses 2024, 3(2), 84-109; https://doi.org/10.3390/grasses3020007
by Alexander Hernandez 1,*, Kevin Jensen 1, Steve Larson 1, Royce Larsen 2, Craig Rigby 1, Brittany Johnson 1, Claire Spickermann 1 and Stephen Sinton 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 22 September 2023 / Revised: 26 April 2024 / Accepted: 29 April 2024 / Published: 17 May 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Using unmanned aerial vehicles (UAVs) and multispectral sensors to model forage yield for grasses of semiarid landscapes

The manuscript provides several models for estimating grass yield for different species. This is useful in understanding the effectiveness of restoration, carbon cycling, and forage production for cattle.

 

Overall, this is a well-written manuscript that provides clear justification for the study. I have a major issue with the model comparisons based simply on R². I provide my argument in the Results comments below that relying on R² does not provide a valid comparison of models to determine “best” fit. The authors are relying on a false interpretation that R² provides information beyond the variance of the response around the regression model line.

 

Introduction

L 55, 57 – Odd when abbreviations are not in parentheses, for example UAV and RGB

L 55-61 – Awkward presentation of the research as “this study” with no citation until the next sentence. First assumption was that “this study” was the current manuscript.

L 63-64 – Similar to above, “the study” is awkward. Why not just make the statement “However, there were limitations in observing...”

 

Materials and Methods

L 109-110 – This is bad for citing tables and figures. Should just be “...species were established (Table 1).”

L 144, 147, 170, 199 – Odd abbreviations again not in parentheses – USU, IWG, BBWG, RMS

L 157 – Not really a guarantee, more that “...we attempted to capture...”

Figure 4 – BBW isn’t a defined abbreviation, need to define in the caption.

L 253 – Should not be “See table 2...”; rather, an informative statement and then a citation (Table 2). The tables and figures are not the methods; the text is, and the tables and figures are there just for supportive information.

L 267 – Similar to above.

L 274 – I recommend putting R package names in italics and function names in quotes. Makes it easier for the reader to identify packages and functions.

 

Results

Overall, using R² to make fitness determinations of models or selecting “best” fit models is not correct. The idea that R² provides any information beyond simple variance from the line is a fallacy. R² does not 1) indicate bias of coefficients (need to use residual plots to define that), 2) indicate adequate fit (it is easy to have a “low” R² model that is “good” and a “high” R² model that is “bad”), or 3) provide any comparison to other models (it is specific to that data and that equation). Interpretation of residual plots, AIC or BIC, and AUC provides vastly more information about model fitness and appropriateness that is absolutely missing from a simple R² calculation. This is even more of an issue when comparing completely different modeling techniques (LM vs RF). R² provides no valid comparison between these models.

L 408 – Consistency, capitalize R in R²

Figure 10 – If these are correlations, how can there be a line with a confidence interval or R²? If they are correlations, you have an r and a p-value; no line and no confidence interval. The line only exists in regression. P-values are all < 0.001; that is sufficient notation. Any further detail (e.g., 1.4e-08) conveys no more information than < 0.001 (three decimal places for a p-value are more than enough).

Table 4 – Is this level of precision (3 decimal places) the level of measurement in the balance? If so, fine. If not, this needs to be revised to the correct level of precision.

Figure 12 – Not clear if these are Spearman correlations (the choice of a non-parametric test over Pearson’s correlation needs to be justified [issues with assumptions of normality? Homogeneity of variance?]) or linear regression. These are very different analyses with very different null hypotheses tested. Same as above, too: if these are correlations, then the line and confidence interval do not exist and it is r, not R².

L 447 – Here it is RF for random forest, but in methods referred to as “rf”. Be consistent.

Table 6 – Not cited in text.

 

Discussion

Figure 17 – The discussion is not the location for presenting results. These results should be in the proper section in order to interpret/discuss them here.

Author Response

Comments and responses – Reviewer 1

 

Thank you very much for taking the time to thoroughly review our manuscript and for providing excellent comments. We have addressed each comment separately and our responses can be found below.

 

Introduction

L 55, 57 – Odd when abbreviations are not in parentheses, for example UAV and RGB

 

Response: Thanks for pointing this out. The aforementioned abbreviations have been enclosed in parentheses.

 

L 55-61 – Awkward presentation of the research as “this study” with no citation until the next sentence. First assumption was that “this study” was the current manuscript.

 

Response: The ambiguous text has been removed.

 

L 63-64 – Similar to above, “the study” is awkward. Why not just make the statement “However, there were limitations in observing...”

 

Response: The text has been changed according to the reviewer’s suggestion.

 

Materials and Methods

L 109-110 – This is bad for citing tables and figures. Should just be “...species were established (Table 1).”

 

Response: The text has been removed according to the reviewer’s suggestion. Thank you.

 

L 144, 147, 170, 199 – Odd abbreviations again not in parentheses – USU, IWG, BBWG, RMS

 

Response: Thanks for pointing this out. The aforementioned abbreviations have been enclosed in parentheses.

 

L 157 – Not really a guarantee, more that “...we attempted to capture...”

 

Response: The text has been changed to “attempted” as suggested by the reviewer.

 

Figure 4 – BBW isn’t a defined abbreviation, need to define in the caption.

 

Response: The caption for Figure 4 has been changed according to the suggestion.

 

L 253 – Should not be “See table 2...”; rather, an informative statement and then a citation (Table 2). The tables and figures are not the methods; the text is, and the tables and figures are there just for supportive information.

 

Response: The text has been modified according to the reviewer’s suggestion. Thank you.

 

L 267 – Similar to above.

 

Response: The text has been modified according to the reviewer’s suggestion. Thank you.

 

L 274 – I recommend putting R package names in italics and function names in quotes. Makes it easier for the reader to identify packages and functions.

 

Response: The text has been modified according to the reviewer’s suggestion. Thank you.

 

Results

Overall, using R² to make fitness determinations of models or selecting “best” fit models is not correct. The idea that R² provides any information beyond simple variance from the line is a fallacy. R² does not 1) indicate bias of coefficients (need to use residual plots to define that), 2) indicate adequate fit (it is easy to have a “low” R² model that is “good” and a “high” R² model that is “bad”), or 3) provide any comparison to other models (it is specific to that data and that equation). Interpretation of residual plots, AIC or BIC, and AUC provides vastly more information about model fitness and appropriateness that is absolutely missing from a simple R² calculation. This is even more of an issue when comparing completely different modeling techniques (LM vs RF). R² provides no valid comparison between these models.

 

Response: Thank you very much for pointing out this statistical consideration. We completely agree that the Akaike Information Criterion (AIC) is commonly used for model selection in statistical modeling. However, applying AIC to Random Forest models such as the ones presented in our research is impractical, if not impossible, because of their ensemble nature. Random Forests are ensemble learning methods composed of multiple decision trees: each tree is constructed using a random subset of features and data points, and the final prediction is the average of all tree predictions. This ensemble approach complicates the direct application of AIC, which is based on the likelihood function and penalizes model complexity. Similarly, the Bayesian Information Criterion (BIC) is designed for models with a fixed number of parameters, which is not the case for Random Forests: the number of parameters grows with the number of trees in the forest, making it impractical to determine a fixed parameter count for the BIC calculation. In conclusion, the ensemble structure of Random Forests, where predictions are obtained from multiple trees, hinders the direct computation of AIC or BIC on these models.

 

In addition to using R², we have included the Root Mean Squared Error (RMSE) in our scatterplots. As RMSE is the square root of the mean of the squared errors (differences between the observed and predicted values) and is expressed in the units of the response, we consider that including this extra performance metric will help the reader understand our choice of the best models based on R² and RMSE.
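For readers less familiar with the two metrics, both can be computed directly from observed and predicted yields. A minimal sketch follows (in Python for illustration; the manuscript's analyses were carried out in R, and the observed/predicted values below are made up):

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(observed, predicted):
    """Root Mean Squared Error, in the units of the response (e.g., kg/plot)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

obs = [2.0, 4.0, 6.0, 8.0]   # fabricated wet-weight observations
pred = [2.5, 3.5, 6.5, 7.5]  # fabricated model predictions
print(r_squared(obs, pred))  # ≈ 0.95
print(rmse(obs, pred))       # 0.5
```

Reporting RMSE alongside R² addresses part of the reviewer's concern, since RMSE is comparable across model types on the same response.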

 

L 408 – Consistency, capitalize R in R²

 

Response: Thank you for this observation. We have changed and capitalized R throughout the document.

 

Figure 10 – If these are correlations, how can there be a line with a confidence interval or R²? If they are correlations, you have an r and a p-value; no line and no confidence interval. The line only exists in regression. P-values are all < 0.001; that is sufficient notation. Any further detail (e.g., 1.4e-08) conveys no more information than < 0.001 (three decimal places for a p-value are more than enough).

 

Response: Thank you for this observation. We have modified Figure 10 based on the reviewer’s comments: we have eliminated the regression line and confidence intervals and changed the notation for the p-value significance. We have included a new version of Figure 10 in the manuscript.
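The suggested p-value convention is easy to standardize; a throwaway helper (hypothetical, in Python) shows the notation applied in the revised figures:

```python
def format_pval(p):
    """Report very small p-values as an inequality rather than raw scientific notation."""
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"

print(format_pval(1.4e-08))  # p < 0.001
print(format_pval(0.0234))   # p = 0.023
```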

 

 

Table 4 – Is this level of precision (3 decimal places) the level of measurement in the balance? If so, fine. If not, this needs to be revised to the correct level of precision.

 

Response: Thank you for this observation. The values shown in this table reflect the level of measurement of the scales used in the field and after the drying process.

 

Figure 12 – Not clear if these are Spearman correlations (the choice of a non-parametric test over Pearson’s correlation needs to be justified [issues with assumptions of normality? Homogeneity of variance?]) or linear regression. These are very different analyses with very different null hypotheses tested. Same as above, too: if these are correlations, then the line and confidence interval do not exist and it is r, not R².

 

Response: Thank you for this observation. We have modified Figure 12 based on the reviewer’s comments: we have eliminated the regression line and confidence intervals and changed the notation for the p-value significance. We have included a new version of Figure 12 in the manuscript.
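The distinction the reviewer draws can be made concrete: Spearman's ρ is simply Pearson's r computed on the ranks of the data, so it captures monotone but possibly nonlinear association. A small numpy sketch (illustrative only; it ignores ties, and the data are fabricated):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

def spearman_rho(x, y):
    # Spearman's rho = Pearson's r on the ranks (no tie correction here)
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson_r(rank(x), rank(y))

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 4.0, 9.0, 16.0]   # monotone but nonlinear in x
print(pearson_r(x, y))      # ≈ 0.984 (linearity imperfect)
print(spearman_rho(x, y))   # 1.0 (perfectly monotone)
```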

 

 

L 447 – Here it is RF for random forest, but in methods referred to as “rf”. Be consistent.

 

Response: Thank you for this observation. We have changed this notation throughout the document for consistency when referring to the Random Forest models.

 

Table 6 – Not cited in text.

 

Response: Thank you for this observation. We have corrected the text to cite Table 6.

 

Discussion

Figure 17 – The discussion is not the location for presenting results. These results should be in the proper section in order to interpret/discuss them here.

 

Response: Thank you for this observation. We have modified the text in the discussion and have moved the figures to the supplementary material.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

General Comments:

Interesting manuscript regarding the potential of remotely sensed data to predict forage yield of multiple grass species in semi-arid parts of the United States. While the basic idea of the manuscript seems to be well developed, a lot of the practical aspects and the way the manuscript is presented are concerning.

Firstly, while the use of wet weight as forage yield is practically acceptable, it is generally not advisable due to the potentially large variations and uncertainties that come with water variation. Furthermore, and as outlined by the authors in the discussion, dried weight is what actually matters in terms of biomass and implications for growth. The authors need to motivate their use of wet weight beyond practicality, as they could have easily taken representative sub-samples of each harvested area and produced reliable dry weight estimates for use in the models.

Secondly, the use of ‘dense’ and ‘sparse’ grass canopies is not defined, nor is it clearly detailed which species the authors describe with this feature. This is supposedly a major part of the interpretation of the 3D modelling, and thus it needs to be properly described and motivated.

Thirdly, while the use of multiple research fields, species and harvests increases the overall scope of the study, it makes the comparison between species and models unbalanced, contributing to a lot of uncertainty. The implication of this uncertainty is not explored within the manuscript, leading the reader to believe that all models and species are equally represented. Further emphasis on and implications of this need to be detailed in the discussion.

Fourthly, there are no specifications of the models included in the manuscript. Were all the vegetation indices in Table 2 included in the models? And why? Certainly, it seems like some of the models would be overfitted, given the low numbers of observations, by using all the indices plus the other predictors as well. Was potential overfitting accounted for? Why were the most useful indices not kept in the model and the rest removed?

Fifthly, what is seemingly missing in the discussion is a comparison of the different models in the manuscript and the pros and cons of each type of variable in predicting the wet weight of the harvests. With all of these vegetation indices, imaging techniques and remote sensing approaches, some should be more accurate than others for the research focus; further emphasis needs to be put on this distinction.

Lastly, and perhaps paradoxically given the previous statements, the manuscript seems unnecessarily long and superfluous. This is especially clear in the introduction and the discussion, as much of it is literature review without synthesis and repetition of results. The authors should try to condense these sections, synthesize, and keep the essential pieces of information to focus the reader in the right direction. While the methods are quite detailed as well, that level of detail is more likely required due to the specifics of the remotely sensed analyses; if these can be condensed to some extent, that would help as well.

Specific Comments:

1.     Lines 38-40. This sentence does not convey any point and is partly the same as the below sentence (lines 41-46) as well. Remove or reformat.

2.     Lines 41-46. This sentence is too long, try splitting it.

3.     Lines 75-91. This is primarily methods with parts of the introduction mixed in. If relevant, move the methods sections to the methods and reformat the species introduction bits if it is important to keep them.

4.     Lines 97-104. This part does not belong with the introduction, as it seems almost conclusive in nature. If it is important move it to the conclusion.

5.     Lines 132, 163-164 and 185-186. How was the air drying performed? The usual method for determining dried biomass weight is drying at 65-70 °C until constant weight. Was this conducted here as well? If not, why and how was it conducted?

6.     Lines 138-141, 154-156 and 174-176. The multi-stage harvesting procedures are not thoroughly elaborated upon, especially in relation to their incorporation into the analyses. It is stated that this multi-stage harvesting can provide insight into different growth stages, yet this aspect is completely missing in the results and discussion. This needs to be clarified.

7.     Lines 290-297. This section can easily be reduced to a sentence.

8.     Lines 292-294. This is the first time it is mentioned that wet weight is considered the forage yield, this should probably be mentioned and explored earlier than the statistical analysis section.

9.     Line 520. Table 6?

10.  Lines 562-592. This section does not convey much more information than is already present in the introduction, since it states that UAVs can be used for monitoring, it summarizes some previous studies and then states that not a lot of studies are available for comparison with the manuscript findings. If the section does not convey any insightful reflections of the manuscript results then it is not necessary.

11.  Lines 593-653. This entire section contains 4 unique references, none seemingly discussed in the light of the manuscript findings, but as general literature review. Are there no other studies that have investigated the possibility of modelling plants using these types of remote sensed techniques? Is there no other comparative literature? Also, the inclusion of new figures in the discussion is generally discouraged.

Figures and Tables:

1.     Table 4. What are the units for these measurements?

2.     Figure 9. Is this figure mentioned in the text?

3.     Figure 10. It would be useful here to insert the number of observations per species, as it is a crucial part in interpreting the uncertainty in the relationships due to uneven distributions between species.

4.     Tables 5 and 6. The ‘bold’ is not clearly visible in the manuscript, perhaps another measure could be used to represent the superior model if this is an important feature.

Comments on the Quality of English Language

The language is strange at times, with the sentence structure and word-choice often unscientific and unspecific. 

Author Response

Comments and responses – Reviewer 2

 

Thank you very much for taking the time to thoroughly review our manuscript and for providing excellent comments. We have addressed each comment separately and our responses can be found below.

 

  1. Firstly, while the use of wet weight as forage yield is practically acceptable, it is generally not advisable due to the potentially large variations and uncertainties that come with water variation. Furthermore, and as outlined by the authors in the discussion, dried weight is what actually matters in terms of biomass and implications for growth. The authors need to motivate their use of wet weight beyond practicality, as they could have easily taken representative sub-samples of each harvested area and produced reliable dry weight estimates for use in the models.

 

Response: Thank you for raising this concern about our discussion of results. We have included the following text in the discussion (section 4.1):

 

“In our modeling, there is a fundamental reason to utilize the wet weights. The spectral information that the digital sensors (i.e. RGB, multispectral) captured at the fields is a direct reflection of the grasses’ green matter, which includes the water content at the time of each flight. While there are exceptionally high correlations between wet weights and dry weights for our samples (Fig. 10), we did not want to conduct an indirect regression using the dry weights as the response variable since the spectral data and derivatives (i.e. NDVI, NDRE) were collected over the green matter plots.”

 

This clarifies to the reader why we used wet weights as opposed to dry weights: the spectral data collected in the field can only be regressed directly against the wet weights. In our research, we did not want to conduct an indirect regression of dry weights against the spectral information captured by the UAV sensors.

 

  2. Secondly, the use of ‘dense’ and ‘sparse’ grass canopies is not defined, nor is it clearly detailed which species the authors describe with this feature. This is supposedly a major part of the interpretation of the 3D modelling, and thus it needs to be properly described and motivated.

 

Response: Thank you for raising this concern about our description of sparse vs. dense grasses. We have included the following text in the methods (section 2.2 Forage data collection):

 

“In this research we emphasize the building of models for grasses that exhibit dense and sparse canopy architectures. The sparse canopy grass species used in this research is Bluebunch wheatgrass (BBWG) (Pseudoroegneria spicata); the rest of the species are considered grasses with dense canopies. The main criterion used to differentiate sparse from dense canopies is the number of lignified stems in the canopy. Bunchgrasses such as BBWG, unlike sod-forming grasses (the rest of the species used here), have a crown area composed of many individual stems packed into the canopy. This structural difference makes bunchgrasses likely to have more stems in the canopy compared to sod-forming grasses [18]. The growth form of bunchgrasses can also lead to self-shading of their foliage, reducing the overall amount of photosynthetically active leaf area [19]. Conversely, the foliage of sod-forming grasses tends to be more abundant and continuous due to their spreading (rhizomatous) nature, contributing to a greater overall coverage of foliage [20] compared to bunchgrasses. In this context, the canopies of sod-forming grasses are comparatively denser (i.e. more foliage or green matter) than those of bunchgrasses, which for our research purposes are considered to have sparse canopies.”

 

This description includes three new references:

  1. Wilcox, K.R.; Chen, A.; Avolio, M.L.; Butler, E.E.; Collins, S.; Fisher, R.; Keenan, T.; Kiang, N.Y.; Knapp, A.K.; Koerner, S.E.; et al. Accounting for Herbaceous Communities in Process‐based Models Will Advance Our Understanding of “Grassy” Ecosystems. Global Change Biology 2023, 29, doi:10.1111/gcb.16950.
  2. Caldwell, M.M.; Dean, T.J.; Nowak, R.S.; Dzurec, R.S.; Richards, J.H. Bunchgrass Architecture, Light Interception, and Water-Use Efficiency: Assessment by Fiber Optic Point Quadrats and Gas Exchange. Oecologia 1983, 59, doi:10.1007/bf00378835.
  3. Velásquez-Valle, M.A.; Sánchez-Cohen, I.; Gutiérrez-Luna, R.; Muñoz-Villalobos, J.A.; Macías-Rodríguez, H. Hydrological Impact of Land-Use Change from Rangeland to Buffelgrass (Pennisetum ciliare L.) Pasture. Revista Chapingo Serie Zonas Áridas 2014, XIII, doi:10.5154/r.rchsza.2013.10.004.

 

  3. Thirdly, while the use of multiple research fields, species and harvests increases the overall scope of the study, it makes the comparison between species and models unbalanced, contributing to a lot of uncertainty. The implication of this uncertainty is not explored within the manuscript, leading the reader to believe that all models and species are equally represented. Further emphasis on and implications of this need to be detailed in the discussion.

 

Response: Thank you for raising this concern about our inferences and the risk of misleading the reader into thinking that our models are balanced. We have modified and included text in the discussion (section 4.5 Limitations of the global models and future work):

 

“The models presented in this study only apply to the grasses that were harvested at our three site locations, and due to the utilization of multiple research fields, species, and harvests, we acknowledge that the comparison between species and models is unbalanced. However, we do not consider that this situation introduces additional uncertainty in our results. As we indicated earlier (section 2.4.2), our strategy of using a stratified cross validation (SCV) equalizes the chances of sites, species, and harvests to fully participate in the global model. Using SCV in statistical problems dealing with unbalanced datasets has shown promise [51,52] in preventing model bias toward the class or stratum that has more observations (i.e. grasses such as Festuca arundinacea and Pseudoroegneria spicata in this research), as SCV guarantees equal representation in both training and validation sets [53].”

 

This new text prompted the inclusion of three new references:

  1. Risk, C.; James, P.M.A. Optimal Cross‐Validation Strategies for Selection of Spatial Interpolation Models for the Canadian Forest Fire Weather Index System. Earth and Space Science 2022, 9, doi:10.1029/2021ea002019.
  2. Sarinelli, J.M.; Murphy, J.P.; Tyagi, P.; Holland, J.B.; Johnson, J.W.; Mohamed, M.; Mason, R.E.; Babar, A.; Harrison, S.A.; Sutton, R.; et al. Training Population Selection and Use of Fixed Effects to Optimize Genomic Predictions in a Historical USA Winter Wheat Panel. Theoretical and Applied Genetics 2019, 132, doi:10.1007/s00122-019-03276-6.
  3. López, V.; Fernández, A.; Herrera, F. On the Importance of the Validation Technique for Classification with Imbalanced Datasets: Addressing Covariate Shift When Data Is Skewed. Information Sciences 2014, 257, 1–13, doi:10.1016/j.ins.2013.09.038.
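As an illustration of the stratified splits described above (Python/scikit-learn for brevity; the species labels and counts are invented, and the manuscript's pipeline was built in R):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Invented, deliberately unbalanced strata: one species label per plot
species = np.array(["BBWG"] * 30 + ["IWG"] * 10 + ["TF"] * 20)
X = np.zeros((len(species), 1))  # placeholder predictor matrix

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X, species):
    # Every fold preserves the full dataset's species proportions, so the
    # minority stratum (IWG) appears in each training and validation set.
    vals, counts = np.unique(species[val_idx], return_counts=True)
    print(dict(zip(vals.tolist(), counts.tolist())))  # {'BBWG': 6, 'IWG': 2, 'TF': 4}
```

Plain K-fold with shuffling could by chance leave a rare species out of a validation fold entirely; stratification rules that out.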

 

  4. Fourthly, there are no specifications of the models included in the manuscript. Were all the vegetation indices in Table 2 included in the models? And why? Certainly, it seems like some of the models would be overfitted, given the low numbers of observations, by using all the indices plus the other predictors as well. Was potential overfitting accounted for? Why were the most useful indices not kept in the model and the rest removed?

 

Response: Thank you for raising this concern about our description of the models and the predictors that were used in each model variant. We have included new text in the methods and results to clarify which predictors were used and why.

 

Text added in the methods (section 2.4.3 Fitting and validating models for RGB and multispectral imagery) to describe the process for selecting which predictors to use in each model variant:

 

“Except for (A) above, the process to select the predictors to be used in each one of the model variants was the following:

  1. Fit temporary random forest models with all the available predictors for a particular model variant as explained above. For instance, for variant (B) above a temporary random forest model with the volumetric 3D space, the three RGB bands, and all the RGB indices (i.e. BI, SCI, GLI, NGRDI, VARI, BGI) was fitted.
  2. For each of these temporary random forest models, we extracted variable importance information [31,32] to identify the most relevant features or predictor covariates for prediction. At the same time, the variable importance rankings allowed us to filter out low importance or irrelevant variables to enhance model performance.
  3. From the variable importance plots we used the mean decrease in predictive accuracy to select the predictors that would participate in each model variant. While there is no consensus [33] in the literature about what threshold to use to select the major predictors, we arbitrarily chose to keep the predictors with the highest scores (> 20% in importance).”

This text prompted the inclusion of three new references:

 

  1. Jiang, F.; Kutia, M.; Sarkissian, A.J.; Lin, H.; Long, J.; Sun, H.; Wang, G. Estimating the Growing Stem Volume of Coniferous Plantations Based on Random Forest Using an Optimized Variable Selection Method. Sensors 2020, 20, doi:10.3390/s20247248.
  2. Liu, Y.; Zhao, H. Variable Importance‐weighted Random Forests. Quantitative Biology 2017, 5, doi:10.1007/s40484-017-0121-6.
  3. Cho, H.; Lee, E.H.; Lee, K.-S.; Heo, J.S. Machine Learning-Based Risk Factor Analysis of Necrotizing Enterocolitis in Very Low Birth Weight Infants. Scientific Reports 2022, 12, doi:10.1038/s41598-022-25746-6.
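The three-step procedure quoted above can be sketched as follows (a Python/scikit-learn stand-in for the R workflow; the predictor names, coefficients, and data are fabricated, and permutation importance plays the role of the mean decrease in predictive accuracy):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 200
names = ["volumetric_3d", "BI", "GLI", "noise"]  # fabricated predictors
X = rng.normal(size=(n, len(names)))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.1, size=n)

# Step 1: fit a temporary random forest with all candidate predictors
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Step 2: rank predictors by permutation importance (analogous to the
# mean decrease in predictive accuracy from R's randomForest)
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0).importances_mean

# Step 3: keep predictors scoring above 20% of the top score
scores = 100 * imp / imp.max()
kept = [nm for nm, s in zip(names, scores) if s > 20]
print(kept)  # ['volumetric_3d', 'BI']
```

Here the weak and irrelevant predictors fall below the 20% threshold and are filtered out before the final model is fitted.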

 

Text added in the results (section 3.3.1 Chosen predictor variables) to describe which predictors were used after applying the 20% importance threshold described in the methods:

 

“3.3.1. Chosen predictor variables

With the application of the rule that we set up (section 2.4.3) of only selecting predictors with an importance of 20% or higher, the following covariates were selected:

  • We selected five (5) variables for model variants B (LM-RGB) and D (RF-RGB). These were (in order of importance): The volumetric 3D, BI, GLI, SCI, and BGI.
  • For model variants C (LM-Multi) and E (RF-Multi) we selected eight (8) predictor variables. These were (in order of importance): The volumetric 3D, GNDVI, RVI, NDVI, NDRE, GLI, BI, and BGI.

We provide variable importance plots for the model variants in the supplementary material (Fig. S3).”

We have included a brand new figure in the supplementary material (random forest variable importance plots) to graphically represent the selection of the predictors in each model variant.

 

  5. Fifthly, what is seemingly missing in the discussion is a comparison of the different models in the manuscript and the pros and cons of each type of variable in predicting the wet weight of the harvests. With all of these vegetation indices, imaging techniques and remote sensing approaches, some should be more accurate than others for the research focus; further emphasis needs to be put on this distinction.

 

Response: Thank you for raising this concern about the missing comparison between models and the lack of context about why some predictors were used in the final models. We have renamed a section in the discussion (the new title of section 4.4 is Differences across model structures – how multispectral datasets improve model fit). The text in this section has been reworked to address these concerns:

 

“The RF-Multi model variant included covariates such as NDVI, RVI, BGI and GLI (section 3.3.1) which have been heavily used to model biomass and yield (Table 2). Our results showed that the inclusion of these near-infrared vegetation indices (i.e. NDVI, RVI, and NDRE) was far more beneficial in improving the accuracy of forage yield for the sparse than for dense canopy grasses.”

The rest of the text in this section has been reworded as well to address this comment.
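For reference, the indices named above are simple band combinations; a sketch with fabricated reflectance values (the formulas are the standard published definitions, not reproduced from the manuscript's Table 2):

```python
# Fabricated per-plot mean reflectances from a multispectral sensor
nir, red, red_edge, green = 0.45, 0.08, 0.20, 0.12

ndvi  = (nir - red) / (nir + red)            # Normalized Difference Vegetation Index
ndre  = (nir - red_edge) / (nir + red_edge)  # red-edge analogue of NDVI
rvi   = nir / red                            # Ratio Vegetation Index
gndvi = (nir - green) / (nir + green)        # green-band analogue of NDVI

print(round(ndvi, 3), round(ndre, 3), round(rvi, 3), round(gndvi, 3))
# 0.698 0.385 5.625 0.579
```

The first three rely on the near-infrared band, which is exactly what the RGB-only model variants lack.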

 

  6. Lastly, and perhaps paradoxically given the previous statements, the manuscript seems unnecessarily long and superfluous. This is especially clear in the introduction and the discussion, as much of it is literature review without synthesis and repetition of results. The authors should try to condense these sections, synthesize, and keep the essential pieces of information to focus the reader in the right direction. While the methods are quite detailed as well, that level of detail is more likely required due to the specifics of the remotely sensed analyses; if these can be condensed to some extent, that would help as well.

 

Response: Thank you for raising this concern about the unnecessarily long introduction and discussion. We have reworked these two sections for brevity where we thought it possible, without omitting essential details that we think the final reader will benefit from. All the rewording and deletions are illustrated in the manuscript using tracked changes.

 

Specific Comments:

  1. Lines 38-40. This sentence does not convey any point and is partly the same as the below sentence (lines 41-46) as well. Remove or reformat.

 

Response: Thank you for this indication. This sentence has been removed.

 

  1. Lines 41-46. This sentence is too long, try splitting it.

 

Response: Thank you for this indication. This sentence has been split.

 

  1. Lines 75-91. This is primarily methods with parts introduction. If relevant move methods sections to the methods and reformat the species introduction bits if it is important to keep.

 

Response: Thank you for this indication. We have reworded this text, eliminating the parts that resembled methods but keeping the information about which grasses were used. We consider that this text must be kept since it gives the reader context about the ecological significance of the grasses that we built models for. We feel strongly that the reader should know, from the introduction onward, why we selected these grasses and how large their impact is on restoration of degraded rangelands across the western USA.

 

  1. Lines 97-104. This part does not belong with the introduction, as it seems almost conclusive in nature. If it is important move it to the conclusion.

 

Response: Thank you for this indication. We have moved the text to the conclusion as indicated.

 

  1. Lines 132, 163-164 and 185-186. How was the air drying performed? The usual method for determining dried biomass weight is drying in 65-70°C until constant weight. Was this conducted here as well? If not, why and how was it conducted?

 

Response: Thank you for this indication. We have included text in the methods (section 2.2.1) that describes the process of air drying:

 

“Air drying was conducted by leaving the bagged samples in the drier for several days at 60 °C until constant weights were achieved.”

 

  1. Lines 138-141, 154-156 and 174-176. The multi-stage harvesting procedures is not thoroughly elaborated upon, especially in relation to the incorporation into the analyses. It is stated that this multi-stage harvesting can provide insight into different growth stages, yet this aspect is completely missing in the results and discussion. This needs to be clarified.

 

Response: Thank you for pointing out this source of confusion. We have included the following text in the discussion (section 4.1, On the use of wet weights instead of dry weights) to clarify that our models were not intended to provide additional insights about the different growth stages.

 

“While we conducted different harvests, all the values were standardized as described in the methods, but our modeling results were not intended to provide additional insights about growth stages.”

 

  1. Lines 290-297. This section can easily be reduced to a sentence.

 

Response: Thank you for this indication. We feel strongly about keeping the original text as it provides the reader with context about the different plot sizes, and how these different sizes can influence the calculation of the response variable.

 

  1. Lines 292-294. This is the first time it is mentioned that wet weights considered the forage yield, this should probably be mentioned and explored earlier than the statistical analysis section.

 

Response: Thank you for indicating this source of confusion. We have reworded the text to indicate from the very beginning what our response variable was in all of our model variants.

 

 

  1. Lines 520. Table 6?

 

Response: Thank you for this indication. We have changed the text to reference Table 6.

 

  1. Lines 562-592. This section does not convey much more information than is already present in the introduction, since it states that UAVs can be used for monitoring, it summarizes some previous studies and then states that not a lot of studies are available for comparison with the manuscript findings. If the section does not convey any insightful reflections of the manuscript results, then it is not necessary.

 

Response: Thank you for this indication. We have deleted some of the text of this section and reworded some of it. We feel strongly about keeping the references that we included to provide context to the reader about the novelty of our research, and how this type of modeling has not been conducted for forage yield with the grasses that we selected.

 

 

  1. Lines 593-653. This entire section contains 4 unique references, none seemingly discussed in the light of the manuscript findings, but as general literature review. Are there no other studies that have investigated the possibility of modelling plants using these types of remote sensed techniques? Is there no other comparative literature? Also, the inclusion of new figures in the discussion is generally discouraged.

 

Response: Thank you for this indication. We conducted an exhaustive literature review for similar studies that resembled our remote sensing datasets or our models (i.e., our stratified cross-validation strategy) and included in this section of the discussion the best references we were able to find.

 

We moved all of the figures that were included in the discussion to the supplementary material document.

 

 

Figures and Tables:

  1. Table 4. What are the units for these measurements?

 

Response: Thank you for this indication. We have included the units in the table: grams (g).

 

  1. Figure 9. Is this figure mentioned in the text?

 

Response: Thank you for this indication. We have referenced Figure 9 in the text.

 

  1. Figure 10. It would be useful here to insert the number of observations per species, as it is a crucial part in interpreting the uncertainty in the relationships due to uneven distributions between species.

 

Response: Thank you for this indication. We have completely modified Figure 10 and have included the number of observations per species in the species panels.

 

  1. Tables 5 and 6. The ‘bold’ is not clearly visible in the manuscript, perhaps another measure could be used to represent the superior model if this is an important feature.

 

Response: Thank you for this indication. We have made the bold entries also underlined to denote the superior models.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have sufficiently addressed my comments. I am still not convinced that R2 is appropriate. I understand that random forest is not based on a maximum likelihood and AIC/BIC do not apply. However, AUC could be used for comparing models. RMSE does not really add much else beyond R2 as it pertains only to that data and that model. The addition of an external measure, like AUC would allow comparison of models. I do not believe R2/RMSE allow for that.

Author Response

Response to Reviewer 1 – Round 2

 

Reviewer 1 – Second round wrote:

“The authors have sufficiently addressed my comments. I am still not convinced that R2 is appropriate. I understand that random forest is not based on a maximum likelihood and AIC/BIC do not apply. However, AUC could be used for comparing models. RMSE does not really add much else beyond R2 as it pertains only to that data and that model. The addition of an external measure, like AUC would allow comparison of models. I do not believe R2/RMSE allow for that.”

 

Response: Thank you very much for your continued support in providing recommendations to improve our manuscript. After reviewing the most recent literature on Receiver Operating Characteristic (ROC) analysis for regression, we found that a relatively new metric (2013) has been developed for the regression case: the Regression Receiver Operating Characteristic (RROC). We have integrated this external measure into the comparison of model performance for the global models (i.e. LM-3D, LM-RGB, RF-RGB, LM-Multi, and RF-Multi). As this involved a brand-new calculation, we have added text to the methods, results, and discussion sections as follows:

 

Methods: - The following text has been added as a new subsection, 2.4.3. Comparison of the global models

 

2.4.3. Comparison of the global models

“The general performance of our SCV global models (LM-3D, LM-RGB, LM-Multi, RF-RGB, and RF-Multi) was assessed using traditional regression metrics such as RMSE and MAE. In addition, we calculated scores for the Regression Receiver Operating Characteristic (RROC) as proposed by [36] and implemented in the R package auditor [37]. Due to the nature of the SCV models, where each grass species was left out at each iteration, extraction of the RROC on a per-species basis is not feasible, and thus we conducted the RROC calculation for each global model.”
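The RROC construction can be illustrated with a short sketch. This is a minimal Python re-implementation of the idea from [36] with toy data, not the auditor package's code: each prediction is shifted by a constant, and the resulting total over- and under-estimation form one point on the curve.

```python
import numpy as np

def rroc_curve(y_true, y_pred):
    """RROC curve (Hernandez-Orallo, 2013): shift every prediction by a
    constant s and record total over-estimation OVER(s) (x-axis) and
    total under-estimation UNDER(s) (y-axis).  The curve's breakpoints
    occur at s = -residual for each observation."""
    resid = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    points = []
    for s in np.sort(-resid):                # evaluate at each breakpoint shift
        shifted = resid + s
        over = shifted[shifted > 0].sum()    # sum of positive shifted residuals
        under = shifted[shifted < 0].sum()   # sum of negative shifted residuals (<= 0)
        points.append((float(over), float(under)))
    return points

# Toy observations and predictions; the point with shift s = 0
# corresponds to the over/under-estimation of the raw residuals.
pts = rroc_curve(y_true=[3.0, 5.0, 7.0], y_pred=[2.0, 6.0, 7.5])
```

Models whose curves pass closer to the (0, 0) corner over- and under-estimate less at every shift, which is the comparison the manuscript's Figure 15a makes across the five global models.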

 

This involved adding two new references:

 

  1. Hernández-Orallo, J. ROC Curves for Regression. Pattern Recognition 2013, 46, 3395–3411
  2. Gosiewska, A.; Biecek, P. Auditor: An R Package for Model-Agnostic Visual Validation and Diagnostics 2020.

 

Results: - The following text has been added – a new subsection 3.3.6. Global models’ performance

“3.3.6. Global models’ performance

The RROC curves for the different regression models are shown in Fig. 15a. This plot shows, for each model, the magnitude of over-estimation (x-axis) and of under-estimation (y-axis) as defined by [36]. Each point on the curve corresponds to a shift, which is equivalent to the threshold in traditional ROC curves; the point where the shift equals 0 is marked with a dot. Shifts that are closer to the 0,0 origin (upper-left corner of the plot) are indicative of an overall better model performance. In Fig. 15a we can see a gradual trend of model improvement, in increasing order: LM-3D, LM-RGB, LM-Multi, RF-RGB, and RF-Multi. This trend corresponds quite well with the per-species performances shown in Figures 13 and 14. In addition, we present the scaled (0 to 1) scores for the performance metrics (RMSE, MAE, and RROC) in a model-ranking radar plot (Fig. 15b). In this plot, values closer to 1 (one) indicate overall better model performance. It becomes evident that the non-parametric models (RF-RGB and RF-Multi) outperform the linear regression models on all three calculated metrics.”

A brand-new figure was also created and added to the manuscript, Figure 15, panels (a) and (b): “RROC curves with magnitudes of over- and under-estimation in predicted values (a) and scaled (0–1) regression performance metrics (b) for the five global SCV models.”
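The 0-to-1 scaling behind such a radar plot can be done with a simple min-max rescaling in which lower-is-better metrics are inverted so that 1 marks the best model. The sketch below is illustrative only: the paper's exact scaling procedure is not shown in this record, and the RMSE numbers are made up.

```python
def scale_lower_is_better(metric_values):
    """Min-max rescale a lower-is-better metric (e.g. RMSE, MAE, RROC area)
    across models to [0, 1], assigning 1 to the best (lowest) model."""
    lo, hi = min(metric_values.values()), max(metric_values.values())
    return {model: (hi - v) / (hi - lo) for model, v in metric_values.items()}

# Hypothetical RMSE values for the five global models (not from the paper)
rmse = {"LM-3D": 9.0, "LM-RGB": 7.0, "LM-Multi": 6.0,
        "RF-RGB": 5.0, "RF-Multi": 4.0}
scaled = scale_lower_is_better(rmse)
```

Repeating this per metric puts RMSE, MAE, and RROC on a common axis so the models can be overlaid on one radar chart.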

 

Discussion: - The following text has been added at the end of section 4.4: Differences across model structures – how multispectral datasets improve model fit

 

“This was distinctly demonstrated with the addition of the Regression Receiver Operating Characteristic (RROC) curves and the comparison of the scaled regression performance metrics (RMSE, RROC, MAE) for all the global models, as depicted in Figure 15 (a and b). In both plots it was evident that the non-parametric global models (RF-RGB and RF-Multi) surpassed all the linear model variants.”

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have sufficiently revised the manuscript based on the reviewer's comments.

Author Response

We greatly appreciate your comprehensive feedback to improve our manuscript. Thank you.
