Peer-Review Record

Averaging and Stacking Partial Least Squares Regression Models to Predict the Chemical Compositions and the Nutritive Values of Forages from Spectral Near Infrared Data

Appl. Sci. 2022, 12(15), 7850; https://doi.org/10.3390/app12157850
by Mathieu Lesnoff 1,2,3, Donato Andueza 4, Charlène Barotin 5, Philippe Barre 5, Laurent Bonnal 1,2, Juan Antonio Fernández Pierna 6, Fabienne Picard 4, Philippe Vermeulen 6 and Jean-Michel Roger 3,7,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 13 July 2022 / Revised: 29 July 2022 / Accepted: 2 August 2022 / Published: 4 August 2022
(This article belongs to the Section Environmental Sciences)

Round 1

Reviewer 1 Report

See attached file.

Comments for author File: Comments.pdf

Author Response

  • “In the Theory section, the authors outline the complex mathematical notation needed to define the averaging and stacking models. But I feel that equations (2) and (4) either have errors in them or more likely ….”

 

You are completely right: there were typing errors in the formulas. Many thanks for noting this! We have corrected Sections 2.2.2 and 2.2.4.
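The corrected equations (2) and (4) are not reproduced in this record. For orientation, the block below gives the generic, commonly used forms of model averaging and stacking. It is not the paper's notation. Two things are assumptions: indexing the candidate models by their number of PLS latent variables a = 0, ..., A (with a = 0 taken as the mean of y), and estimating the stacking weights from out-of-sample (cross-validated) predictions.

```latex
\begin{aligned}
\hat{y}_{\mathrm{AVG}} &= \sum_{a=0}^{A} w_a\,\hat{y}_a,
  \qquad w_a \ge 0,\ \ \sum_{a=0}^{A} w_a = 1
  \quad (\text{e.g. } w_a = 1/(A+1)),\\
\hat{\mathbf{w}}_{\mathrm{STACK}} &= \arg\min_{\mathbf{w}}
  \sum_{i=1}^{n}\Big(y_i - \sum_{a=0}^{A} w_a\,\hat{y}^{(-i)}_{i,a}\Big)^2 .
\end{aligned}
```

Here $\hat{y}_a$ is the prediction of the PLSR model with $a$ latent variables, and $\hat{y}^{(-i)}_{i,a}$ is the prediction of observation $i$ by that model fitted without $i$. Stacking often also constrains the weights to be non-negative.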

 

  • “In Materials and Methods, I would suggest the authors use colored (but unfilled) open circles in Figure 3. The reader can’t really see the extent of the distributions in the bottom layers. One could also plot 95% confidence ellipses.”

 

We used open circles in the new figure. We also used a smaller marker size, as proposed by Reviewer 2. Both suggestions were good ideas.

 

  • “In the Results section, it might make more sense to report the Median instead of the Mean, as a is a positive integer.”

 

We agree that, for a positive discrete random variable, reporting the median can be more informative than the mean, particularly when the distribution is highly right-skewed with many values equal or close to zero. This was not our case, and we preferred to keep reporting the mean, as is usually done, for instance, for a Poisson distribution.

Thanks for the remark.
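The reviewer's remark implies that a is a count, such as a selected number of PLS latent variables; this interpretation is an assumption. The short sketch below illustrates the argument in the response. For a roughly symmetric set of counts, the mean and median coincide. For a strongly right-skewed set, the mean is pulled upward and the median is the more representative summary. Both sets of values are hypothetical and are not taken from the article.

```python
from statistics import mean, median


def summarize(values):
    """Return (mean, median) of a list of counts, e.g. selected numbers of latent variables."""
    return mean(values), median(values)


# Hypothetical counts, not results from the article.
roughly_symmetric = [8, 9, 10, 10, 11, 11, 12, 13]
right_skewed = [1, 1, 1, 2, 2, 3, 15, 20]

print(summarize(roughly_symmetric))  # (10.5, 10.5): mean and median agree
print(summarize(right_skewed))       # (5.625, 2.0): mean pulled up by the long right tail
```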

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper focuses on an alternative strategy to the usual PLSR: averaging methods. The averaging methods presented in this paper can be easily embedded in pipelines of local PLSR. The topic is interesting. I have only a few concerns.

The authors used forage data to test the effectiveness of this method; what about other foods or agricultural products?

The abstract should be more concise.

Introduction: several in-lab/in-site or practical applications using NIR or NIR-HSI in combination with the usual PLSR should be mentioned in the Introduction, for example, Wang et al., 2022, https://doi.org/10.1016/j.compag.2022.106843, and Chu et al., 2022, https://doi.org/10.1016/j.infrared.2022.104098.

Results, Tables 2 and 3: what about the abbreviations of the words in the Y column?

Figure 3: the points in the PC scatter plot are too large to observe the clustering trend.

 

Author Response

  • “The authors used forage data to test the effectiveness of this method; what about other foods or agricultural products?”

 

(a) Our paper focused on forage data for two main reasons. First, forages are a priority material studied in our applied research teams, so we had access to large and representative datasets (which, we think, provides strong evidence of genericity for forages). Second, based on our few observations on other types of data, the ensemble methods that we have presented are particularly efficient when the material contains large heterogeneity, arising from scattering effects in the signals or from differences in origins, species, survey staff, spectrometers, etc. This point about heterogeneity is already made in the Introduction.

(b) Even though we had no access to data other than forages for this article, we strongly believe that our ensemble methods are also efficient for foods and other agricultural products, as you pertinently suggested, when these materials contain heterogeneity. We have presented our generic and easy-to-implement methods and tools in the article as a proof of concept; we hope that readers will test these methods on many types of materials other than forages and confirm our expectation. We have added a paragraph on this point in the Discussion.

 

  • “The abstract should be more concise.”

 

We reduced the abstract to 200 words (the maximum accepted by the journal).

 

  • “Introduction: several in-lab/in-site or practical applications using NIR or NIR-HSI in combination with the usual PLSR should be mentioned in the Introduction, for example, Wang et al., 2022, https://doi.org/10.1016/j.compag.2022.106843, and Chu et al., 2022, https://doi.org/10.1016/j.infrared.2022.104098.”

 

We have added the two studies in the Introduction. Thanks for these references.

 

  • “Results, Tables 2 and 3: what about the abbreviations of the words in the Y column?”

 

The first row (column titles) of Table 2, the table defining the variables y and their abbreviations, has been made clearer.

 

  • “Figure 3: the points in the PC scatter plot are too large to observe the clustering trend.”

 

This was done, thanks.

Author Response File: Author Response.pdf

Reviewer 3 Report

1. All spectral data were treated with SNV and a derivative. Is this the best spectral treatment? How did you select the proper spectral pre-processing? In NIR, spectral data are highly correlated, and spectral pre-processing is the main step before any PLSR calculation.

2. Figure 3 is not clear. Did you calculate the PCA based on original or pre-processed spectral data? Please mention also the explained variance for each PC. I did not find an important reason to show the PCA score plot in your manuscript.

3. In the usual PLSR, RMSECV is usually estimated using full cross-validation. What do you think about this?

Author Response

  • “All spectral data were treated with SNV and a derivative. Is this the best spectral treatment? How did you select the proper spectral pre-processing? In NIR, spectral data are highly correlated, and spectral pre-processing is the main step before any PLSR calculation.”

 

This is a relevant remark, and we fully agree that pre-processing is an important step before modeling NIR data with PLSR (or other types of models). Nevertheless, the main focus of the article was to compare prediction methods *after* pre-processing, not to search for the optimal pre-processing (which, in practice, should be done separately for each dataset {X, y}). Instead, we used a common pre-processing (SNV + 2nd derivative) that is known to give very acceptable results for dried and ground plant materials (e.g., Lesnoff et al. 2020 in the reference list, and many observations on forages, wheat powders, etc.). Moreover, we observed that the patterns of our comparisons were consistent for other pre-processing methods. In our opinion, reporting results for many other pre-processing methods would have made the paper much harder to read and would go beyond the objective of this paper.

Accordingly, we slightly modified the text of Section 3.1.
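The response names the pre-processing (SNV followed by a 2nd derivative) but gives no derivative settings. The sketch below is a minimal illustration only. It assumes evenly spaced wavelengths and uses a plain central finite difference for the derivative. NIR software more commonly uses a Savitzky-Golay derivative with a smoothing window and polynomial order, and the article's choices for those are not stated here. All function names are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: center one spectrum on its own mean and scale it
    by its own standard deviation (sample SD, n - 1; some tools use n instead)."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def second_derivative(spectrum):
    """2nd derivative by central finite differences, assuming a constant unit
    step between wavelengths. The two edge points are dropped."""
    return [spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
            for i in range(1, len(spectrum) - 1)]


def preprocess(X):
    """Apply SNV, then the 2nd derivative, to each row (one spectrum per row) of X."""
    return [second_derivative(snv(row)) for row in X]


# Hypothetical toy spectra (absorbance values), not data from the article.
X = [
    [0.50, 0.52, 0.58, 0.70, 0.66, 0.60],
    [0.40, 0.41, 0.47, 0.61, 0.57, 0.52],
]
Xp = preprocess(X)
print(len(Xp[0]))  # 4: two edge points lost to the derivative
```

SNV acts row-wise (per spectrum) to remove additive and multiplicative scatter effects, and the derivative removes remaining baseline offsets and slopes. This is why a linear baseline becomes zero after both steps.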

 

  • “Figure 3 is not clear. Did you calculate the PCA based on original or pre-processed spectral data? Please mention also the explained variance for each PC. I did not find an important reason to show the PCA score plot in your manuscript.”

 

The main objective of Figure 3 (a PCA projection of the pre-processed data) was to emphasize that our six forage datasets were quite different (they did not represent the “same forage type”), which, in our opinion, gives some useful genericity to our study. In the new version, we tried to clarify the caption of Figure 3.
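Reviewer 3 also asked for the explained variance of each PC. The sketch below shows one way to obtain it. It centers the (pre-processed) spectra, forms their covariance matrix, and takes each leading eigenvalue's share of the total variance. This is a generic PCA written with the Python standard library, using power iteration with deflation. It is not the authors' code, and in practice an SVD from a numerical library would be used. The function names and the toy data are hypothetical.

```python
import math
import random


def center_columns(X):
    """Subtract each column (wavelength) mean; rows are samples."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def covariance(Xc):
    """Sample covariance matrix (n - 1 denominator) of column-centered data."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_eigenvalues(C, k, iters=500, seed=0):
    """Leading k eigenvalues of a symmetric PSD matrix by power iteration,
    deflating C after each component."""
    rng = random.Random(seed)
    p = len(C)
    C = [row[:] for row in C]  # work on a copy
    eigenvalues = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # C v = 0, so the Rayleigh quotient below is 0
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v' C v gives the eigenvalue for the unit vector v
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
        eigenvalues.append(lam)
        # Deflation: remove the component just found
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return eigenvalues


def explained_variance_ratio(X, k):
    """Share of total variance carried by each of the first k principal components."""
    C = covariance(center_columns(X))
    total = sum(C[i][i] for i in range(len(C)))
    return [lam / total for lam in top_eigenvalues(C, k)]


# Hypothetical 2-variable toy data: variances 8/3 and 2/3, no covariance.
X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print([round(r, 6) for r in explained_variance_ratio(X, 2)])  # [0.8, 0.2]
```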

 

  • “In the usual PLSR, RMSECV is usually estimated using full cross-validation. What do you think about this?”

 

As you surely know, many methods for estimating RMSEP are available, within or outside the CV framework. If by “full CV” you mean, for instance, a K-fold CV process (if we understood your point correctly), we agree that it is quite usual in the chemometrics community. Nevertheless, this also depends on the chemometrician; many also use the replicated “test-set” CV strategy (the one that we used). The latter is in fact very close to the bootstrap strategy (the only difference is that the bootstrap samples with replacement).

It also depends on the research community. For instance, to our knowledge, the replicated (or, even more frequently, non-replicated) test-set CV is widely used in the “machine learning” community.

In any case, if you test this yourself (if not already done), you will see that replicated test-set CV and replicated K-fold CV give very close results (the small quantitative differences have no practical importance).
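The sketch below makes the difference between the three resampling schemes concrete. Replicated test-set CV draws a random test set without replacement at each replicate. K-fold CV tests every observation exactly once. The bootstrap draws the training set with replacement and tests on the out-of-bag rows. The split sizes, replicate counts, and the mean-only "model" are hypothetical stand-ins; the mean-only model takes the place of fitting a PLSR model on {X, y}. This is not the authors' code.

```python
import random


def replicated_test_set_splits(n, test_fraction, n_rep, seed=0):
    """Replicated test-set CV: each replicate draws a random test set WITHOUT replacement."""
    rng = random.Random(seed)
    n_test = round(n * test_fraction)
    splits = []
    for _ in range(n_rep):
        idx = list(range(n))
        rng.shuffle(idx)
        splits.append((sorted(idx[n_test:]), sorted(idx[:n_test])))
    return splits


def kfold_splits(n, k, seed=0):
    """One K-fold partition: every observation is tested exactly once."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [sorted(idx[i::k]) for i in range(k)]
    return [(sorted(set(range(n)) - set(fold)), fold) for fold in folds]


def bootstrap_splits(n, n_rep, seed=0):
    """Bootstrap: training rows drawn WITH replacement; out-of-bag rows form
    the test set (it can be empty for very small n)."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_rep):
        train = [rng.randrange(n) for _ in range(n)]
        splits.append((train, sorted(set(range(n)) - set(train))))
    return splits


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def fit_mean(y_train):
    """Hypothetical stand-in model: always predicts the training mean."""
    m = sum(y_train) / len(y_train)
    return lambda i: m


def mean_rmsep(y, splits, fit):
    """Average RMSEP over all splits that have a non-empty test set."""
    scores = []
    for train, test in splits:
        if not test:
            continue
        predict = fit([y[i] for i in train])
        scores.append(rmsep([y[i] for i in test], [predict(i) for i in test]))
    return sum(scores) / len(scores)


rng = random.Random(1)
y = [rng.gauss(10.0, 2.0) for _ in range(60)]  # hypothetical response values
for name, splits in [("test-set x20", replicated_test_set_splits(60, 1 / 3, 20)),
                     ("5-fold", kfold_splits(60, 5)),
                     ("bootstrap x20", bootstrap_splits(60, 20))]:
    print(name, round(mean_rmsep(y, splits, fit_mean), 3))
```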

Parsimony criteria (Cp, AIC, etc.) are also alternatives to CV, and they usually give results very close to those of CV (for instance, Lesnoff et al. 2021).
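For reference, the block below gives the textbook Gaussian-error forms of these criteria for a model with $a$ latent variables. The exact forms used in Lesnoff et al. 2021 are not given in this record. Here $\mathrm{SSR}_a$ is the residual sum of squares on the training data, $n$ is the number of observations, $\hat{\sigma}^2$ is an estimate of the noise variance, and $\mathrm{df}_a$ is the model's degrees of freedom. For PLSR, $\mathrm{df}_a$ is generally not equal to $a + 1$ and has to be estimated; how this is done is an assumption left open here.

```latex
\begin{aligned}
\mathrm{AIC}(a) &= n \log\!\left(\frac{\mathrm{SSR}_a}{n}\right) + 2\,\mathrm{df}_a,\\
C_p(a) &= \frac{\mathrm{SSR}_a}{n} + \frac{2\,\mathrm{df}_a}{n}\,\hat{\sigma}^2,\\
\hat{a} &= \arg\min_{a}\ \mathrm{criterion}(a).
\end{aligned}
```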

 

 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

I recommend accepting the manuscript.
