Review Reports - A Model for Fat Content Detection in Walnuts Based on Near-Infrared Spectroscopy

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript is concerned with the determination of fat content in walnuts by a combination of IR spectrometry and machine learning algorithms. The task was solved successfully; the authors have proved the validity of the proposed approach.

The manuscript can be published after minor corrections.

1. Abbreviation BP should be expanded (back propagation).

2. Page 2 line 50 – consumption time – probably, analysis time.

3. Line 52 – NIR spectroscopy is hardly a new technique (has been used for decades).

4. References 3 and 4 should be also mentioned between lines 56-75.

5. The term “inversion” should be explained in line 79.

6. What is RPD? Residual prediction deviation (line 160) or residual prediction bias (line 164)? All is the same, but please be consistent.

7. RMSEC = root mean squared error of calibration, RMSEP = root mean squared error of prediction – these are more common terms to use (lines 164-165).

8. In formulae 1-2 through 1-4, no multiplication xy is supposed to exist (rather, y predicted minus y observed). Please check correctness of the formulae.

9. In section 2.3.5, the criteria for determining outliers should be clearly stated.

10. In section 3.1 (line 201), the areas of 1070…1680 cm-1 are discussed, which are not shown in the graph (Fig. 1).

11. Line 225 – extra words (The frequency and).

12. Line 308 – a comparison of the coefficient of determination with those from refs. 3 and 4 would be appropriate here.
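Comments 6 to 8 turn on how the evaluation metrics are defined. For reference, a minimal Python sketch of the conventional definitions follows. It assumes `y_obs` holds reference values and `y_pred` holds model predictions, and it takes RPD as the standard deviation of the reference values divided by the RMSE of prediction, which is a common convention but an assumption here. The function names and sample values are illustrative and are not taken from the manuscript.

```python
import math


def rmse(y_obs, y_pred):
    """Root mean squared error between observed and predicted values.

    Computed on the calibration set this is RMSEC; on the prediction set, RMSEP.
    """
    if len(y_obs) != len(y_pred) or not y_obs:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for o, p in zip(y_obs, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def rpd(y_obs, y_pred):
    """Sample standard deviation of the reference values divided by RMSE (assumed definition)."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in y_obs) / (n - 1))
    return sd / rmse(y_obs, y_pred)


# Illustrative values only (fat content in %), not data from the manuscript.
y_obs = [60.0, 62.0, 64.0, 66.0]
y_pred = [61.0, 61.0, 65.0, 65.0]
print(rmse(y_obs, y_pred), r_squared(y_obs, y_pred), rpd(y_obs, y_pred))
```

Note that every term is a difference between predicted and observed values; no product of x and y appears in any of the three metrics.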

Comments on the Quality of English Language

English mostly ok.

Author Response

Comments 1: [ Abbreviation BP should be expanded (back propagation).]

Response 1: Thank you for pointing this out. Agree. [The abbreviation BP has been expanded in lines 29/31/36/84/159/160/220/279/283/289/318.]

Comments 2: [Page 2 line 50 – consumption time – probably, analysis time.]

Response 2: Agree. [It's been modified – page 2 line 53.]

Comments 3: [ Line 52 – NIR spectroscopy is hardly a new technique (has been used for decades).]

Response 3: Agree. [It's been modified – page 2 line 55.]

Comments 4: [References 3 and 4 should be also mentioned between lines 56-75.]

Response 4: Agree. [It's been modified – page 2 line 57.]

Comments 5: [The term “inversion” should be explained in line 79.]

Response 5: An explanation of the term “inversion” seems unnecessary, as the terms all mean the same thing.

Comments 6: [ What is RPD? Residual prediction deviation (line 160) or residual prediction bias (line 164)? All is the same, but please be consistent.]

Response 6: Agree. [It's been modified – page 4 lines 166/167/172.]

Comments 7: [ RMSEC = root mean squared error of calibration, RMSEP = root mean squared error of prediction – these are more common terms to use (lines 164-165).]

Response 7: The literature I have read uses these common terms to evaluate the stability of models. These are the evaluation parameters of the model.

Comments 8: [In formulae 1-2 through 1-4, no multiplication xy is supposed to exist (rather, y predicted minus y observed). Please check correctness of the formulae.]

Response 8: Agree. [It's been modified – page 4 lines 179-185.]

Comments 9: [ In section 2.3.5, the criteria for determining outliers should be clearly stated.]

Response 9: Agree. [It's been modified – page 5 lines 190-197.]

Comments 10: [In section 3.1 (line 201), the areas of 1070…1680 cm-1 are discussed, which are not shown in the graph (Fig. 1).]

Response 10: Agree. [The units of 1070...1680 in the passage should be nm, which was wrongly written as cm-1; 1070 nm corresponds to approximately 9346 cm-1 – page 5 line 216.]

Comments 11: [ Line 225 – extra words (The frequency and).]

Response 11: No additional words found.

Comments 12: [ Line 308 – a comparison of the coefficient of determination with those from refs. 3 and 4 would be appropriate here.]

Response 12: Personally, I don't feel the need to compare, because the final result is predictable.
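The unit question raised in Comment 10 and Response 10 rests on the relation between wavelength and wavenumber: a wavelength in nm converts to a wavenumber in cm-1 as 10^7 divided by the wavelength. A minimal sketch of that conversion follows; the function name is illustrative, and the two wavelengths are the band limits quoted in the comment.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm-1 (1 cm = 1e7 nm)."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return 1e7 / wavelength_nm


# Band limits quoted in Comment 10, interpreted as wavelengths in nm.
for nm in (1070, 1680):
    print(f"{nm} nm -> {nm_to_wavenumber(nm):.0f} cm-1")
```

Read as wavelengths, 1070 to 1680 nm corresponds to roughly 9346 to 5952 cm-1, which lies in the near-infrared; read as wavenumbers, 1070 to 1680 cm-1 would fall in the mid-infrared.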

Reviewer 2 Report

Comments and Suggestions for Authors

- The algorithms need to be better described. ANN, SVM and CARS have a lot of parameters and none are discussed

- Abstract – too many decimals are reported for the r2, RMSE and RPD

- Line 45 – become rich – should be deleted

- Line 52 – NIR is not new. “new” should be deleted

- Lines 52-55 – NIR requires the use of organic reagents to generate the models so it is not true that NIR does not use organic solvents

- Section 2.2 is a list. It should be changed into a sentence

- Section 22.3.3 should be 2.3.3

- In section 2.3.3, MSC, SNV, FD, … references should be added for these preprocessing methods.

- Line 149 – SNV is standard normal variate, not standard normal distribution transformation

- line 164 – “correct the root mean square error” should be replaced by root mean square error of calibration

- line 165 – “correct the root mean square error” should be replaced by root mean square error of prediction

- in the equations, what is xy? If it is y predicted, use the y-hat nomenclature

- line 182 – “so the remaining 174 sample data were excluded” – I presume it should be “were included”

- section 2.3.6 – add references for RS and SPXY

- line 207 – “management levels” – what does that mean?

- Section 3.2 – provide the preprocessing parameters for FD and SD

- Line 225 – “The frequency and Figure 3 …” – check the sentence. Something is missing

- Line 226 – it is the first time CARS is discussed. Introduce it before and provide its parameters

- Figures 4 and 5 – why are they not showing the same samples? I thought the same dataset was used on both methods
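Several of the comments above concern the preprocessing step, in particular the standard normal variate (SNV) transform. For reference, SNV centers each spectrum on its own mean and scales it by its own standard deviation. The sketch below assumes a spectrum is a plain list of absorbance values and uses the sample standard deviation (n - 1); the function name and sample values are illustrative and are not taken from the manuscript.

```python
import math


def snv(spectrum):
    """Standard normal variate: (x - mean) / standard deviation, per spectrum."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("a spectrum needs at least two points")
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    if sd == 0:
        raise ValueError("a constant spectrum cannot be scaled")
    return [(x - mean) / sd for x in spectrum]


# Two illustrative spectra that differ only by a multiplicative factor.
print(snv([1.0, 2.0, 3.0]))
print(snv([10.0, 20.0, 30.0]))
```

Because the transform is applied to each spectrum separately, spectra that differ only by an additive offset or a multiplicative factor map to the same result, which is why it is used to reduce scatter effects.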

Author Response

Comments 1: [The algorithms need to be better described. ANN, SVM and CARS have a lot of parameters and none are discussed.]

Response 1: Thank you for pointing this out. In the literature I read on the subject, most of the method parameters were only mentioned in the article rather than discussed. The discussion section was reserved for comparing the results obtained with those of others, or the method used with the methods used previously. I will therefore follow those authors, who did not discuss the method parameters.

Comments 2: [Abstract – too many decimals are reported for the r2, RMSE and RPD]

Response 2: r2, RMSE and RPD are the parameters for evaluating the model. The decimals give the reader a clearer idea of whether the model's stability is feasible, so it is necessary to list them one by one.

Comments 3: [ Line 45 – become rich – should be deleted.]

Response 3: Agree. [The phrase “become rich” has been deleted.]

Comments 4: [ Line 52 – NIR is not new. “new” should be deleted]

Response 4: Agree. [The word “new” has been deleted.]

Comments 5: [Lines 52-55 – NIR requires the use of organic reagents to generate the models so it is not true that NIR does not use organic solvents.]

Response 5: Agree. [I've reworded the sentence – page 2 lines 56/57/58.]

Comments 6: [Section 2.2 is a list. It should be changed into a sentence]

Response 6: Agree. [I've changed it to a sentence – page 3 lines 101-104.]

Comments 7: [Section 22.3.3 should be 2.3.3.]

Response 7: [The redundant numbers have been removed – page 4 line 143.]

Comments 8: [In section 2.3.3, MSC, SNV, FD, … references should be added for these preprocessing methods.]

Response 8: Agree. [I've added the corresponding references – page 4 line 155.]

Comments 9: [Line 149 – SNV is standard normal variate, not standard normal distribution transformation.]

Response 9: Agree. [It's been modified – page 4 line 153.]

Comments 10: [line 164 – “correct the root mean square error” should be replaced by root mean square error of calibration.]

Response 10: Agree. [It's been modified – page 4 line 163.]

Comments 11: [line 165 – “correct the root mean square error” should be replaced by root mean square error of prediction.]

Response 11: [It's been modified – page 4 line 167.]

Comments 12: [in the equations, what is xy? If it is y predicted, use the y-hat nomenclature.]

Response 12: [It's been modified – page 4 lines 176-180.]

Comments 13: [line 182 – “so the remaining 174 sample data were excluded” – I presume it should be “were included”.]

Response 13: Agree. [It's been modified – page 4 line 187.]

Comments 14: [ section 2.3.6 – add references for RS and SPXY.]

Response 14: [It's been modified – page 5 line 197.]

Comments 15: [ line 207 – “management levels” – what does that mean?]

Response 15: [Management of the orchard varies.]

Comments 16: [Section 3.2 – provide the preprocessing parameters for FD and SD.]

Response 16: Agree. [The window sizes for FD and SD are 0.08 and 0.15, respectively, and the number of smoothing points is 9 and 7, respectively.]

Comments 17: [ Line 225 – “The frequency and Figure 3 …” – check the sentence. Something is missing.]

Response 17: Agree. [It's been modified.]

Comments 18: [Line 226 – it is the first time CARS is discussed. Introduce it before and provide its parameters]

Response 18: Agree. [It's been modified – page 6 lines 244-246.]

Comments 19: [Figures 4 and 5 – why are they not showing the same samples? I thought the same dataset was used on both methods.]

Response 19: [The sample size for validation is the same. To better distinguish the validation results of the two modeling methods, different derivations are used for graphing. The two modeling methods are not the same and the stability of the models is not the same; Figure 4 is the validation result of the BP neural network and Figure 5 is the validation result of SVR.]

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Answer to comment 1 response: Adding SVM, ANN and CARS parameters is about ensuring reproducibility of the results. It is important for other researchers to know which SVM algorithm was used, what software platform, what kernel was used and how the hyperplane parameters were tuned. For ANN, it is important to state the structure of the network, the activation functions, the inputs (spectra or PCA scores?) and the software used. Similar for CARS. You cannot just say “I used these methods”. You need to say how you used them.

Answer to comment 2 response: Using 4 decimals is not needed and does not provide any information regarding “whether the modeled stability is feasible or not”. For instance, 0.8589 and 0.8873 can be changed to 0.86 and 0.89 and keep the exact same meaning. If the authors insist on keeping 4 decimals, then they need to demonstrate that it is important. If the modeling error is important to the 0.0001 level, then sure, but here your errors are 1.56 and 1.58, so showing 4 decimals is not relevant.

Line 51 – “analysising” replace by analysis

Line 102 – “and so on” – what does that mean here? If you used additional equipment, it must be cited

Answer to comment 16 response: “[The window sizes for FD and SD are 0.08 and 0.15, respectively, and the number of smoothing points is 9 and 7, respectively.]” – How can you have derivative window sizes of 0.08 or 0.15? Which algorithm is used?
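The objection here is that a smoothing derivative is computed over a whole number of points, so a fractional window has no meaning. The sketch below illustrates this under the assumption that a Savitzky-Golay first derivative with a quadratic fit on evenly spaced points is meant; the record does not name the algorithm, so that choice, the function name and the default window of 11 are all illustrative. For this case the filter weights have the closed form i / sum(j^2) for offsets i from -m to m.

```python
def sg_first_derivative(y, window=11, spacing=1.0):
    """Savitzky-Golay first derivative (quadratic fit), interior points only.

    The window is a count of points, so it must be an odd integer >= 3.
    """
    if not isinstance(window, int) or window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    if len(y) < window:
        raise ValueError("signal is shorter than the window")
    m = window // 2
    offsets = range(-m, m + 1)
    # Normalisation for the closed-form weights i / sum(j^2), scaled by point spacing.
    norm = sum(i * i for i in offsets) * spacing
    return [
        sum(i * y[c + i] for i in offsets) / norm
        for c in range(m, len(y) - m)
    ]


# Illustrative signal: a straight line with slope 3, so the derivative is 3 everywhere.
signal = [3.0 * x + 1.0 for x in range(20)]
print(sg_first_derivative(signal, window=11))
```

Passing `window=0.08` to this function raises a `ValueError`, which is the point of the reviewer's question: the reported values cannot be window sizes in points.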

Line 232 – “The frequency and CARS is a feature variable” – there is still a problem here. There seem to be 2 different sentences combined here.

Answer to comment 19 response: Your answer is not clear. The 2 plots show different distributions of the reference values, thus indicating that the 2 validation sets are different. There are 2 options here: 1) the validation sample size is the same, but the samples are different or 2) the validation sample size is the same and the samples are the same. Which one is the version you used? Per the figures, it seems that you used option 1 (range from ~52% to ~76% for ANN while the range is ~54% to ~72% for SVM). For a rigorous comparison of ANN and SVM, option 2 should be selected. But if you selected option 1 (as your answer seems to indicate), then you need to make it very clear in the text that the results of SVM and ANN are not directly comparable since the calibration and validation sets are not identical.
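Option 2 amounts to drawing a single calibration/validation split and reusing the same sample indices for both models. A minimal sketch of that idea follows, assuming a seeded random split; the function name, the seed, the validation fraction and the sample count are all hypothetical and are not taken from the manuscript, which may have used a different partitioning method.

```python
import random


def split_indices(n_samples, validation_fraction=0.25, seed=0):
    """Return (calibration_indices, validation_indices) from one seeded shuffle."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = max(1, round(n_samples * validation_fraction))
    return sorted(indices[n_val:]), sorted(indices[:n_val])


# One split, drawn once; the sample count of 100 is a placeholder.
cal_idx, val_idx = split_indices(100)

# Both models would be fitted on cal_idx and evaluated on val_idx,
# so their validation plots cover the same reference values.
splits = {"ANN": (cal_idx, val_idx), "SVM": (cal_idx, val_idx)}
print(len(cal_idx), len(val_idx))
```

With a shared split, the range of reference values on the validation axis is identical for both models, so any difference between the plots reflects the models rather than the samples.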

Author Response

Comments 2: [Using 4 decimals is not needed and does not provide any information regarding “whether the modeled stability is feasible or not”. For instance, 0.8589 and 0.8873 can be changed to 0.86 and 0.89 and keep the exact same meaning. If the authors insist on keeping 4 decimals, then they need to demonstrate that it is important. If the modeling error is important to the 0.0001 level, then sure, but here your errors are 1.56 and 1.58, so showing 4 decimals is not relevant.]

Response 2: Agree. [All values reported to 4 decimal places in the text have been changed to 2 decimal places.]

Comments 3: [ Line 51 – “analysising” replace by analysis.]

Response 3: Agree. [This has been corrected.]

Comments 4: [ Line 102 – “and so on” – what does that mean here? If you used additional equipment, it must be cited.]

Response 4: Agree. [This has been corrected.]

Comments 5: [“[The window sizes for FD and SD are 0.08 and 0.15, respectively, and the number of smoothing points is 9 and 7, respectively.]” – How can you have derivative window sizes of 0.08 or 0.15? Which algorithm is used?]

Response 5: Agree. [I'm really sorry, I misread it when I looked at it; the window size for both FD and SD is 11.]

Comments 6: [Line 232 – “The frequency and CARS is a feature variable” – there is still a problem here. There seem to be 2 different sentences combined here.]

Response 6: Sorry, there is no such sentence in line 232, and CARS is only mentioned in line 265, but there is no such sentence.

Comments 7: [Your answer is not clear. The 2 plots show different distributions of the reference values, thus indicating that the 2 validation sets are different. There are 2 options here: 1) the validation sample size is the same, but the samples are different or 2) the validation sample size is the same and the samples are the same. Which one is the version you used? Per the figures, it seems that you used option 1 (range from ~52% to ~76% for ANN while the range is ~54% to ~72% for SVM). For a rigorous comparison of ANN and SVM, option 2 should be selected. But if you selected option 1 (as your answer seems to indicate), then you need to make it very clear in the text that the results of SVM and ANN are not directly comparable since the calibration and validation sets are not identical.]

Response 7: Agree. [There has been no comparison of the two models.]

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

Line 246, the sentence is still not correct: "The frequency and CARS is a feature [...]". See the attached file if you can't find it.

Comments for author File: Comments.pdf

Author Response

Comments 1: [Line 246, the sentence is still not correct: "The frequency and CARS is a feature [...]". See the attached file if you can't find it.]

Response 1: Agree. [It's been modified – page 7 line 266.]