Review Reports - Novel Grade Classification Tool with Lipidomics for Indica Rice Eating Quality Evaluation

Round 1

Reviewer 1 Report

Zhao and colleagues generated the lipidomics profiles of Indica rice samples belonging to different quality grades. Their goal was to use the lipidomics data as a means to develop a method for indica rice eating grade prediction.

I have some major concerns about the approach used for data analysis, and hence about the conclusions drawn by the authors. The manuscript lacks a description of why this type of analysis was chosen for this specific dataset, what its limitations are, and whether it brings any added value here.

Here follow my observations and questions to the authors.

MAJOR CONCERNS

1) Use of OPLS-DA

A. The authors used a supervised model to compare three groups whose sizes are severely unbalanced. Of course there are lipidomic differences among the three grades, as PCA alone had already shown in Figure 1C. Since OPLS-DA follows from PCA, it is no wonder it will separate the groups. Now, the authors apply the supervised model to three groups of the following sizes: n=6, n=7, n=48. In this situation there is inevitably a bias. Did the authors address it? I could not find any comment about this in the text. The impression is that OPLS-DA was used as the gold standard for lipidomics analysis, without considering whether it is appropriate for this specific situation. The authors should explain in detail the reasoning behind their choice: which advantages justify it, and what are its limitations?

B. The authors mention in lines 144-146 that they used Student's t-test and ANOVA to reduce overfitting and false positive rates and to screen potential biomarkers. The sentence seems to be incomplete. The authors should clearly state which technique they used to avoid overfitting. Alongside the results of the test for overfitting, the authors should comment on the limited sample size of two of the groups (is it really possible to avoid overfitting with 6-7 samples in a group?), which clearly constitutes a limitation when applying a supervised model.

C. As to the statistical tests used, the authors should provide a brief explanation of why they were chosen. For example, why did the authors use parametric tests such as t-test and ANOVA? How did they check the distribution of the data? How did they check the homogeneity of variance?
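The assumption checks asked about in point C could be reported along the following lines. This is only a sketch under stated assumptions: the data are synthetic, the group labels and the 0.05 threshold are hypothetical, and only the group sizes (6, 7, 48) come from this report. It uses the Shapiro-Wilk test for normality and Levene's test for homogeneity of variance from SciPy, and falls back to a Kruskal-Wallis test when either check fails.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic, hypothetical abundances of a single lipid in three grades.
# Only the group sizes (6, 7, 48) are taken from the report.
groups = {
    "grade_1": rng.normal(10.0, 1.0, size=6),
    "grade_2": rng.normal(10.5, 1.0, size=7),
    "grade_3": rng.normal(11.0, 1.0, size=48),
}
samples = list(groups.values())

# Shapiro-Wilk normality test, run separately within each group.
normality_p = {}
for name, values in groups.items():
    _, p = stats.shapiro(values)
    normality_p[name] = p

# Levene's test, median-centred, for homogeneity of variance across groups.
_, levene_p = stats.levene(*samples, center="median")

alpha = 0.05  # assumed significance threshold
assumptions_ok = levene_p > alpha and all(p > alpha for p in normality_p.values())

# Parametric test only if both checks pass; otherwise a rank-based alternative.
if assumptions_ok:
    test_name = "one-way ANOVA"
    _, p_value = stats.f_oneway(*samples)
else:
    test_name = "Kruskal-Wallis"
    _, p_value = stats.kruskal(*samples)

print(test_name, p_value)
```

With only 6 or 7 samples in a group, a normality test has little power, so a non-significant result does not by itself confirm that a parametric test is appropriate.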

2) Use of pathway enrichment analysis

A. In lines 149-150 of Materials and Methods, the authors write that they used KEGG and MetaboAnalyst for the pathway analysis. They do not provide any detail on the procedure they followed to obtain the results they later describe. The parameters chosen to perform the analysis should be provided.

B. Although MetaboAnalyst offers pathway enrichment analysis, the information on which such analysis is based is still very limited when it comes to lipids. As a matter of fact, for general lipidomics studies the level of information available for the analysis is the lipid class. This is a very broad level: whenever there are phospholipids in the dataset, the glycerophospholipid pathway will pop up; if there are glycerolipids, then the glycerolipid pathway will be affected, and so on. Thus, to my knowledge, this type of analysis does not provide any additional meaningful information unless the study was designed to address a specific pathway-related question. Could the authors elaborate on this and discuss the added value of the pathway analysis in this specific study?

3) Discussion

The authors further test their set of lipids using a different approach (random forest). After presenting the results, they conclude that because the accuracy of the model was > 90%, the model is good and could be used to replace traditional sensory evaluation (see abstract). I suggest the authors reconsider their statements and switch to a more descriptive approach. Whether it is a good model can only be judged by applying it to a new and larger study (validation); as of now, the model was built on unbalanced classes.
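To illustrate why raw accuracy is hard to interpret with these group sizes, the following sketch computes what a trivial classifier that always predicts the largest group would score. The grade labels are hypothetical; only the group sizes (6, 7, 48) come from this report, and the calculation says nothing about the authors' actual model.

```python
# Group sizes quoted in the report; the grade labels are hypothetical.
sizes = {"grade_a": 6, "grade_b": 7, "grade_c": 48}

total = sum(sizes.values())
majority = max(sizes, key=sizes.get)

# A classifier that always predicts the majority group is correct
# for every sample of that group and wrong for all others.
accuracy = sizes[majority] / total

# Balanced accuracy is the mean of per-class recall: 1.0 for the
# majority group, 0.0 for the two groups that are never predicted.
recalls = [1.0 if grade == majority else 0.0 for grade in sizes]
balanced_accuracy = sum(recalls) / len(recalls)

print(f"majority-class accuracy: {accuracy:.3f}")
print(f"balanced accuracy:       {balanced_accuracy:.3f}")
```

The trivial baseline already reaches 48/61 (about 79%) accuracy while its balanced accuracy is only 1/3, so per-class recall or balanced accuracy would be a more informative figure to report than overall accuracy.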

It is clear that there are lipidomic differences among the three grades (see PCA). The authors may want to consider removing the OPLS-DA entirely and relying on PCA and the lipids that contribute most to the principal component responsible for separating the groups. The eating grades have different lipidomic profiles: how do they differ? Are there changes already at the lipid class level? A class profile of the three groups could show whether this is the case. A profile of the degree of unsaturation and chain length within the different classes could also provide a way to describe differences among eating grades. Another option, if they still want to present the OPLS-DA, is to give an informed description of all the limitations of the model and to provide all the necessary quality checks alongside the results. To give a better overview of how the species change across the eating grades, a heat map could help visualize the differential abundance.
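The PCA-based alternative suggested above, ranking lipids by their contribution to the separating component, can be sketched as follows. The data matrix is synthetic and the shifted "lipid 2" is a hypothetical construct for illustration; only the total of 61 samples (6 + 7 + 48) comes from this report. The sketch mean-centres the data without scaling, which is itself an assumption; autoscaling or Pareto scaling would change the ranking.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic matrix: 61 samples x 5 hypothetical lipids.
n_samples, n_lipids = 61, 5
X = rng.normal(0.0, 1.0, size=(n_samples, n_lipids))

# Hypothetical effect: lipid 2 is elevated in the 13 samples
# standing in for the two small groups (6 + 7).
X[:13, 2] += 10.0

# PCA via singular value decomposition of the mean-centred data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * S                      # sample scores on each component
explained = S**2 / np.sum(S**2)     # fraction of variance per component
loadings_pc1 = Vt[0]                # contribution of each lipid to PC1

# Rank lipids by the absolute size of their PC1 loading.
ranking = np.argsort(-np.abs(loadings_pc1))
top_lipid = int(ranking[0])

print("variance explained by PC1:", round(float(explained[0]), 3))
print("lipids ranked by |PC1 loading|:", ranking.tolist())
```

Reporting such a ranking together with the explained variance of each component would let readers see which species drive the separation without relying on a supervised model.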

I strongly suggest removing the pathway analysis, unless it is based on another type of -omics data, or unless the authors can provide valuable information supporting it.

SOME ADDITIONAL POINTS TO ADDRESS

# line 171) I did not find supplementary figure S1, I could only download one pdf file containing the manuscript with the main figures

# line 183) Same as above, I could not find Supplementary Table S1

# Figure 1) panel C shows the PCA of the dataset. It is not clear to me which principal components are used to draw the plot. Should it not be T1 vs T2 score values? Why do the axes report T score 1 vs Orthogonal T score 1? I suggest the authors either rename the axes using more common terms, or explain in the figure caption what is being shown so that any reader can easily understand the plot. Additionally, adding to the caption a brief description of the eating quality grades would make the figure clearer, so that there is no need to go back to the text to see what they correspond to.

# Figure 1) panel D The description of the plot in the caption is not very clear. The plot is a pie chart that represents the % coverage (is it mol%?) of the lipid classes detected in the samples. Additionally, I suggest that this class profile be represented by means of a boxplot or dot plot. The pie chart does not give a clear overview of the data; it is not a straightforward way to represent them.

# general observation - I could not find the total number of lipids that were detected. The authors could provide in the results a brief overview of how many lipids were detected, spanning how many and which classes. They could also provide, as supplementary material, a table with the mean molar amount of every lipid class in each eating grade.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript with ID foods-2088628 presents an attempt to distinguish between three eating grades (low, middle and high, estimated by sensory test) of 51 rice samples from different varieties and regions in China. A modern chromatographic technique, UPLC-QTOF/MS, was used to analyze various lipid compounds. Chemometric data processing was then applied to analyze differences between samples. Finally, discriminant analysis was used to create a model for identifying the lipid components important for sample classification. The work is interesting and would be useful for other researchers, although additional experiments are needed on additional rice varieties grown under other environmental conditions.

Some corrections should be made, as follows:

Lines 93-94: the sentence is not clear

Lines 93 and 97-98: the term “reagent” is incorrect here, should be replaced

Line 95: Why was the mixture kept at 4 °C for 10 min?

Lines 146 and 149: use capital letters for Student's t-test and Kyoto Encyclopedia

Line 154: not clear why the number of samples is 109

Lines 154 and 167: not clear which are these 61 samples

Figure 1 A and B: the formulae are too small

Line 200 and Table 2: Why was PE (22:0) not taken into account together with PC (18:1/18:2) and PA (16:0), since its content decreased significantly in the order high > medium > low rice grade?

Line 228: Fatty acids in free or esterified form?

Lines 235 and 238: the reference “Yoshida…2011” should be written the same way

Lines 250 and 280: Which reference, 35 or 36, is cited here?

Line 289: Should be Conclusions instead of 4. Discussion.

List of References:

- the journal titles should be written in the same style

- references should be cited in the text by their numbers

- there is no reference with number 41.

Editing of English language is required.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I have no further comments

Author Response

Thank you very much