Open Access Article

Peer-Review Record

Predicting Pineapple Quality from Hyperspectral Data of Plant Parts Applied to Machine Learning

AgriEngineering 2025, 7(6), 170; https://doi.org/10.3390/agriengineering7060170

by Vitória Carolina Dantas Alves¹, Sebastião Ferreira de Lima¹, Dthenifer Cordeiro Santana¹, Rafael Ferreira Barreto¹, Roger Augusto da Cunha¹, Ana Carina da Silva Cândido Seron¹, Larissa Pereira Ribeiro Teodoro¹, Paulo Eduardo Teodoro¹, Rita de Cássia Félix Alvarez¹, Cid Naudi Silva Campos¹, Carlos Antonio da Silva Junior² and Fábio Luíz Checchio Mingotte³,*

Reviewer 1: Pavel A. Dmitriev

Reviewer 2: Tianying Yan

Reviewer 3: Anonymous

Reviewer 4: Anonymous

Reviewer 5: Anonymous

Submission received: 3 February 2025 / Revised: 30 April 2025 / Accepted: 14 May 2025 / Published: 3 June 2025

(This article belongs to the Special Issue Transforming Agriculture with Artificial Intelligence: Recent Advances and Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

General comments:

The authors have carried out a study aimed at solving an important applied problem. However, the manuscript describes the data preprocessing and processing methodology and the results obtained very poorly, which makes their expert evaluation impossible.

The Abstract section of the manuscript needs to be revised. The abstract need not describe the methodology in such detail; it should contain brief information about the relevance, purpose, methods and results of the study.

Specific comments:

Line 2. Add two more keywords.

Line 35. Is this a Graphical Abstract? What principle was used to divide the methods into three groups?

Line 38. ‘Ananas comosus’ – the author of the taxon should be given.

Line 65. More references should be cited, there are many papers on determining fruit quality using hyperspectral imaging.

Line 66. ‘this method is very accurate’ – that's a controversial statement.

Line 95. Perhaps it should be stated not ‘during the experiment’ but ‘during the period of the study’.

Line 101. ‘Results of soil chemical analysis’.

Line 137. How many measurements were taken on each plant?

Line 151. Please provide and describe the neural network architecture. Specify the hyperparameters of the machine learning algorithms.

Line 153. It is necessary to give relative errors and R².

Line 154. The manuscript does not present the results of analysis of variance. Please provide a scheme of analysis of variance.

Line 158. How was the correlation coefficient calculated? How many data points were involved? Describe the calculation procedure. How was the DT method applied? What was the input, what was the output?

Line 165. ‘better results for SVM with a mean prediction of 0.43 correlation coefficient’ – it's a very weak correlation.

Line 188. Please present the results of the modelling in the form of regression with the relative error (RMSE) and coefficient of determination (R²).

Line 250. What exactly does this refer to? Please state in one term what was determined by liquid chromatography.

Line 418. Section References. DOI for all references must be provided.

Author Response

Reviewer 1:

General comments:

C: The authors have carried out a study aimed at solving an important applied problem. However, the manuscript describes the data preprocessing and processing methodology and the results obtained very poorly, which makes their expert evaluation impossible.

A: We appreciate your comment. No explanation of data preprocessing was presented because none was performed: to simplify the analysis, the data provided by the sensor were tabulated and processed by the software mentioned in the work. However, more information has been added to improve the clarity of this section.

C: The Abstract section of the manuscript needs to be revised. The abstract need not describe the methodology in such detail; it should contain brief information about the relevance, purpose, methods and results of the study.

A: We appreciate the suggestion. The methodology is now described in less detail in the abstract, and more information about the relevance, purpose and results of the study has been added.

Specific comments:

C: Line 2. Add two more keywords.

A: We added it. Please see lines 33 and 34. Thank you.

C: Line 35. Is this a Graphical Abstract? What principle was used to divide the methods into three groups?

A: Thank you for the suggestion. Yes, this is a Graphical Abstract. The methods were divided into three groups on the following principle: leaf reflectance has been widely used in other crops to estimate traits of final interest, such as seed germination or industrial attributes of soybean (https://doi.org/10.1016/j.saa.2024.123963, https://doi.org/10.1016/j.infrared.2024.105326, https://doi.org/10.3390/agriengineering6040272). Following this logic, the authors initially considered that leaf reflectance could be used to predict the desired pineapple traits before harvest. However, other possibilities were then explored and performed better, showing that peel and fruit reflectance could provide even more relevant results.

C: Line 38. ‘Ananas comosus’ – the author of the taxon should be given.

A: Thank you for the correction. The requested change has been made.

C: Line 65. More references should be cited, there are many papers on determining fruit quality using hyperspectral imaging.

A: We appreciate the suggestion. Other articles on the subject have been added, especially in the Discussion.

C: Line 66. ‘this method is very accurate’ – that's a controversial statement.

A: Thank you for the suggestion. The statement has been removed to avoid confusion.

C: Line 95. Perhaps it should be stated not ‘during the experiment’ but ‘during the period of the study’.

A: Thank you for the suggestion. The term has been changed.

C: Line 101. ‘Results of soil chemical analysis’.

A: Thank you for the suggestion. The term has been changed.

C: Line 137. How many measurements were taken on each plant?

A: Thank you for your question. Three hyperspectral readings were taken from the leaf, peel and fruit of each plant, and the average of these readings was used.
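A minimal sketch of the averaging step described above, assuming each reading is a list of reflectance values over the same bands (2,200 per reading in this study, per the response to the Line 158 comment; four bands here for brevity). The nested dictionary layout, the function names and the toy values are hypothetical; the authors' actual workflow was carried out in their analysis software.

```python
from typing import Dict, List, Sequence

Spectrum = List[float]


def average_readings(readings: Sequence[Sequence[float]]) -> Spectrum:
    """Average replicate readings band by band into one spectrum."""
    n_bands = len(readings[0])
    if any(len(r) != n_bands for r in readings):
        raise ValueError("all readings must cover the same bands")
    return [sum(r[b] for r in readings) / len(readings) for b in range(n_bands)]


def build_part_matrices(
    plants: Dict[str, Dict[str, Sequence[Sequence[float]]]]
) -> Dict[str, List[Spectrum]]:
    """Return one matrix per plant part: one row per plant, one column per band."""
    matrices: Dict[str, List[Spectrum]] = {}
    for plant_id in sorted(plants):  # same row order for every plant part
        for part, readings in plants[plant_id].items():
            matrices.setdefault(part, []).append(average_readings(readings))
    return matrices


# Hypothetical toy data: two plants, three readings per part, four bands.
toy = {
    "plant01": {
        "leaf": [[0.10, 0.20, 0.30, 0.40], [0.12, 0.22, 0.28, 0.40], [0.11, 0.21, 0.29, 0.40]],
        "peel": [[0.30, 0.35, 0.40, 0.45], [0.32, 0.33, 0.40, 0.47], [0.31, 0.34, 0.40, 0.46]],
    },
    "plant02": {
        "leaf": [[0.09, 0.19, 0.31, 0.41], [0.09, 0.21, 0.29, 0.39], [0.09, 0.20, 0.30, 0.40]],
        "peel": [[0.28, 0.36, 0.41, 0.44], [0.30, 0.34, 0.39, 0.46], [0.29, 0.35, 0.40, 0.45]],
    },
}
matrices = build_part_matrices(toy)
print(matrices["leaf"][0])  # averaged leaf spectrum of plant01
```

Sorting the plant IDs keeps row i of every part's matrix tied to the same plant, so each row can be paired with that plant's fruit quality measurements.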

C: Line 151. Please provide and describe the neural network architecture. Specify the hyperparameters of the machine learning algorithms.

A: Thank you for the suggestion. Information about the hyperparameters of the models used has been added.

C: Line 153. It is necessary to give relative errors and R².

A: We consider the correlation coefficient (r) and the mean absolute error (MAE) sufficient and appropriate model evaluation metrics for this study. The correlation coefficient measures the strength of the linear relationship between predicted and observed values, allowing a direct analysis of the association between them. Unlike R², which represents the proportion of variance explained by the model, r is a more intuitive metric for assessing the quality of predictions, especially in spectral studies, where relationships can be complex. MAE provides a clear measure of model accuracy because it expresses the average prediction error in the same unit as the response variable. Unlike RMSE, which penalizes large errors such as outliers more heavily, MAE gives a balanced assessment of accuracy without overweighting individual errors, which is advantageous for spectral data. Furthermore, R² can be influenced by patterns of variance in the data and may not adequately represent model quality when the relationship between variables is not strictly linear. RMSE, in turn, emphasizes large individual errors, which can distort the perception of predictive performance in a context where small variations are already expected.
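The trade-offs argued in this exchange can be checked numerically. The sketch below uses made-up observed and predicted values (not data from this study) and plain-Python implementations of the four metrics. R² is computed here as 1 - SS_res/SS_tot, which is one common definition; some software reports r² instead, and the two differ when predictions are biased.

```python
import math
from typing import Sequence


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error, in the unit of the response variable."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error; squaring weights large errors more heavily."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


obs = [10.0, 12.0, 14.0, 16.0, 18.0]

# Case 1: every prediction is 3 units too high (a constant bias).
biased = [o + 3.0 for o in obs]
print(pearson_r(obs, biased), r_squared(obs, biased))  # r = 1.0, R² = -0.125

# Case 2: the same total absolute error, spread evenly or concentrated in one sample.
even = [o + 1.6 for o in obs]
one_big = [10.0, 12.0, 14.0, 16.0, 26.0]
print(mae(obs, even), rmse(obs, even))        # both about 1.6
print(mae(obs, one_big), rmse(obs, one_big))  # MAE 1.6, RMSE about 3.58
```

Case 1 shows that r ignores a constant offset that R² penalizes; case 2 shows that MAE and RMSE coincide when errors are uniform and diverge when one sample carries most of the error. Which behaviour is preferable depends on whether systematic bias and occasional large errors matter for the intended use.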

C: Line 154. The manuscript does not present the results of analysis of variance. Please provide a scheme of analysis of variance.

A: Thank you for the suggestion, the analysis of variance has been added.

C: Line 158. How was the correlation coefficient calculated? How many data points were involved? Describe the calculation procedure. How was the DT method applied? What was the input, what was the output?

A: The correlation coefficient was calculated automatically by the software used in the analysis, using the standard Pearson formula, which measures the degree of linear association between two sets of values. It reflects the strength and direction of the association between the values predicted from the spectral (predictor) variables and the observed values of the response variable. 100 spectral samples of leaves, peel and fruit were analyzed, each reading comprising 2,200 spectral variables. All calculations were performed by the software itself: we selected the machine learning models, and the software reported performance metrics such as the mean absolute error (MAE) and the correlation coefficient. The spectral data were used as the input variables of the models, and the output variable was the fruit quality trait being predicted. This information has been stated more clearly in the text.
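The response states that the software computed r and MAE with the spectral data as input and each quality trait as output, but not how the predictions used for evaluation were generated. The sketch below shows one common scheme, k-fold cross-validation, in plain Python. It is an illustration under stated assumptions: the validation scheme is assumed, k-nearest-neighbour regression is a stand-in for the study's algorithms, and the data are synthetic (40 samples with five bands instead of 100 samples with 2,200 variables).

```python
import math
import random
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r between observed and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_absolute_error(obs: Sequence[float], pred: Sequence[float]) -> float:
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def knn_predict(train_X, train_y, query, k=3) -> float:
    """Stand-in regressor: mean target of the k spectrally closest training samples."""
    ranked = sorted((math.dist(row, query), target) for row, target in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cross_validated_predictions(X, y, n_folds=10, k=3, seed=0) -> List[float]:
    """Predict every sample with a model trained only on the other folds."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    preds = [0.0] * len(X)
    for held_out in folds:
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        for i in held_out:
            preds[i] = knn_predict(train_X, train_y, X[i], k)
    return preds


# Synthetic data: band 0 carries the signal, the other four bands are small noise.
rng = random.Random(42)
X, y = [], []
for _ in range(40):
    signal = rng.random()
    X.append([signal] + [rng.uniform(0.0, 0.05) for _ in range(4)])
    y.append(10.0 * signal + rng.gauss(0.0, 0.1))

preds = cross_validated_predictions(X, y)
print(f"r = {pearson_r(y, preds):.3f}, MAE = {mean_absolute_error(y, preds):.3f}")
```

The key design point is that each sample is predicted by a model that never saw it, so r and MAE describe performance on unseen data rather than goodness of fit.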

C: Line 165. ‘better results for SVM with a mean prediction of 0.43 correlation coefficient’ – it's a very weak correlation.

A: We appreciate the comment regarding the correlation coefficient of 0.43 obtained by SVM, which can be interpreted as a weak correlation. However, for complex spectral data this value can be considered meaningful, given the size of the database and the complexity of the relationship between the spectra and fruit quality traits. The high dimensionality of spectral data can make clear patterns difficult to identify and reduce the accuracy of predictive models, which often results in lower correlation coefficients. When evaluating models applied to such data, correlation coefficients should therefore be interpreted in the context of these difficulties: even coefficients that seem low can represent relevant predictive performance, given the limitations and challenges of spectral data modeling.
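Both readings of r = 0.43 can be quantified. The sketch below computes the share of variance it accounts for (r²), the usual t statistic for testing r against zero, t = r * sqrt(n - 2) / sqrt(1 - r²), and a Fisher-z normal approximation of the two-sided p-value. It assumes, for illustration only, that r was computed over n = 100 paired observed/predicted values; the record does not state the exact n behind this coefficient.

```python
import math


def variance_shared(r: float) -> float:
    """Proportion of variance linearly shared by predictions and observations."""
    return r * r


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def fisher_p_two_sided(r: float, n: int) -> float:
    """Approximate two-sided p-value via Fisher's z transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


r, n = 0.43, 100  # n = 100 is an illustrative assumption
print(f"r^2 = {variance_shared(r):.4f}")
print(f"t   = {t_statistic(r, n):.2f} (df = {n - 2})")
print(f"p  ~= {fisher_p_two_sided(r, n):.1e}")
```

Under these assumptions r is clearly different from zero, yet the predictions share less than a fifth of the variance of the observations; statistical significance and predictive usefulness are separate questions.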

C: Line 188. Please present the results of the modelling in the form of regression with the relative error (RMSE) and coefficient of determination (R²).

C: Line 250. What exactly does this refer to? Please state in one term what was determined by liquid chromatography.

A: Liquid chromatography is an analytical technique used to separate, identify and quantify the components of a mixture. It is based on the differential interaction of compounds with two phases: the mobile phase, a liquid (solvent or mixture of solvents) that carries the sample through the system, and the stationary phase, a solid or liquid material fixed inside a column, where the compounds are separated. As the mobile phase passes through the stationary phase, the different compounds in the mixture interact differently with the two phases and are therefore retained to different degrees. This process separates the components, allowing their detection and analysis. One of the most advanced forms of this technique is ultra-performance liquid chromatography (UPLC), widely used in laboratories to analyze substances such as drugs, natural compounds, proteins and metabolites.

C: Line 418. Section References. DOI for all references must be provided.

A: Thank you for the suggestion. DOIs have been added for all references.

Reviewer 2 Report

Comments and Suggestions for Authors

Summary:

The study investigates the use of machine learning (ML) for pineapple quality prediction using hyperspectral data under controlled conditions. However, some issues need further exploration as follows.

Major issues:

It is unclear whether the hyperparameters of the machine learning algorithms were optimized for a fair comparison, and the use of Auto-WEKA tools should be declared. The number of neurons in the ANN should also be adjusted to ensure optimal performance.

Please correct the layout of the figures and their titles. In addition, the titles should state that A and B are r scores and MAE, respectively, so that each figure is independently comprehensible. Tables can sometimes highlight differences in ML performance in numbers better than the figures in the Results.

Minor issues:

In the Introduction, the phrase ‘IA-based algorithms’ is misspelled in the penultimate sentence. Careful reading of the manuscript combined with the assistance of a professional editor or native speaker of English is recommended.

Author Response

Reviewer 2:

C: It is unclear whether the hyperparameters of the machine learning algorithms were optimized for a fair comparison, and the use of Auto-WEKA tools should be declared. The number of neurons in the ANN should also be adjusted to ensure optimal performance.

A: We appreciate the suggestion. The hyperparameters of each model are now detailed in the text. For the neural network, we used 10 neurons in each of the two layers, as in previous studies (https://doi.org/10.3390/rs15245657, https://doi.org/10.3390/agriengineering6010020, https://doi.org/10.3390/agriengineering6040255), since Weka defines this configuration based on the number of variables in the database. Because hyperspectral data have more than 2,000 variables, a larger number of neurons would make processing unfeasible. Instead of using Auto-WEKA, the authors chose to use established models with the software's default hyperparameters, following the approach adopted in other studies (https://doi.org/10.3390/f15010039, https://doi.org/10.1016/j.saa.2024.124113, https://doi.org/10.3390/a17010023).
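The computational argument can be made concrete by counting trainable weights. A minimal sketch, assuming a fully connected feed-forward network with 2,200 inputs (the number of variables per reading given in the response on Line 158), two hidden layers of equal width, one output neuron and one bias per neuron; Weka's internal implementation is not reproduced here.

```python
from typing import Sequence


def count_parameters(layer_sizes: Sequence[int]) -> int:
    """Weights plus biases of a fully connected feed-forward network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


for hidden in (10, 50, 100):
    sizes = [2200, hidden, hidden, 1]
    print(hidden, count_parameters(sizes))  # 22131, 112651, 230301
```

Almost all weights sit between the input and the first hidden layer, so the parameter count grows roughly linearly with that layer's width; whether a given size is infeasible in practice depends on hardware and training settings.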

C: Please correct the layout of the figures and their titles. In addition, the titles should state that A and B are r scores and MAE, respectively, so that each figure is independently comprehensible. Tables can sometimes highlight differences in ML performance in numbers better than the figures in the Results.

A: Thank you for the suggestion. The figures have been replaced by tables.

C: In the Introduction, the phrase ‘IA-based algorithms’ is misspelled in the penultimate sentence. Careful reading of the manuscript combined with the assistance of a professional editor or native speaker of English is recommended.

A: We appreciate the suggestion. The manuscript has been carefully reread with the assistance of a professional editor.

Reviewer 3 Report

Comments and Suggestions for Authors

This work aimed to analyze the hyperspectral image data for pineapple products and run AI models to predict quality metrics. Overall, I do not think this work can be published.

-- 1. No research novelty. The authors state in the abstract: “However, the prediction of pineapple quality by hyperspectral data applied to ML is not known.” Yet it took me no time to find many papers published on this topic:

Accurate ripening stage classification of pineapple based on visible and near-infrared hyperspectral imaging system [https://doi.org/10.1093/jaoacint/qsaf010]

Towards a Multispectral Imaging System for Spatial Mapping of Chemical Composition in Fresh-Cut Pineapple (Ananas comosus) [https://doi.org/10.3390/foods12173243]

-- 2. Section 2.1 contains totally irrelevant information.

-- 3. A technical question: the models mentioned in this work can only take two-dimensional inputs, whereas hyperspectral image (HSI) data are stored as three-dimensional data cubes. How were the HSI data fed to the models?

-- 4. No details at all are given regarding the models used. Even the most basic ANN model has many hyperparameters (number of layers, number of neurons in each layer, optimizer, learning rate) that must be set when using it; without this information, the results cannot be considered reliable.

-- 5. Figures are not fixed in place in the PDF file, making them unreadable.

-- 6. The ultimate goal of this work is unclear. Based on the abstract and introduction, my original understanding was that this paper was trying to identify a good model, but in the Results the authors seemed to optimize model performance by using different parts of the pineapple plant, which makes little sense from a model-design perspective.

-- 7. Model performance (training/validation/testing loss, etc.) was not reported.

-- 8. The discussion section does not explain the mechanisms behind the results at all. For example, lines 374-377 read: “Pineapple fruits have a high content of various flavonoids, that are directly linked to the color of the fruit and are involved in the white-yellow pigmentation of the fruit. It has an antioxidant function and cellular signaling pathways and plays na essencial part of plant growth and biotic and abiotic stress reduction [41].” I cannot see any need to discuss the relationship between flavonoids and the yellowish color in this manuscript. Please be concise and include only relevant information.
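Comment 3 above asks how three-dimensional hyperspectral cubes were fed to models that take two-dimensional (samples × features) input. The sketch below shows two common reductions in plain Python: flattening every pixel into its own row, or averaging the pixels inside a region of interest into one spectrum per sample. Which of these, if either, applies to this study is not stated in this report; the cube layout (rows × columns × bands), the mask and the function names are assumptions.

```python
from typing import List, Optional, Sequence

Cube = Sequence[Sequence[Sequence[float]]]  # cube[row][col][band]


def cube_to_pixel_matrix(cube: Cube) -> List[List[float]]:
    """Reshape rows x cols x bands into (rows*cols) x bands: one row per pixel."""
    return [list(pixel) for row in cube for pixel in row]


def mean_spectrum(cube: Cube, mask: Optional[Sequence[Sequence[bool]]] = None) -> List[float]:
    """Average the spectra of the selected pixels (all pixels if mask is None)."""
    pixels = [
        cube[r][c]
        for r in range(len(cube))
        for c in range(len(cube[r]))
        if mask is None or mask[r][c]
    ]
    if not pixels:
        raise ValueError("mask selects no pixels")
    n_bands = len(pixels[0])
    return [sum(p[b] for p in pixels) / len(pixels) for b in range(n_bands)]


# Toy 2 x 2 image with 3 bands.
cube = [[[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
        [[5.0, 6.0, 7.0], [7.0, 8.0, 9.0]]]
roi = [[True, True], [False, False]]  # hypothetical region of interest: top row

print(len(cube_to_pixel_matrix(cube)))  # 4 pixel rows, 3 columns each
print(mean_spectrum(cube))               # [4.0, 5.0, 6.0]
print(mean_spectrum(cube, roi))          # [2.0, 3.0, 4.0]
```

Per-pixel rows multiply the number of rows but are not independent observations of different fruits, so pixels from one fruit should stay in the same validation fold; the mean spectrum keeps one row per fruit.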

Comments on the Quality of English Language

Please review the text carefully before submitting a research article to avoid unnecessary typos, such as line 19 [experi-ment], line 20 [sup-press], line 24 [physiologi-cally] and line 71 [IA-based algorithms].

Author Response

Reviewer 3:

C: The article corresponds to the subject of the Journal. The text is structured, and all necessary sections are present. However, the manuscript was prepared carelessly, and the article as presented cannot be accepted. After reading it, I have the following comments.

A: We appreciate all of the comments, which we have sought to address as fully as possible to improve our manuscript.

C: Keywords: the subject of your research, pineapple, should be added.

A: Thank you for the suggestion. The word pineapple has been added to the keywords.

C: On page 2, the figure is not labeled and is not referenced in the text. Please check.

A: Thank you for your comment. The figure is a summary figure (graphical abstract), which will be submitted in the appropriate place.

C: Lines 78-79: what do (i) and (ii) refer to?

A: (i) and (ii) refer to the two objectives we propose for the work.

C: Line 98: ...the soil was chemically analyzed. What kind of instrument was used? Brand and manufacturer? At what depth were the soil samples taken?

A: Thank you for the suggestion. The information has been added to the text.

C: Line 106: The authors used Perola pineapple. However, the year of the pineapple harvest is not indicated, and the original source of these seedlings or seeds should be mentioned.

A: Pineapples do not have seeds. Propagation is carried out vegetatively by different types of seedlings (crown, pup, sucker) or by micropropagation. We did not purchase the seedlings. They were originally obtained from a commercial production area and planted in an area adjacent to the experiment. From this area, new pup seedlings were obtained and used for the experiment.

C: Line 116: What kind of refractometer was used? Brand?

A: Thank you for the suggestion. The information has been added.

C: Line 129: Please include information about the manufacturer of the chromatograph.

A: Thank you for the suggestion. The manufacturer has been added.

C: Page 5. The authors should edit the text. The figure captions and text have spread across the page and fragments of phrases have appeared.

A: We appreciate the suggestion. The layout of the figures and captions has been corrected.

C: Pages 7-8: the situation is similar to the previous one. It is difficult to concentrate on reading because the text is shifting. The text should be checked before sending to the journal.

A: We appreciate the suggestion. The layout of the manuscript has been corrected.

Reviewer 4 Report

Comments and Suggestions for Authors

The manuscript titled «Predicting Pineapple Quality from Hyperspectral Data of Plant Parts Applied to Machine Learning» addresses the issue of predicting pineapple quality using hyperspectral data applied to machine learning. The goal was to test accurate ML models for predicting pineapple fruit quality and the best input data for these algorithms.

The article corresponds to the subject of the Journal. The text is structured, and all necessary sections are present. However, the manuscript was prepared carelessly, and the article as presented cannot be accepted. After reading it, I have the following comments.

Keywords: the subject of your research, pineapple, should be added.
On page 2, the figure is not labeled and is not referenced in the text. Please check.
Lines 78-79: what do (i) and (ii) refer to?
Line 98: ...the soil was chemically analyzed. What kind of instrument was used? Brand and manufacturer? At what depth were the soil samples taken?
Line 106: The authors used Perola pineapple. However, the year of the pineapple harvest is not indicated, and the original source of these seedlings or seeds should be mentioned.
Line 116: What kind of refractometer was used? Brand?
Line 129: Please include information about the manufacturer of the chromatograph.
Page 5. The authors should edit the text. The figure captions and text have spread across the page and fragments of phrases have appeared.
Pages 7-8: the situation is similar to the previous one. It is difficult to concentrate on reading because the text is shifting. The text should be checked before sending to the journal.

Author Response

Reviewer 4:

C: Lines 110-111: Why was this particular pineapple planting scheme chosen?

A: We chose this system because planting over plastic film is a good option for controlling weeds in the crop. However, since plastic limits the access of rainwater to the roots, we placed drip tapes under the plastic to irrigate the plants. In addition, since plastic would also make it difficult to topdress with granulated fertilizers applied to the soil, we diluted the fertilizers and applied them via fertigation. This system is common for farmers who use plastic film.

C: In Figure 1, the caption for the months is not in English.

A: Thank you for the suggestion. The months in the figure have been translated.

C: The figures in the text are misaligned, and the formatting of the manuscript appears sloppy. This may be due to conversion from Word to PDF.

A: We appreciate the suggestion. The layout of the manuscript has been corrected.

C: Why does the SVM algorithm yield the best results?

A: We appreciate the question. As discussed in the Results and Discussion sections, SVM gave the highest r and the lowest MAE for most of the pineapple fruit quality variables analyzed, proving to be the most robust model in this study.

C: How are acidity and °Brix related to peel and fruit reflectance?

A: Acidity and °Brix are related to the reflectance of the skin and fruit because both influence the chemical and structural composition of plant tissues, affecting the way in which light is absorbed and reflected at different wavelengths. Acidity is associated with the presence of organic acids, which can interact with structural components of the skin and pulp, altering absorption in certain bands of the spectrum, especially in the visible and near infrared. °Brix, which indicates the concentration of soluble solids (mainly sugars), can affect reflectance by modifying cell density and composition, resulting in variations in the absorption and scattering of light, especially in spectral regions such as the near infrared (NIR).

C: “For future work, increasing the number of leaf samples could provide a more robust and representative database for the analyses.” What number of leaves do you think is necessary to achieve a good result?

A: We consider around 100 samples, as used in this study, an adequate number to build the database. However, increasing this number can in some cases improve the results, making them more robust and representative; the benefit depends on several factors, such as the genetic variability of the plants and environmental conditions. Using 200 to 500 samples could be tested to ensure greater representativeness and reduce data variability.
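One way to see what a larger sample buys is the precision of an estimated correlation. A sketch, assuming the r = 0.43 mentioned in Round 1 as an illustrative value and Fisher's z approximation for a 95% confidence interval; it captures sampling precision only, not the genetic and environmental representativeness that the response also mentions.

```python
import math
from typing import Tuple


def fisher_ci(r: float, n: int, z_crit: float = 1.959964) -> Tuple[float, float]:
    """Approximate 95% confidence interval for Pearson r via Fisher's z."""
    z = math.atanh(r)
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)


for n in (100, 200, 500):
    lo, hi = fisher_ci(0.43, n)
    print(f"n = {n}: 95% CI [{lo:.2f}, {hi:.2f}], width {hi - lo:.2f}")
```

The interval narrows roughly with 1/sqrt(n - 3), so going from 100 to 500 samples shrinks it by a little more than half.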

Reviewer 5 Report

Comments and Suggestions for Authors

The article “Predicting Pineapple Quality from Hyperspectral Data of Plant Parts Applied to Machine Learning” describes the use of various machine learning algorithms to predict the quality of growing pineapples. The authors employed three inputs (leaf reflectance, peel reflectance and fruit reflectance) to analyze the data. Additionally, six algorithms were tested, and the best-performing one was selected. Overall, the work is written in clear language. However, several areas require improvement, as outlined below:

1. Lines 110-111: Why was this particular pineapple planting scheme chosen?

2. In Figure 1, the caption for the months is not in English.

3. The figures in the text are misaligned, and the formatting of the manuscript appears sloppy. This may be due to conversion from Word to PDF.

4. Why does the SVM algorithm yield the best results?

5. How are acidity and °Brix related to peel and fruit reflectance?

6. “For future work, increasing the number of leaf samples could provide a more robust and representative database for the analyses.” What number of leaves do you think is necessary to achieve a good result?

Author Response

C: Lines 110-111: Why was this particular pineapple planting scheme chosen?

C: In Figure 1, the caption for the months is not in English.

A: Thank you for the suggestion. The months in the figure have been translated.

C: The figures in the text are misaligned, and the formatting of the manuscript appears sloppy. This may be due to conversion from Word to PDF.

A: We appreciate the suggestion. The layout of the manuscript has been corrected.

C: Why does the SVM algorithm yield the best results?

C: How are acidity and °Brix related to peel and fruit reflectance?

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I would like to express my gratitude for the thorough responses to my inquiries. However, I have several substantial observations to make.

Despite the abstract's redesign, it remains deficient in the provision of specific quantitative outcomes.

Please check and correct typos throughout the manuscript. For example “daidzin e genistin”; “GL: degrees of freedom”. Use either “F.V” or “FV”, as well as “C.V.” or “CV”.

Based on the MAE data (Tables 3-8), how can we judge what is better predicted in pineapple fruit? Genistin (MAE=1451) or Daidzein (MAE=338) or Acidity (MAE= 0.062)? For comparison it is necessary to give the relative error.

Line 69. ‘We appreciate the suggestion, other articles on the subject have been added …’ Unfortunately, these changes are not present in the manuscript. ‘The use of hyperspectral sensors has many advantages, such as the fact that the fruit is not damaged and that the quality characteristics of the fruit can be predicted very well [9].’ More references should be cited, there are many papers on determining fruit quality using hyperspectral imaging.

Line 184. Table 2. The raw results of the ANOVA should be provided: Sum of Squares (SumSq); Mean Square (MeanSq); F (F value) (e.g. https://www.mdpi.com/2071-1050/17/6/2572; https://www.mdpi.com/2072-4292/11/1/68). Please explain Pearson's correlation coefficient (r) values greater than 1.

Author Response

Reviewer 1:

C: Does the introduction provide sufficient background and include all relevant references? Can be improved

A: The introduction presents the general context of the work, highlighting the problem and exploring modern alternative ways to solve it. This structure provides a basis for the development of the study and introduces its central idea to the reader. If there are specific suggestions for additional references or points that could be explored in more depth, we are happy to incorporate them to further improve the section.

We have improved the introduction; please see lines 78 to 85.

C: Is the research design appropriate? Can be improved

A: We appreciate the suggestion and are committed to improving our work. To respond properly to this request, we would need details about which improvements are needed and which part of the manuscript (or figure) the comment refers to, so that we can make the adjustments accurately and in line with expectations.

C: Are the methods adequately described? Must be improved

A: Thank you for your suggestion. In the previous round, several changes were made to this section. We believe that we have provided the necessary information. If not, it would be important to receive details on what improvements can be made. This way, we can make the adjustments accurately and in line with expectations.

C: Are the results clearly presented? Must be improved

C: Are the conclusions supported by the results? Must be improved

A: We appreciate the suggestion, and improvements were made to the conclusion in order to meet the request.

C: I would like to express my gratitude for the thorough responses to my inquiries. However, I have several substantial observations to make.

A: We appreciate all considerations and will try to meet them in the best possible way.

C: Despite the abstract's redesign, it remains deficient in the provision of specific quantitative outcomes.

A: We appreciate your comment and value your suggestions. We improved the abstract by including quantitative data and adjusted it to the AgriEngineering template, which specifies the abstract as a single paragraph of about 200 words maximum. Please see the abstract. Thank you.

C: Please check and correct typos throughout the manuscript. For example “daidzin e genistin”; “GL: degrees of freedom”. Use either “F.V” or “FV”, as well as “C.V.” or “CV”.

A: Thank you for the suggestion. The corrections have been made.

C: Based on the MAE data (Tables 3-8), how can we judge what is better predicted in pineapple fruit? Genistin (MAE=1451) or Daidzein (MAE=338) or Acidity (MAE= 0.062)? For comparison it is necessary to give the relative error.

A: The MAE (Mean Absolute Error) already provides a direct measure of model accuracy, since it is expressed in the same unit as the predicted variable. This allows the absolute magnitude of the error to be assessed without the need for conversion. However, for comparisons between variables with different scales, such as genistin (µg/g), daidzein (µg/g) and acidity (%), the relative error can be useful. Nevertheless, within each variable, the MAE is sufficient to judge the quality of the prediction.
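The reviewer's point about comparing traits measured on different scales can be illustrated with made-up numbers (not the genistin, daidzein or acidity data). The sketch computes MAE, MAE as a percentage of the mean observed value, and a relative absolute error that divides the model's total absolute error by that of always predicting the observed mean, similar in spirit to the "Relative absolute error" figure Weka prints in its evaluation summary. The function names and trait values are hypothetical.

```python
from typing import Sequence


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def relative_mae_percent(obs: Sequence[float], pred: Sequence[float]) -> float:
    """MAE as a percentage of the mean observed value (assumes a positive mean)."""
    return 100.0 * mae(obs, pred) / (sum(obs) / len(obs))


def relative_absolute_error_percent(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Model's total absolute error relative to always predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    model_err = sum(abs(o - p) for o, p in zip(obs, pred))
    baseline_err = sum(abs(o - mean_obs) for o in obs)
    return 100.0 * model_err / baseline_err


# Two hypothetical traits on very different scales.
trait_a_obs, trait_a_pred = [100.0, 200.0, 300.0, 400.0], [110.0, 190.0, 310.0, 390.0]
trait_b_obs, trait_b_pred = [0.5, 0.6, 0.7, 0.8], [0.55, 0.55, 0.75, 0.75]

for name, obs, pred in (("A", trait_a_obs, trait_a_pred), ("B", trait_b_obs, trait_b_pred)):
    print(name,
          round(mae(obs, pred), 3),
          round(relative_mae_percent(obs, pred), 1),
          round(relative_absolute_error_percent(obs, pred), 1))
```

Trait B has by far the smaller MAE (0.05 versus 10) yet the worse relative errors, which is why a scale-free measure is needed when comparing prediction quality across traits.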

C: Line 69. ‘We appreciate the suggestion, other articles on the subject have been added …’ Unfortunately, these changes are not present in the manuscript. ‘The use of hyperspectral sensors has many advantages, such as the fact that the fruit is not damaged and that the quality characteristics of the fruit can be predicted very well [9].’ More references should be cited, there are many papers on determining fruit quality using hyperspectral imaging.

A: We appreciate the suggestion and more citations have been added to the introduction section of the manuscript.

C: Line 184. Table 2. The raw results of the ANOVA should be provided: Sum of Squares (SumSq); Mean Square (MeanSq); F (F value) (e.g. https://www.mdpi.com/2071-1050/17/6/2572; https://www.mdpi.com/2072-4292/11/1/68). Please explain Pearson's correlation coefficient (r) values greater than 1.

A: We appreciate the suggestion. Table 2 already provides the Sum of Squares and the significance of the F test. The authors consider that this information is sufficient for the proper interpretation of the results. There were no Pearson correlation coefficients (r) greater than 1 in the tables.
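Mean squares and F values follow directly from the sums of squares and degrees of freedom, so they can be added to an ANOVA table at little cost. A sketch with hypothetical sums of squares and degrees of freedom (not the values in Table 2), assuming a two-factor layout (input data × ML algorithm) with a residual term; p-values would additionally require the F distribution, for example from a statistics library.

```python
from typing import Dict, Tuple


def anova_rows(
    ss_df: Dict[str, Tuple[float, int]], error: str = "Residual"
) -> Dict[str, Tuple[float, int, float, float]]:
    """Return source -> (SS, DF, MS, F), with MS = SS / DF and F = MS_source / MS_error."""
    ms_error = ss_df[error][0] / ss_df[error][1]
    table = {}
    for source, (ss, df) in ss_df.items():
        ms = ss / df
        f_value = float("nan") if source == error else ms / ms_error
        table[source] = (ss, df, ms, f_value)
    return table


hypothetical = {
    "Input": (12.0, 2),
    "ML": (30.0, 5),
    "Input x ML": (5.0, 10),
    "Residual": (45.0, 90),
}
for source, (ss, df, ms, f_value) in anova_rows(hypothetical).items():
    print(f"{source:<11} SS={ss:7.2f} DF={df:4d} MS={ms:6.3f} F={f_value:6.2f}")
```

Because F is a ratio of mean squares, an F value cannot be reconstructed from sums of squares alone without the degrees of freedom, which is why reporting all four columns makes the table self-contained.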

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

Thank you for your response to the review comments. However, I have a few concerns regarding the revision.

It appears that the specific issue regarding the typographical error in the phrase “IA-based algorithms” in the Introduction has not been addressed in the revised manuscript. While you have acknowledged the suggestion, the actual correction of the error has not been made.
The use of a comma as the decimal point in Table 1 is unconventional in English-language publications; please switch to using a period (.) to align with standard formatting practices.
I previously suggested that tables might better highlight differences in ML performances than figures. However, since you have replaced the figures with tables, the sentence in Section 2.5 mentioning the creation of boxplots and the use of the ggplot2 package should be removed, as it is no longer relevant.
In Table 2, the abbreviation for degrees of freedom is inconsistently used as “GL” in the notes but “DF” in the table header. The standard abbreviation for degrees of freedom is “DF”. Please standardize the notation to “DF” throughout the table and its annotations to avoid confusion.
Based on the degrees of freedom (DF) calculation in Table 2, it appears that the total sample size n is inferred to be 180. However, the manuscript mentions that only 100 plants were sampled. This discrepancy between the reported sample size and the calculated sample size raises significant concerns about the integrity of the data.
Given that Table 2 refers to F-values and mean squares rather than r and MAE.
In Table 2, the asterisk (*) symbol is used to denote significance at the 5% probability level by F-test. However, it also appears to be used to indicate the interaction between input and ML. This dual use of the same symbol can lead to confusion for readers. To ensure clarity and avoid ambiguity, please ensure that each symbol in the table has a single, consistent meaning throughout the manuscript.

As a reviewer, I would like to emphasize the importance of carefully revising the manuscript to address all identified issues. This not only improves the clarity and readability of the manuscript but also demonstrates a commitment to high standards of academic writing.

I would recommend revisiting the manuscript and making the necessary corrections to ensure that the text is free from errors. Additionally, engaging a professional editor or native English speaker to review the manuscript could be beneficial in identifying and correcting any further issues that may have been overlooked.

I look forward to seeing the revised version of the manuscript with the corrections applied.

Author Response

Reviewer 2:

C: Is the research design appropriate? Can be improved

A: Yes, the research design is appropriate for the proposed objectives, ensuring a rigorous approach aligned with recognized methodologies in the field. If there are specific suggestions for improvements, we are available to evaluate them and, if necessary, incorporate adjustments that can further strengthen the study.

C: Are the results clearly presented? Can be improved

C: Thank you for your response to the review comments. However, I have a few concerns regarding the revision.

A: We appreciate your considerations and will try to address all of them as fully as possible.

C: 1. It appears that the specific issue regarding the typographical error in the phrase “IA-based algorithms” in the Introduction has not been addressed in the revised manuscript. While you have acknowledged the suggestion, the actual correction of the error has not been made.

A: Thank you for the correction; the change has been made.

C: 2. The use of a comma as the decimal point in Table 1 is unconventional in English-language publications; please switch to using a period (.) to align with standard formatting practices.

A: Thank you for the correction; the change has been made.

C: 3. I previously suggested that tables might better highlight differences in ML performances than figures. However, since you have replaced the figures with tables, the sentence in Section 2.5 mentioning the creation of boxplots and the use of the ggplot2 package should be removed, as it is no longer relevant.

A: Thank you for your consideration; the corrections have been made.

C: 4. In Table 2, the abbreviation for degrees of freedom is inconsistently used as “GL” in the notes but “DF” in the table header. The standard abbreviation for degrees of freedom is “DF”. Please standardize the notation to “DF” throughout the table and its annotations to avoid confusion.

A: Thank you for your consideration; the corrections have been made.

C: 5. Based on the degrees of freedom (DF) calculation in Table 2, it appears that the total sample size n is inferred to be 180. However, the manuscript mentions that only 100 plants were sampled. This discrepancy between the reported sample size and the calculated sample size raises significant concerns about the integrity of the data.

A: We appreciate the observation. The degrees of freedom presented in Table 2 refer to the ANOVA of the models, which considers the total number of observations used in the statistical analysis. Although the manuscript mentions that 100 leaves and fruits were sampled, the total number of observations may be higher because of the structure of the experiment, such as the subdivision of the data into the different combinations of factors analyzed.
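The DF values in the ANOVA tables of this record are 2, 5, 10, 162 and 179. These values match the standard DF breakdown of a balanced two-factor factorial design with interaction. The sketch below rests on stated assumptions. The 3 input levels and 6 ML algorithms are read off the input and ML DF. The 10 replicate observations per input × ML combination are inferred from the residual DF. What those replicates are (for example, repeated model runs) is not stated in this record.

```python
def two_factor_df(a_levels: int, b_levels: int, replicates: int) -> dict:
    """Degrees of freedom for a balanced two-factor factorial ANOVA with interaction."""
    n_obs = a_levels * b_levels * replicates  # total observations of the response (r or MAE)
    return {
        "input": a_levels - 1,
        "ML": b_levels - 1,
        "input*ML": (a_levels - 1) * (b_levels - 1),
        "Residuals": a_levels * b_levels * (replicates - 1),
        "Total": n_obs - 1,
    }


# Assumed layout: 3 inputs x 6 ML algorithms x 10 replicate observations per combination
dfs = two_factor_df(3, 6, 10)
for source, dof in dfs.items():
    print(f"{source}: {dof}")
```

The components sum to the total (2 + 5 + 10 + 162 = 179). Under this assumed layout, n = 180 counts input × ML × replicate evaluations of r or MAE, not sampled plants.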

C: 6. Given that Table 2 refers to F-values and mean squares rather than r and MAE.

A: Thank you for your consideration; these are the mean squares of the r and MAE variables for each predicted variable that was analyzed.

C: In Table 2, the asterisk (*) symbol is used to denote significance at the 5% probability level by F-test. However, it also appears to be used to indicate the interaction between input and ML. This dual use of the same symbol can lead to confusion for readers. To ensure clarity and avoid ambiguity, please ensure that each symbol in the table has a single, consistent meaning throughout the manuscript.

A: We appreciate your observation. The asterisk (*) symbol in Table 2 is used to indicate statistical significance at 5% by the F-test, including the significance of the interaction between input and ML.

C: As a reviewer, I would like to emphasize the importance of carefully revising the manuscript to address all identified issues. This not only improves the clarity and readability of the manuscript but also demonstrates a commitment to high standards of academic writing.

A: We appreciate your feedback and attention to detail. We are carefully reviewing the manuscript to ensure clarity, accuracy, and alignment with high standards of academic writing. We value the reviewers' considerations and are committed to improving the work based on their suggestions.

C: I would recommend revisiting the manuscript and making the necessary corrections to ensure that the text is free from errors. Additionally, engaging a professional editor or native English speaker to review the manuscript could be beneficial in identifying and correcting any further issues that may have been overlooked.

A: We appreciate your feedback and attention to detail. We are carefully reviewing the manuscript to ensure clarity, accuracy, and alignment with high standards of academic writing.

C: I look forward to seeing the revised version of the manuscript with the corrections applied.

A: We appreciate your interest and contributions. We are implementing the revisions with care and commitment to improve the manuscript.

Reviewer 4 Report

Comments and Suggestions for Authors

Accept in present form.

Author Response

Reviewer 4:

C: Is the research design appropriate? Can be improved

A: We appreciate your consideration and value your suggestions. We would appreciate specific suggestions for possible improvements so that we can improve the manuscript in the best possible way.

C: Accept in present form.

A: We appreciate the contributions made to improve our work.

Reviewer 5 Report

Comments and Suggestions for Authors

Dear Authors,

I am satisfied with your response to my comments.

Author Response

Reviewer 5:

C: I am satisfied with your response to my comments.

A: We appreciate the contributions made to improve our work.

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

Comments and suggestions for authors in the attachment.

Comments for author File: Comments.pdf

Author Response

A: We are grateful to the reviewer for allowing us to explain the nomenclature of the compounds in more detail.

Thank you for alerting us to the mistranslation regarding the term "and".

We have corrected it. Please see page 5, line 190. Thank you.

However, in case the reviewer was referring to the flavonoids, we have included photos below of the standards used to determine the compounds. Please note that the names in the photos are the same as those used in the manuscript. In addition, we have published manuscripts using these terms. Thank you.

https://doi.org/10.3390/agronomy14092046

https://doi.org/10.1038/s41598-024-68117-z

C: Line 184. Table 2. The raw results of the ANOVA should be provided: Sum of Squares (SumSq); Mean Square (MeanSq); F (F value) (e.g. https://www.mdpi.com/2071-1050/17/6/2572; https://www.mdpi.com/2072-4292/11/1/68). Please explain Pearson's correlation coefficient (r) values greater than 1.

A: We appreciate the comment and would like to clarify that Table 2 already presents the main information from the analysis of variance (ANOVA), including mean squares, degrees of freedom and the statistical significance of the F test. The authors consider that these data are sufficient for the interpretation of the results, according to the objectives of the study, which is why we chose not to include the table with the raw ANOVA in the main body of the article, since only the sum of squares is missing. However, this table was made available in the response letter for the purposes of transparency and to complement the analysis, as requested.

Acidity:

	DF	SS	MS	Fc	Pr>Fc
input	2	0.7666	0.3833	53.047	0.00E+00
ML	5	1.2198	0.24396	33.766	0.00E+00
input*ML	10	0.5003	0.05003	6.924	5.55E-09
Residuals	162	1.1705	0.007225
Total	179	3.6571	1
CV		54.12%

MAE

	DF	SS	MS	Fc	Pr>Fc
input	2	0.00147	0.000735	24.3497	6.00E-10
ML	5	0.003638	0.000728	24.1067	0.00E+00
input*ML	10	0.001377	0.000138	4.5607	1.06E-05
Residuals	162	0.00489	3.02E-05
Total	179	0.011375	6.35E-05
CV		9.55%

Brix:

	DF	SS	MS	Fc	Pr>Fc
input	2	1.8991	0.94955	167.438	0.00E+00
ML	5	4.7377	0.94754	167.079	0.00E+00
input*ML	10	1.5361	0.15361	27.087	8.52E-30
Residuals	162	0.9187	0.005671
Total	179	9.0917	0.050792
CV		35.61%

MAE

	DF	SS	MS	Fc	Pr>Fc
input	2	0.5401	0.27005	37.307	4.71E-14
ML	5	2.0861	0.41722	57.642	0.00E+00
input*ML	10	0.9448	0.09448	13.054	1.38E-16
Residuals	162	1.1726	0.007238
Total	179	4.7436	0.026501
CV		7.99%

Ratio:

	DF	SS	MS	Fc	Pr>Fc
input	2	3.3465	1.67325	292.129	0.00E+00
ML	5	3.7802	0.75604	131.994	0.00E+00
input*ML	10	1.2781	0.12781	22.314	8.12E-26
Residuals	162	0.9279	0.005728
Total	179	9.3326	0.052137
CV		28.5%

MAE

	DF	SS	MS	Fc	Pr>Fc
input	2	8.067	4.0335	69.25	0.00E+00
ML	5	13.398	2.6796	46.004	0.00E+00
input*ML	10	6.716	0.6716	11.53	8.02E-15
Residuals	162	9.436	0.058247
Total	179	37.617	0.210151
CV		8.18%

Daidzein:

	DF	SS	MS	Fc	Pr>Fc
input	2	0.9974	0.4987	61.97	1.03E-20
ML	5	6.1802	1.23604	153.588	0.00E+00
input*ML	10	1.1245	0.11245	13.972	1.31E-17
Residuals	162	1.3037	0.008048
Total	179	9.6059	0.053664
CV		39.16%

MAE

	DF	SS	MS	Fc	Pr>Fc
input	2	25555	12777.5	25.518	2.32E-10
ML	5	395271	79054.2	157.881	0.00E+00
input*ML	10	83790	8379	16.734	0.00E+00
Residuals	162	81117	500.7222
Total	179	585733	3272.251
CV		8.78%

Daidzin:

	DF	SS	MS	Fc	Pr>Fc
input	2	0.2879	0.14395	23.548	1.10E-09
ML	5	3.6022	0.72044	117.836	0.00E+00
input*ML	10	0.3012	0.03012	4.927	3.22E-06
Residuals	162	0.9905	0.006114
Total	179	5.1819	0.028949
CV		42.37%

MAE

	DF	SS	MS	Fc	Pr>Fc
input	2	894117	447058.5	26.153	0.00E+00
ML	5	5432174	1086435	63.556	0.00E+00
input*ML	10	480491	48049.1	2.811	0.0030696
Residuals	162	2769240	17094.07
Total	179	9576023	53497.34
CV		8.03%

Genistin:

	DF	SS	MS	Fc	Pr>Fc
input	2	0.472	0.236	35.611	1.52E-13
ML	5	6.8074	1.36148	205.417	0.00E+00
input*ML	10	0.6276	0.06276	9.47	2.62E-12
Residuals	162	1.0737	0.006628
Total	179	8.9808	0.050172
CV		34.44%

MAE

	DF	SS	MS	Fc	Pr>Fc
input	2	175125	87562.5	3.672	0.027558
ML	5	6934556	1386911	58.165	0.00E+00
input*ML	10	566126	56612.6	2.374	0.011936
Residuals	162	3862785	23844.35
Total	179	11538592	64461.41
CV		13.8%
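The relations behind these tables can be checked directly. Each MS is SS divided by DF. Each F value is the factor MS divided by the residual MS. The CV row is conventionally 100 × √(residual MS) / grand mean. This is a minimal sketch that assumes that conventional CV definition. The grand means are not reported here, so the CV function is only exercised on made-up numbers. The SS and DF values are copied from the first table under "Acidity:".

```python
import math


def anova_rows(ss_by_source, df_by_source, residual_key="Residuals"):
    """Recompute mean squares (SS / DF) and F values (MS / residual MS)."""
    ms = {src: ss_by_source[src] / df_by_source[src] for src in ss_by_source}
    ms_resid = ms[residual_key]
    f_values = {src: ms[src] / ms_resid for src in ms if src != residual_key}
    return ms, f_values


def cv_percent(ms_residual, grand_mean):
    """Experimental CV (assumed definition): 100 * sqrt(residual MS) / grand mean."""
    return 100 * math.sqrt(ms_residual) / grand_mean


# SS and DF copied from the first table under "Acidity:" above
ss = {"input": 0.7666, "ML": 1.2198, "input*ML": 0.5003, "Residuals": 1.1705}
df = {"input": 2, "ML": 5, "input*ML": 10, "Residuals": 162}
ms, f_values = anova_rows(ss, df)
for src in ("input", "ML", "input*ML"):
    print(f"{src}: MS={ms[src]:.5f}  F={f_values[src]:.3f}")
print("Total MS:", sum(ss.values()) / sum(df.values()))
```

The recomputed F values agree with the Fc column to within rounding (about 53.05, 33.76 and 6.92). The Total row follows the same rule: 3.6571/179 ≈ 0.0204. The Pr>Fc values would come from the upper tail of the F distribution with (DF, 162) degrees of freedom, for example via scipy.stats.f.sf.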

Please explain the presence of Pearson correlation coefficients (r) 1.24, 1.36 and 1.67 in Table 2. It is well known that Pearson correlation coefficients (r) cannot be greater than 1. The presence of such results casts doubt on all numerical data presented in the manuscript.

A: We would also like to clarify that there are no Pearson correlation coefficients (r) greater than 1 in the ANOVA tables. The numbers presented there correspond exclusively to the sums of squares, and not to the coefficient r itself. The acronym “r”, as well as “MAE”, used in these tables of the manuscript refers to the mean square (MS) values for each variable analyzed (such as ratio, acidity, Brix and flavonoids). The correlation coefficients and mean absolute error (MAE) values are duly presented in the tables intended for comparison of means.

We hope that these clarifications address the observation made.

Despite what you write about the presence of ‘Sum of Squares (SumSq); Mean Square (MeanSq)’ in Table 2, unfortunately I did not find these statistics in the table.

A: We appreciate the comment and would like to clarify that Table 2 presents the main information from the analysis of variance (ANOVA), including the mean squares, degrees of freedom and the statistical significance of the F test. The authors consider that these data are sufficient for the interpretation of the results, according to the objectives of the study, which is why we chose not to include the table with the raw ANOVA in the main body of the article, since only the sum of squares is missing, which does not interfere with the interpretation of the results. However, its presence is stated in the response letter together with the raw ANOVA of all the variables evaluated in the manuscript.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

In Section 2.5, although the word was changed from figures to tables, the description of ggplot2 package for plotting still exists. Please be careful, I do not know the relationship between the table and the ggplot2 package.

Author Response

C: In Section 2.5, although the word was changed from figures to tables, the description of ggplot2 package for plotting still exists. Please be careful, I do not know the relationship between the table and the ggplot2 package.

A: We appreciate the correction. The only package used was ExpDes, for the comparison of means, and the text has been adjusted accordingly. Please see page 5, line 180. Thank you.

Round 4

Reviewer 1 Report

Comments and Suggestions for Authors

Please provide responses to my comments regarding Table 2.

Author Response

C: Please provide responses to my comments regarding Table 2.

A: Dear reviewer, thank you again for your question. Table 2 summarizes the analysis of variance for two variables. The first is the Pearson correlation between the observed values and those predicted by the ML models (abbreviated in Table 2 as r). The second is the mean absolute error between the observed and predicted values (abbreviated in Table 2 as MAE). Both are reported for each trait evaluated (Acidity, Brix, Ratio, Daidzein, Daidzin and Genistin).

Therefore, the values below r do not refer to the Pearson correlation coefficient itself but to the mean square of its ANOVA. The same is true for the values below MAE in Table 2 (they are mean square values).

The r and MAE values themselves are presented in the other tables of the manuscript, not in Table 2. This is why values greater than 1 appear under r in Table 2: they are mean squares from the ANOVA, not correlation coefficients.
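The authors' point can be checked numerically. An individual r is bounded by 1, but a mean square built from r values is not, because the between-level sum of squares is multiplied by the number of observations per level. This is a minimal sketch using hypothetical r values, not data from the study. It assumes a balanced layout of 30 observations per ML algorithm (3 inputs × 10 replicates, as implied by the DF in the tables above).

```python
from statistics import mean


def factor_mean_square(values_by_level):
    """Main-effect mean square in a balanced design:
    n_per_level * sum((level mean - grand mean)^2) / (levels - 1).
    """
    sizes = {len(v) for v in values_by_level.values()}
    if len(sizes) != 1:
        raise ValueError("design must be balanced")
    n_per_level = sizes.pop()
    grand = mean(x for v in values_by_level.values() for x in v)
    ss = n_per_level * sum((mean(v) - grand) ** 2 for v in values_by_level.values())
    return ss / (len(values_by_level) - 1)


# Hypothetical r values: every one lies in [-1, 1], 30 per ML algorithm
r_by_ml = {f"ML{k}": [0.2 if k % 2 else 0.9] * 30 for k in range(6)}
all_r = [x for v in r_by_ml.values() for x in v]
print(max(all_r), factor_mean_square(r_by_ml))
```

Every r here is at most 0.9, yet the ML mean square is 4.41. The squared deviations of the level means (0.35² per level) are multiplied by 30 observations per level before being divided by 5 DF.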

This ANOVA Summary presentation model has already been used by us in dozens of publications. Below is the link to the most recent one, which we published in 2025:

https://doi.org/10.1016/j.rsase.2025.101522

Note that in this manuscript the Table with the ANOVA summary has a value greater than 1.0 for the variable r, because as already explained, in this Table the value refers to the mean square.

We hope we have been clear regarding your questions and we are available for any further clarifications.

Author Response File: Author Response.pdf

Round 5

Reviewer 1 Report

Comments and Suggestions for Authors

Dear Authors!

Thank you for answering my questions, I have no more questions about the manuscript.
