Article
Peer-Review Record

Predicting Pineapple Quality from Hyperspectral Data of Plant Parts Applied to Machine Learning

AgriEngineering 2025, 7(6), 170; https://doi.org/10.3390/agriengineering7060170
by Vitória Carolina Dantas Alves 1, Sebastião Ferreira de Lima 1, Dthenifer Cordeiro Santana 1, Rafael Ferreira Barreto 1, Roger Augusto da Cunha 1, Ana Carina da Silva Cândido Seron 1, Larissa Pereira Ribeiro Teodoro 1, Paulo Eduardo Teodoro 1, Rita de Cássia Félix Alvarez 1, Cid Naudi Silva Campos 1, Carlos Antonio da Silva Junior 2 and Fábio Luíz Checchio Mingotte 3,*
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Submission received: 3 February 2025 / Revised: 30 April 2025 / Accepted: 14 May 2025 / Published: 3 June 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

General comments:

The authors have carried out a study aimed at solving an important applied problem. At the same time, the manuscript describes the methodology of data preprocessing, processing and results obtained very poorly, which does not allow their expert evaluation.

The Abstract section of the manuscript needs to be revised. It should be noted that it is not necessary for the abstract to describe the methodology in such detail. It is necessary that this section contains brief information about the relevance, purpose, methods and results of the study.

Specific comments:

Line 2. Add two more keywords.

Line 35. Is it Graphical Abstract? What principle was used to divide the methods into 3 groups?

Line 38. ‘Ananas comosus’ – the author of the taxon should be given.

Line 65. More references should be cited, there are many papers on determining fruit quality using hyperspectral imaging.

Line 66. ‘this method is very accurate’ – that's a controversial statement.

Line 95. Perhaps it should be stated not ‘during the experiment’ but ‘during the period of the study’.

Line 101. ‘Results of soil chemical analysis’.

Line 137. How many measurements were taken on each plant?

Line 151. Please give and describe the scheme of neural network. Specify hyperparameters for machine learning algorithms.

Line 153. It is necessary to give relative errors and R².

Line 154. The manuscript does not present the results of analysis of variance. Please provide a scheme of analysis of variance.

Line 158. How was the correlation coefficient calculated? How many data points were involved? Describe the calculation procedure. How was the DT method applied? What was the input, what was the output?

Line 165. ‘better results for SVM with a mean prediction of 0.43 correlation coefficient’ – it's a very weak correlation.

Line 188. Please present the results of the modelling in the form of regression with the relative error (RMSE) and coefficient of determination (R²).

Line 250. What exactly are you talking about, write in one term what you determined by liquid chromatography?

Line 418. Section References. DOI for all references must be provided.

Author Response

Reviewer 1:  

General comments:

 

C: The authors have carried out a study aimed at solving an important applied problem. At the same time, the manuscript describes the methodology of data preprocessing, processing and results obtained very poorly, which does not allow their expert evaluation.

A:  We appreciate your consideration. The explanation about data pre-processing is not presented because there was none. To simplify the data analysis process, the information provided by the sensor was tabulated and processed by the software mentioned in the work. However, in order to improve the clarity of this section of the work, more information was added.

 

C: The Abstract section of the manuscript needs to be revised. It should be noted that it is not necessary for the abstract to describe the methodology in such detail. It is necessary that this section contains brief information about the relevance, purpose, methods and results of the study.

A: We appreciate the suggestion. The methodology is now described in less detail in the abstract, and more information about the relevance, purpose and results of the study has been added.

 

Specific comments:

C: Line 2. Add two more keywords.

A:  We added it. Please see lines 33 and 34. Thank you.

 

C: Line 35. Is it Graphical Abstract? What principle was used to divide the methods into 3 groups?

A: Thank you for the question. Yes, this is a Graphical Abstract. The methods were divided into three groups because leaf reflectance has been widely used in other crops to predict characteristics of final interest, such as seed germination or industrial attributes of soybeans (https://doi.org/10.1016/j.saa.2024.123963, https://doi.org/10.1016/j.infrared.2024.105326, https://doi.org/10.3390/agriengineering6040272). Based on this logic, the authors initially considered that leaf reflectance could be a tool to predict, before harvest, the desired characteristics of pineapple. However, when other possibilities were explored, the reflectance of the skin and fruit performed better and provided even more relevant results.

 

C: Line 38. ‘Ananas comosus’ – the author of the taxon should be given.

A:  Thank you for the correction, the requested change has been made.

 

C: Line 65. More references should be cited, there are many papers on determining fruit quality using hyperspectral imaging.

A: We appreciate the suggestion. Other articles on the subject have been added, especially in the discussion.

 

C: Line 66. ‘this method is very accurate’ – that's a controversial statement.

A: Thank you for the suggestion. The statement has been removed to avoid confusion.

 

C: Line 95. Perhaps it should be stated not ‘during the experiment’ but ‘during the period of the study’.

A: Thank you for the suggestion. The term has been changed.

 

C: Line 101. ‘Results of soil chemical analysis’.

A: Thank you for the suggestion. The term has been changed.

 

C: Line 137. How many measurements were taken on each plant?

A: Thank you for your question. Three hyperspectral readings were taken from the leaf, peel and fruit of each plant, and the average of these readings was calculated for each plant part.
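As a minimal sketch of the averaging step described in this response, the replicate readings of one plant part can be averaged band by band. The function name `average_spectrum` and the reflectance values are hypothetical and are not taken from the manuscript; the actual processing was done in the software named in the paper.

```python
from statistics import fmean


def average_spectrum(readings):
    """Average replicate spectra band by band.

    `readings` is a list of equal-length lists, one list per replicate
    reading of the same plant part.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    n_bands = len(readings[0])
    if any(len(r) != n_bands for r in readings):
        raise ValueError("all readings must cover the same bands")
    # zip(*readings) groups the replicate values of each band together.
    return [fmean(band_values) for band_values in zip(*readings)]


# HYPOTHETICAL reflectance values: three readings over four bands.
leaf_readings = [
    [0.10, 0.20, 0.40, 0.50],
    [0.12, 0.22, 0.38, 0.50],
    [0.14, 0.18, 0.42, 0.50],
]
leaf_mean = average_spectrum(leaf_readings)
```

With real data each reading would hold one value per measured wavelength rather than four.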

 

C: Line 151. Please give and describe the scheme of neural network. Specify hyperparameters for machine learning algorithms.

A: Thanks for the suggestion. Information about the hyperparameters of the models used has been added.

 

C: Line 153. It is necessary to give relative errors and R².

A: The use of the correlation coefficient (r) and the mean absolute error (MAE) as model evaluation metrics is sufficient and appropriate for this study. The correlation coefficient (r) measures the strength of the linear relationship between predicted and observed values, allowing a direct analysis of the association between variables. Unlike R², which represents the proportion of variance explained by the model, r is a more intuitive metric to assess the quality of predictions, especially in spectral studies, where relationships can be complex. The mean absolute error (MAE) provides a clear measure of model accuracy, as it expresses the average error of predictions in the same unit as the response variable. Unlike RMSE, which penalizes extreme outliers, MAE allows a balanced assessment of model accuracy without overvaluing specific errors, which is advantageous for spectral data. Furthermore, R² can be influenced by patterns of variance in the data and may not adequately represent model quality when the relationship between variables is not strictly linear. RMSE, in turn, emphasizes large individual errors, which can distort the perception of predictive performance in a context where small variations are already expected.
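To make the four metrics named in this exchange concrete, the following sketch computes r, MAE, RMSE and R² from paired observed and predicted values using their textbook definitions. The function name `regression_metrics` and the sample values are hypothetical; this is not the software routine used in the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return Pearson r, MAE, RMSE and R² for paired values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    mae = sum(abs(e) for e in residuals) / n
    ss_res = sum(e * e for e in residuals)
    rmse = math.sqrt(ss_res / n)

    # Sums of squares around the means; assumed non-zero here.
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    r2 = 1.0 - ss_res / ss_tot

    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    r = cov / math.sqrt(ss_tot * ss_pred)
    return {"r": r, "mae": mae, "rmse": rmse, "r2": r2}


# HYPOTHETICAL values: every prediction is offset from the observation by +1.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
metrics = regression_metrics(observed, predicted)
```

In this toy case r is 1 while R² is 0.2 and MAE and RMSE are both 1, because r measures linear association only and the other three also respond to the constant offset. The metrics therefore describe different aspects of prediction quality.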

 

C: Line 154. The manuscript does not present the results of analysis of variance. Please provide a scheme of analysis of variance.

A:  Thank you for the suggestion, the analysis of variance has been added.

 

C: Line 158. How was the correlation coefficient calculated? How many data points were involved? Describe the calculation procedure. How was the DT method applied? What was the input, what was the output?

A: The correlation coefficient was calculated automatically by the software used in the analysis. It reflects the strength and direction of the association between the predictor variables and the response variable and follows the standard Pearson correlation formula, which measures the degree of linearity between the data. In total, 100 spectral samples of leaves, peel and fruit were analyzed, with 2,200 data points coming from each reading. All calculations were carried out by the software itself: we selected the machine learning models, and the software returned the performance metrics, namely the mean absolute error (MAE) and the correlation coefficient. The spectral data were used as the input variables of the models, and the output variable represented information related to the quality of the fruits. This information has been added more clearly in the text.

 

C: Line 165. ‘better results for SVM with a mean prediction of 0.43 correlation coefficient’ – it's a very weak correlation.

A:  We appreciate the question regarding the correlation coefficient of 0.43 obtained by the SVM, which can be interpreted as a weak correlation. However, when dealing with complex spectral data, this value can be considered significant due to the nature of the data and the complex predictive modeling in this context, due to the size of the database and the complexity of the spectral relationship with fruit quality traits. It is important to highlight that the complexity of spectral data often results in lower correlation coefficients. This is due to the high dimensionality that can make it difficult to identify clear patterns and negatively impact the accuracy of predictive models. Therefore, when evaluating the results of models applied to complex spectral data, it is essential to consider these factors and interpret the correlation coefficients in the context of the difficulty associated with the analysis of such data. Even correlation coefficients that seem low can actually represent relevant predictive performance, given the limitations and challenges present in spectral data modeling.

 

C: Line 188. Please present the results of the modelling in the form of regression with the relative error (RMSE) and coefficient of determination (R²).

A: The use of the correlation coefficient (r) and the mean absolute error (MAE) as model evaluation metrics is sufficient and appropriate for this study. The correlation coefficient (r) measures the strength of the linear relationship between predicted and observed values, allowing a direct analysis of the association between variables. Unlike R², which represents the proportion of variance explained by the model, r is a more intuitive metric to assess the quality of predictions, especially in spectral studies, where relationships can be complex. The mean absolute error (MAE) provides a clear measure of model accuracy, as it expresses the average error of predictions in the same unit as the response variable. Unlike RMSE, which penalizes extreme outliers, MAE allows a balanced assessment of model accuracy without overvaluing specific errors, which is advantageous for spectral data. Furthermore, R² can be influenced by patterns of variance in the data and may not adequately represent model quality when the relationship between variables is not strictly linear. RMSE, in turn, emphasizes large individual errors, which can distort the perception of predictive performance in a context where small variations are already expected. Therefore, the choice of r and MAE as metrics is sufficient to assess the quality of the model, ensuring a statistically robust interpretation aligned with the nature of the data analyzed.

 

C: Line 250. What exactly are you talking about, write in one term what you determined by liquid chromatography?

A: Liquid chromatography is an analytical technique used to separate, identify and quantify components of a mixture. It works based on the differential interaction of compounds with two phases: the mobile phase, in which a liquid (solvent or mixture of solvents) carries the sample through the system, and the stationary phase, which is a solid or liquid material fixed inside a column, where the compounds are separated. As the mobile phase passes through the stationary phase, the different compounds in the mixture interact differently with the two media, being slowed down at different speeds. This process results in the separation of the components, allowing their detection and analysis. One of the most advanced forms of this technique is ultra-performance liquid chromatography (UPLC), widely used in laboratories to analyze substances such as drugs, natural compounds, proteins and metabolites.  

 

C: Line 418. Section References. DOI for all references must be provided.

A: Thank you for the suggestion. DOIs have been added for all references.

Reviewer 2 Report

Comments and Suggestions for Authors

Summary:

The study investigates the use of machine learning (ML) for pineapple quality prediction using hyperspectral data under controlled conditions. However, some issues need further exploration as follows.

 

Major issues:

  1. Whether the hyperparameters of machine learning algorithms are optimized for fair comparison is unknown, and the use of Auto-WEKA tools should be declared. ANN should also adjust the neuron numbers to ensure optimal performance.

 

  2. Please adjust the layout of the figures and title correctly. In addition, the title should adequately state that A and B are r scores and MAE, respectively, to determine independent comprehensibility. Sometimes the tables can highlight the difference in ML performances in numbers better than the figures in the Results.

 

 

Minor issues:

  1. In the Introduction, the phrase ‘IA-based algorithms’ is misspelled in the penultimate sentence. Careful manuscript reading combined with the assistance of a professional editor or native speaker of English is recommended.

Author Response

Reviewer 2:  

 

C: Whether the hyperparameters of machine learning algorithms are optimized for fair comparison is unknown, and the use of Auto-WEKA tools should be declared. ANN should also adjust the neuron numbers to ensure optimal performance.

A:  We appreciate the suggestion. The hyperparameters of each model were detailed in the text. In the case of neural networks, we used 10 neurons in each of the two layers, as indicated in previous studies (https://doi.org/10.3390/rs15245657, https://doi.org/10.3390/agriengineering6010020, https://doi.org/10.3390/agriengineering6040255), since Weka defines this configuration based on the number of variables in the database. Since hyperspectral data has more than 2000 variables, a larger number of neurons would make processing unfeasible. As for Auto-Weka, the authors chose to use models already established with the software's default hyperparameters, following the approach adopted in other studies (https://doi.org/10.3390/f15010039, https://doi.org/10.1016/j.saa.2024.124113, https://doi.org/10.3390/a17010023).

 

C: Please adjust the layout of the figures and title correctly. In addition, the title should adequately state that A and B are r scores and MAE, respectively, to determine independent comprehensibility. Sometimes the tables can highlight the difference in ML performances in numbers better than the figures in the Results.

A: Thank you for the suggestion. The figures have been replaced by tables.

 

C: In the Introduction, the phrase ‘IA-based algorithms’ is misspelled in the penultimate sentence. Careful manuscript reading combined with the assistance of a professional editor or native speaker of English is recommended.

A: We appreciate the suggestion. The manuscript has been carefully reread with assistance from a professional editor.

Reviewer 3 Report

Comments and Suggestions for Authors

This work aimed to analyze the hyperspectral image data for pineapple products and run AI models to predict quality metrics. Overall,  I do not think this work can be published.

-- 1. No research novelty. The authors claimed in the abstract that "However, the prediction of pineapple quality by hyperspectral data applied to ML is not known.", while it took me no time to find a lot of papers published on such topics:

Accurate ripening stage classification of pineapple based on visible and near-infrared hyperspectral imaging system [https://doi.org/10.1093/jaoacint/qsaf010]

Towards a Multispectral Imaging System for Spatial Mapping of Chemical Composition in Fresh-Cut Pineapple (Ananas comosus) [https://doi.org/10.3390/foods12173243]

-- 2. totally irrelevant information in section 2.1

-- 3. A technical question: the models mentioned in this work can only take two-dimensional inputs, whereas hyperspectral image data are known to be stored as a three-dimensional data cube. How were the HSI data fed to the models?

-- 4. No details regarding the models used, at all. Even for the most basic ANN model there are a lot of hyperparameters (number of layers? number of neurons in each layer? optimizer? learning rate?) that need to be determined when using the model. Without any such information provided, the results cannot be considered reliable.

-- 5. Figures are not fixed in the pdf file, making them unreadable.

-- 6. The ultimate goal of this work is unknown. Based on the abstract and introduction, my original understanding was that this paper was trying to figure out a good model, but in the results part, the authors seemed to optimize the model result by using different parts of the pineapples. Which makes little to no sense regarding model design.

-- 7. model performance was not reported, training/validation/testing loss, etc

-- 8. The discussion section does not explain the mechanisms behind the results at all. E.g. line 374-377, "Pineapple fruits have a high content of various flavonoids, that are directly linked to the color of the fruit and are involved in the white-yellow pigmentation of the fruit. It has an antioxidant function and cellular signaling pathways and plays na essencial part of plant growth and biotic and abiotic stress reduction [41]." I cannot see any necessity of discussing the relationship between the flavonoids and the yellowish color in this manuscript. Please be concise and use only relevant information.

 

Comments on the Quality of English Language

Please seriously review the text before submitting a research article to avoid unnecessary typos, like line 19 [experi-ment], line 20 [sup-press], line 24 [physiologi-cally], line 71 [IA-based algorithms], etc...

Author Response

Reviewer 3:  

 

C: The article corresponds to the subject of the Journal. The text is structured, all necessary sections are present. However, the manuscript is prepared carelessly. The article as presented cannot be accepted. After reading, comments arise.

A: We appreciate all considerations which we seek to meet in the best possible way to improve our manuscript.  

 

C: Key words. It should be added the subject of your research – the pineapples.

A: Thank you for the suggestion. The word "pineapple" has been added to the keywords.

 

C: On page 2, the figure is not labeled and is not referenced in the text. Please check.

A:  Thank you for your consideration, the figure is a summary figure which will be submitted in the appropriate place.

 

C: Line 78-79: (i) and (ii) ???

A: (i) and (ii) refer to the two objectives we propose for the work.

 

C: Line 98: ...the soil was chemically analyzed. What kind of instrument was used? Brand and manufacturer? At what distance from the soil surface were the soil samples taken?

A:  Thank you for the suggestion, the information has been added to the text.

 

C: Line 106: The authors used Perola pineapple. However, the year of the pineapple harvest is not indicated and the original manufacturer of these seedlings or seeds should be mentioned.

A:  Pineapples do not have seeds. Propagation is carried out vegetatively by different types of seedlings (crown, pup, sucker) or by micropropagation. We did not purchase the seedlings. They were originally obtained from a commercial production area and planted in an area adjacent to the experiment. From this area, new pup seedlings were obtained and used for the experiment.

 

C: Line 116: What kind of refractometer was used? Brand?

A:  Thanks for the suggestion, the information has been added

 

C: Line 129: Please include information about the manufacturer of the chromatograph.

A:  Thanks for the suggestion, the manufacturer has been added

 

C: Page 5. The authors should edit the text. The figure captions and text have spread across the page and fragments of phrases have appeared.

A: We appreciate the suggestion. The figures and captions in the manuscript have been reorganized.

 

C: Pages 7-8: the situation is similar to the previous one. It is difficult to concentrate on reading because the text is shifting. The text should be checked before sending to the journal.

A: We appreciate the suggestion. The layout of the manuscript has been reorganized.

Reviewer 4 Report

Comments and Suggestions for Authors

The manuscript titled «Predicting Pineapple Quality from Hyperspectral Data of Plant Parts Applied to Machine Learning» addresses the issue of predicting pineapple quality using hyperspectral data applied to machine learning. The goal was to test accurate ML models for predicting pineapple fruit quality and the best input data for these algorithms.

The article corresponds to the subject of the Journal. The text is structured, all necessary sections are present. However, the manuscript is prepared carelessly. The article as presented cannot be accepted. After reading, comments arise.

 

  1. Key words. It should be added the subject of your research – the pineapples.
  2. On page 2, the figure is not labeled and is not referenced in the text. Please check.
  3. Line 78-79: (i) and (ii) ???
  4. Line 98: ...the soil was chemically analyzed. What kind of instrument was used? Brand and manufacturer? At what distance from the soil surface were the soil samples taken?
  5. Line 106: The authors used Perola pineapple. However, the year of the pineapple harvest is not indicated and the original manufacturer of these seedlings or seeds should be mentioned.
  6. Line 116: What kind of refractometer was used? Brand?
  7. Line 129: Please include information about the manufacturer of the chromatograph.
  8. Page 5. The authors should edit the text. The figure captions and text have spread across the page and fragments of phrases have appeared.
  9. Pages 7-8: the situation is similar to the previous one. It is difficult to concentrate on reading because the text is shifting. The text should be checked before sending to the journal.

Author Response

Reviewer 4:  

 

C: Lines 110-111: Why was this particular pineapple planting scheme chosen?

A:  We chose this system because planting over plastic film is a good option for controlling weeds in the crop. However, since plastic limits the access of rainwater to the roots, we placed drip tapes under the plastic to irrigate the plants. In addition, since plastic would also make it difficult to topdress with granulated fertilizers applied to the soil, we diluted the fertilizers and applied them via fertigation. This system is common for farmers who use plastic film.

 

C: In Figure 1, the caption for the months is not in English.

A:  Thanks for the suggestion, the months in the figure have been translated

 

C: The figures in the text are misaligned, and the formatting of the manuscript appears sloppy. This may be due to conversion from Word to PDF.

A: We appreciate the suggestion. The layout of the manuscript has been reorganized.

 

C: Why does the SVM algorithm yield the best results?

A: We appreciate the question. As mentioned throughout the results and discussion sections, SVM was the best model because it gave the highest r and the lowest MAE, proving to be the most robust model for most of the pineapple fruit quality variables analyzed.

 

C: How are acidity and °Brix related to peel and fruit reflectance?

A:  Acidity and °Brix are related to the reflectance of the skin and fruit because both influence the chemical and structural composition of plant tissues, affecting the way in which light is absorbed and reflected at different wavelengths. Acidity is associated with the presence of organic acids, which can interact with structural components of the skin and pulp, altering absorption in certain bands of the spectrum, especially in the visible and near infrared. °Brix, which indicates the concentration of soluble solids (mainly sugars), can affect reflectance by modifying cell density and composition, resulting in variations in the absorption and scattering of light, especially in spectral regions such as the near infrared (NIR).

 

C: “For future work, increasing the number of leaf samples could provide a more robust and representative database for the analyses.” What number of leaves do you think is necessary to achieve a good result?

A: The ideal number of leaves to obtain a database is around 100 samples, as was done in the study. However, in some cases, increasing this number improves the results, making them more robust and representative, which depends on several factors, such as the genetic variability of the plants and environmental conditions. The use of 200 to 500 samples to ensure greater representativeness and reduce data variability could be tested.

Reviewer 5 Report

Comments and Suggestions for Authors

The article “Predicting Pineapple Quality from Hyperspectral Data of Plant Parts Applied to Machine Learning” describes the use of various machine-learning algorithms to predict the quality of growing pineapples. The authors employed three input parameters—leaf reflectance, peel reflectance, and fruit reflectance—to analyze the data. Additionally, six algorithms were tested, and the most optimal one was selected. Overall, the work is written in clear language. However, there are several areas that require improvement, as outlined below:

1. Lines 110-111: Why was this particular pineapple planting scheme chosen?

2. In Figure 1, the caption for the months is not in English.

3. The figures in the text are misaligned, and the formatting of the manuscript appears sloppy. This may be due to conversion from Word to PDF.

4. Why does the SVM algorithm yield the best results?

5. How are acidity and °Brix related to peel and fruit reflectance?

6. “For future work, increasing the number of leaf samples could provide a more robust and representative database for the analyses.” What number of leaves do you think is necessary to achieve a good result?

Author Response

C: Lines 110-111: Why was this particular pineapple planting scheme chosen?

A:  We chose this system because planting over plastic film is a good option for controlling weeds in the crop. However, since plastic limits the access of rainwater to the roots, we placed drip tapes under the plastic to irrigate the plants. In addition, since plastic would also make it difficult to topdress with granulated fertilizers applied to the soil, we diluted the fertilizers and applied them via fertigation. This system is common for farmers who use plastic film.

 

C: In Figure 1, the caption for the months is not in English.

A:  Thanks for the suggestion, the months in the figure have been translated

 

C: The figures in the text are misaligned, and the formatting of the manuscript appears sloppy. This may be due to conversion from Word to PDF.

A: We appreciate the suggestion. The layout of the manuscript has been reorganized.

 

C: Why does the SVM algorithm yield the best results?

A: We appreciate the question. As mentioned throughout the results and discussion sections, SVM was the best model because it gave the highest r and the lowest MAE, proving to be the most robust model for most of the pineapple fruit quality variables analyzed.

 

C: How are acidity and °Brix related to peel and fruit reflectance?

A:  Acidity and °Brix are related to the reflectance of the skin and fruit because both influence the chemical and structural composition of plant tissues, affecting the way in which light is absorbed and reflected at different wavelengths. Acidity is associated with the presence of organic acids, which can interact with structural components of the skin and pulp, altering absorption in certain bands of the spectrum, especially in the visible and near infrared. °Brix, which indicates the concentration of soluble solids (mainly sugars), can affect reflectance by modifying cell density and composition, resulting in variations in the absorption and scattering of light, especially in spectral regions such as the near infrared (NIR).

 

C: “For future work, increasing the number of leaf samples could provide a more robust and representative database for the analyses.” What number of leaves do you think is necessary to achieve a good result?

A: The ideal number of leaves to obtain a database is around 100 samples, as was done in the study. However, in some cases, increasing this number improves the results, making them more robust and representative, which depends on several factors, such as the genetic variability of the plants and environmental conditions. The use of 200 to 500 samples to ensure greater representativeness and reduce data variability could be tested.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I would like to express my gratitude for the thorough responses to my inquiries. However, I have several substantial observations to make.

Despite the abstract's redesign, it remains deficient in the provision of specific quantitative outcomes.

Please check and correct typos throughout the manuscript. For example “daidzin e genistin”; “GL: degrees of freedom”. Use either “F.V” or “FV”, as well as “C.V.” or “CV”.

Based on the MAE data (Tables 3-8), how can we judge what is better predicted in pineapple fruit? Genistin (MAE=1451) or Daidzein (MAE=338) or Acidity (MAE= 0.062)? For comparison it is necessary to give the relative error.
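One common way to put such errors on a comparable scale is to express the MAE as a percentage of the mean observed value of each trait. The sketch below assumes this definition of relative error, which the comment does not specify. The MAE values are those quoted in the comment; the observed means are hypothetical placeholders, not values from the manuscript, so the resulting percentages are illustrative only.

```python
def relative_mae(mae, observed_mean):
    """Return the MAE as a percentage of the mean observed value."""
    if observed_mean == 0:
        raise ValueError("observed mean must be non-zero")
    return 100.0 * mae / abs(observed_mean)


# MAE values as quoted in the reviewer comment.
mae_by_trait = {"genistin": 1451.0, "daidzein": 338.0, "acidity": 0.062}

# HYPOTHETICAL observed means, chosen only to make the example run.
placeholder_means = {"genistin": 14510.0, "daidzein": 1690.0, "acidity": 0.62}

relative = {
    trait: relative_mae(mae_by_trait[trait], placeholder_means[trait])
    for trait in mae_by_trait
}
```

Once each error is divided by the scale of its own trait, a large absolute MAE and a small one can turn out to be comparable. Only the real observed means would show whether that is the case here.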

Line 69. ‘We appreciate the suggestion, other articles on the subject have been added …’ Unfortunately, these changes are not present in the manuscript. ‘The use of hyperspectral sensors has many advantages, such as the fact that the fruit is not damaged and that the quality characteristics of the fruit can be predicted very well [9].’ More references should be cited, there are many papers on determining fruit quality using hyperspectral imaging.

Line 184. Table 2. The raw results of the ANOVA should be provided: Sum of Squares (SumSq); Mean Square (MeanSq); F (F value) (e.g. https://www.mdpi.com/2071-1050/17/6/2572; https://www.mdpi.com/2072-4292/11/1/68). Please explain Pearson's correlation coefficient (r) values greater than 1.
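For reference, the standard definition of Pearson's coefficient shows why values above 1 cannot occur. The symbols $x_i$ (observed values), $y_i$ (predicted values) and $n$ (number of samples) are notation assumed here, not taken from the manuscript.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\left| \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \right|
\le
\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\;\Longrightarrow\; -1 \le r \le 1 .
```

The inequality is the Cauchy-Schwarz inequality applied to the centered values, so any reported r outside the interval from -1 to 1 points to a calculation or labeling error rather than to an unusually strong correlation.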

Author Response

Reviewer 1:  

 

C: Does the introduction provide sufficient background and include all relevant references? Can be improved

A: The introduction presents a general context of the work, highlighting the problem and exploring alternative modern ways to solve it. The structure provides a basis for the development of the study, inserting the central idea of the work throughout the text for the reader. If there are specific suggestions for additional references or points that could be explored in more depth, we are available to incorporate them and further improve the section.

We have improved the introduction, please see line 78 to 85.

 

C: Is the research design appropriate? Can be improved

A: We appreciate the suggestion and are committed to improving our work. In order to properly respond to your request, it would be important to receive details about what improvements can be made and what figure it refers to. This way, we can adjust accurately and in line with expectations.

 

C: Are the methods adequately described? Must be improved          

A: Thank you for your suggestion. In the previous round, several changes were made to this section. We believe that we have provided the necessary information. If not, it would be important to receive details on what improvements can be made. This way, we can make the adjustments accurately and in line with expectations.

 

C: Are the results clearly presented? Must be improved

A: Thank you for your suggestion. In the previous round, several changes were made to this section. We believe that we have provided the necessary information. If not, it would be important to receive details on what improvements can be made. This way, we can make the adjustments accurately and in line with expectations.

 

C: Are the conclusions supported by the results? Must be improved

A: We appreciate the suggestion, and improvements were made to the conclusion in order to meet the request.

 

C: I would like to express my gratitude for the thorough responses to my inquiries. However, I have several substantial observations to make.

A: We appreciate all considerations and will try to meet them in the best possible way.

 

C: Despite the abstract's redesign, it remains deficient in the provision of specific quantitative outcomes.

A: We appreciate your consideration and value your suggestions. We improved the abstract by including quantitative data and adjusting it to the AgriEngineering template, which specifies the abstract as a single paragraph of about 200 words maximum. Please see the abstract. Thank you.

 

 

C: Please check and correct typos throughout the manuscript. For example “daidzin e genistin”; “GL: degrees of freedom”. Use either “F.V” or “FV”, as well as “C.V.” or “CV”.

A: Thank you for the suggestion; the corrections have been made.

 

C: Based on the MAE data (Tables 3-8), how can we judge what is better predicted in pineapple fruit? Genistin (MAE=1451) or Daidzein (MAE=338) or Acidity (MAE= 0.062)? For comparison it is necessary to give the relative error.

A: The MAE (Mean Absolute Error) already provides a direct measure of model accuracy, since it is expressed in the same unit as the predicted variable. This allows the absolute magnitude of the error to be assessed without the need for conversion. However, for comparisons between variables with different scales, such as genistin (µg/g), daidzein (µg/g) and acidity (%), the relative error can be useful. Nevertheless, within each variable, the MAE is sufficient to judge the quality of the prediction.
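The relative error the reviewer asks for can be illustrated with a minimal sketch. It assumes one common definition, the MAE divided by the mean absolute observed value; the function name `relative_mae` and all numbers are hypothetical and are not taken from the manuscript.

```python
def relative_mae(observed, predicted):
    """Return MAE divided by the mean absolute observed value (unitless)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    n = len(observed)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    scale = sum(abs(o) for o in observed) / n
    return mae / scale


# Hypothetical data: a small-scale variable (e.g. acidity, %) and a
# large-scale variable (e.g. a flavonoid concentration, µg/g).
acidity_obs = [0.60, 0.70, 0.80]
acidity_pred = [0.66, 0.64, 0.86]
genistin_obs = [10000.0, 12000.0, 14000.0]
genistin_pred = [11000.0, 11000.0, 15000.0]

print("acidity  relative MAE:", relative_mae(acidity_obs, acidity_pred))
print("genistin relative MAE:", relative_mae(genistin_obs, genistin_pred))
```

In this made-up case the absolute MAE of the large-scale variable is about 1000 and that of the small-scale variable about 0.06, yet the relative errors are of similar size, which is why a scale-free measure is needed to compare variables.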

 

C: Line 69. ‘We appreciate the suggestion, other articles on the subject have been added …’ Unfortunately, these changes are not present in the manuscript. ‘The use of hyperspectral sensors has many advantages, such as the fact that the fruit is not damaged and that the quality characteristics of the fruit can be predicted very well [9].’ More references should be cited, there are many papers on determining fruit quality using hyperspectral imaging.

A: We appreciate the suggestion and more citations have been added to the introduction section of the manuscript.

 

C: Line 184. Table 2. The raw results of the ANOVA should be provided: Sum of Squares (SumSq); Mean Square (MeanSq); F (F value) (e.g. https://www.mdpi.com/2071-1050/17/6/2572; https://www.mdpi.com/2072-4292/11/1/68). Please explain Pearson's correlation coefficient (r) values greater than 1.

A: We appreciate the suggestion. Table 2 already provides the Sum of Squares and the significance of the F test. The authors consider that this information is sufficient for the proper interpretation of the results. There were no Pearson correlation coefficients (r) greater than 1 in the tables.

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

Thank you for your response to the review comments. However, I have a few concerns regarding the revision.

  1. It appears that the specific issue regarding the typographical error in the phrase “IA-based algorithms” in the Introduction has not been addressed in the revised manuscript. While you have acknowledged the suggestion, the actual correction of the error has not been made.
  2. The use of a comma as the decimal point in Table 1 is unconventional in English-language publications; please switch to using a period (.) to align with standard formatting practices.
  3. I previously suggested that tables might better highlight differences in ML performances than figures. However, since you have replaced the figures with tables, the sentence in Section 2.5 mentioning the creation of boxplots and the use of ggplot2 package should be removed as it is no longer relevant.
  4. In Table 2, the abbreviation for degrees of freedom is inconsistently used as “GL” in the notes but “DF” in the table header. The standard abbreviation for degrees of freedom is “DF”. Please standardize the notation to “DF” throughout the table and its annotations to avoid confusion.
  5. Based on the degrees of freedom (DF) calculation in Table 2, it appears that the total sample size n is inferred to be 180. However, the manuscript mentions that only 100 plants were sampled. This discrepancy between the reported sample size and the calculated sample size raises significant concerns about the integrity of the data.
  6. Given that Table 2 refers to F-values and mean squares rather than r and MAE.
  7. In Table 2, the asterisk (*) symbol is used to denote significance at the 5% probability level by F-test. However, it also appears to be used to indicate the interaction between input and ML. This dual use of the same symbol can lead to confusion for readers. To ensure clarity and avoid ambiguity, please ensure that each symbol in the table has a single, consistent meaning throughout the manuscript.

As a reviewer, I would like to emphasize the importance of carefully revising the manuscript to address all identified issues. This not only improves the clarity and readability of the manuscript but also demonstrates a commitment to high standards of academic writing.

I would recommend revisiting the manuscript and making the necessary corrections to ensure that the text is free from errors. Additionally, engaging a professional editor or native English speaker to review the manuscript could be beneficial in identifying and correcting any further issues that may have been overlooked.

I look forward to seeing the revised version of the manuscript with the corrections applied.

Author Response

Reviewer 2: 

 

C: Is the research design appropriate? Can be improved

A: Yes, the research design is appropriate for the proposed objectives, ensuring a rigorous approach aligned with recognized methodologies in the field. If there are specific suggestions for improvements, we are available to evaluate them and, if necessary, incorporate adjustments that can further strengthen the study.

 

C: Are the results clearly presented? Can be improved

A: Thank you for your suggestion. In the previous round, several changes were made to this section. We believe that we have provided the necessary information. If not, it would be important to receive details on what improvements can be made. This way, we can make the adjustments accurately and in line with expectations.

 

C: Thank you for your response to the review comments. However, I have a few concerns regarding the revision.

A: We appreciate your considerations and will try to address all of them in the best possible way.

 

C: 1. It appears that the specific issue regarding the typographical error in the phrase “IA-based algorithms” in the Introduction has not been addressed in the revised manuscript. While you have acknowledged the suggestion, the actual correction of the error has not been made.

A: Thank you for the correction, the change has been made.

 

C: 2. The use of a comma as the decimal point in Table 1 is unconventional in English-language publications; please switch to using a period (.) to align with standard formatting practices.

A: Thank you for the correction, the change has been made.

 

C: 3. I previously suggested that tables might better highlight differences in ML performances than figures. However, since you have replaced the figures with tables, the sentence in Section 2.5 mentioning the creation of boxplots and the use of ggplot2 package should be removed as it is no longer relevant.

A: Thank you for your consideration, the corrections have been made.

 

C: 4. In Table 2, the abbreviation for degrees of freedom is inconsistently used as “GL” in the notes but “DF” in the table header. The standard abbreviation for degrees of freedom is “DF”. Please standardize the notation to “DF” throughout the table and its annotations to avoid confusion.

A: Thank you for your consideration, the corrections have been made.

 

C: 5. Based on the degrees of freedom (DF) calculation in Table 2, it appears that the total sample size n is inferred to be 180. However, the manuscript mentions that only 100 plants were sampled. This discrepancy between the reported sample size and the calculated sample size raises significant concerns about the integrity of the data.

A: We appreciate the observation. The degrees of freedom presented in Table 2 refer to the ANOVA of the models, which considers the total number of observations used in the statistical analysis. Although the manuscript mentions that 100 leaves and fruits were sampled, the total number of observations may be higher due to the structure of the experiment, such as the subdivision of the data into different combinations of factors analyzed.
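The arithmetic behind the reviewer's inference of n = 180 can be sketched as follows. The sketch assumes a balanced two-factor factorial ANOVA; the level counts (3 inputs, 6 ML models) follow from the degrees of freedom 2 and 5 reported in the raw ANOVA tables later in this record, while the figure of 10 replicates per combination is only inferred from the residual degrees of freedom (162) and is not stated anywhere in the record. The function name is hypothetical.

```python
def factorial_anova_df(levels_a, levels_b, replicates):
    """Degrees of freedom for a balanced two-factor factorial ANOVA."""
    n = levels_a * levels_b * replicates
    return {
        "input": levels_a - 1,
        "ML": levels_b - 1,
        "input*ML": (levels_a - 1) * (levels_b - 1),
        "Residuals": levels_a * levels_b * (replicates - 1),
        "Total": n - 1,
        "n": n,
    }


# Assumed layout: 3 inputs x 6 ML models x 10 replicates (replicates inferred).
df = factorial_anova_df(levels_a=3, levels_b=6, replicates=10)
for source, value in df.items():
    print(f"{source:10s} {value}")
```

Under these assumptions the observations analyzed are model-evaluation results for each input and ML combination rather than individual plants, which would reconcile 180 observations with 100 sampled plants; the record itself does not confirm this reading.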

 

C: 6. Given that Table 2 refers to F-values and mean squares rather than r and MAE.

A: Thank you for your consideration. These are the mean squares of the r and MAE variables for each predicted variable that was analyzed.

 

C: In Table 2, the asterisk (*) symbol is used to denote significance at the 5% probability level by F-test. However, it also appears to be used to indicate the interaction between input and ML. This dual use of the same symbol can lead to confusion for readers. To ensure clarity and avoid ambiguity, please ensure that each symbol in the table has a single, consistent meaning throughout the manuscript.

A: We appreciate your observation. The asterisk (*) symbol in Table 2 is used to indicate statistical significance at 5% by the F-test, including the significance of the interaction between input and ML.

 

C: As a reviewer, I would like to emphasize the importance of carefully revising the manuscript to address all identified issues. This not only improves the clarity and readability of the manuscript but also demonstrates a commitment to high standards of academic writing.

A: We appreciate your feedback and attention to detail. We are carefully reviewing the manuscript to ensure clarity, accuracy, and alignment with high standards of academic writing. We value the reviewers' considerations and are committed to improving the work based on their suggestions.

 

C: I would recommend revisiting the manuscript and making the necessary corrections to ensure that the text is free from errors. Additionally, engaging a professional editor or native English speaker to review the manuscript could be beneficial in identifying and correcting any further issues that may have been overlooked.

A: We appreciate your feedback and attention to detail. We are carefully reviewing the manuscript to ensure clarity, accuracy, and alignment with high standards of academic writing.

 

C: I look forward to seeing the revised version of the manuscript with the corrections applied.

A: We appreciate your interest and contributions. We are implementing the revisions with care and commitment to improve the manuscript.

Reviewer 4 Report

Comments and Suggestions for Authors

Accept in present form.

Author Response

Reviewer 4: 

 

C: Is the research design appropriate? Can be improved

A:  We appreciate your consideration and value your suggestions. We would appreciate specific suggestions for possible improvements so that we can improve the manuscript in the best possible way.

 

C: Accept in present form.

A:  We appreciate the contributions made to improve our work

Reviewer 5 Report

Comments and Suggestions for Authors

Dear Authors,

I am satisfied with your response to my comments.

 

Author Response

Reviewer 5: 

 

C: I am satisfied with your response to my comments.

A: We appreciate the contributions made to improve our work 

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

Comments and suggestions for authors in the attachment.

Comments for author File: Comments.pdf

Author Response

A:  We are grateful to the reviewer for allowing us to explain the nomenclature of the compounds in more detail.

Thank you for alerting us to the mistranslation regarding the term "and".

We have corrected it. Please see page 5, line 190. Thank you.

However, if the reviewer was referring to the flavonoids, we have included photos below of the standards we used to determine the compounds. Please note that the names in the photos are the same as those we used in the manuscript. In addition, we have published manuscripts with these terms. Thank you.

https://doi.org/10.3390/agronomy14092046

https://doi.org/10.1038/s41598-024-68117-z

 

C: Line 184. Table 2. The raw results of the ANOVA should be provided: Sum of Squares (SumSq); Mean Square (MeanSq); F (F value) (e.g. https://www.mdpi.com/2071-1050/17/6/2572; https://www.mdpi.com/2072-4292/11/1/68). Please explain Pearson's correlation coefficient (r) values greater than 1.

A: We appreciate the comment and would like to clarify that Table 2 already presents the main information from the analysis of variance (ANOVA), including mean squares, degrees of freedom and the statistical significance of the F test. The authors consider that these data are sufficient for the interpretation of the results, according to the objectives of the study, which is why we chose not to include the table with the raw ANOVA in the main body of the article, since only the sum of squares is missing. However, this table was made available in the response letter for the purposes of transparency and to complement the analysis, as requested.

 

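Acidity:

r

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 0.7666 | 0.3833 | 53.047 | 0.00E+00 |
| ML | 5 | 1.2198 | 0.24396 | 33.766 | 0.00E+00 |
| input*ML | 10 | 0.5003 | 0.05003 | 6.924 | 5.55E-09 |
| Residuals | 162 | 1.1705 | 0.007225 | | |
| Total | 179 | 3.6571 | 1 | | |
| CV | | 54.12% | | | |

MAE

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 0.00147 | 0.000735 | 24.3497 | 6.00E-10 |
| ML | 5 | 0.003638 | 0.000728 | 24.1067 | 0.00E+00 |
| input*ML | 10 | 0.001377 | 0.000138 | 4.5607 | 1.06E-05 |
| Residuals | 162 | 0.00489 | 3.02E-05 | | |
| Total | 179 | 0.011375 | 6.35E-05 | | |
| CV | | 9.55% | | | |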

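Brix:

r

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 1.8991 | 0.94955 | 167.438 | 0.00E+00 |
| ML | 5 | 4.7377 | 0.94754 | 167.079 | 0.00E+00 |
| input*ML | 10 | 1.5361 | 0.15361 | 27.087 | 8.52E-30 |
| Residuals | 162 | 0.9187 | 0.005671 | | |
| Total | 179 | 9.0917 | 0.050792 | | |
| CV | | 35.61% | | | |

MAE

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 0.5401 | 0.27005 | 37.307 | 4.71E-14 |
| ML | 5 | 2.0861 | 0.41722 | 57.642 | 0.00E+00 |
| input*ML | 10 | 0.9448 | 0.09448 | 13.054 | 1.38E-16 |
| Residuals | 162 | 1.1726 | 0.007238 | | |
| Total | 179 | 4.7436 | 0.026501 | | |
| CV | | 7.99% | | | |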

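Ratio:

r

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 3.3465 | 1.67325 | 292.129 | 0.00E+00 |
| ML | 5 | 3.7802 | 0.75604 | 131.994 | 0.00E+00 |
| input*ML | 10 | 1.2781 | 0.12781 | 22.314 | 8.12E-26 |
| Residuals | 162 | 0.9279 | 0.005728 | | |
| Total | 179 | 9.3326 | 0.052137 | | |
| CV | | 28.5% | | | |

MAE

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 8.067 | 4.0335 | 69.25 | 0.00E+00 |
| ML | 5 | 13.398 | 2.6796 | 46.004 | 0.00E+00 |
| input*ML | 10 | 6.716 | 0.6716 | 11.53 | 8.02E-15 |
| Residuals | 162 | 9.436 | 0.058247 | | |
| Total | 179 | 37.617 | 0.210151 | | |
| CV | | 8.18% | | | |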

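Daidzein:

r

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 0.9974 | 0.4987 | 61.97 | 1.03E-20 |
| ML | 5 | 6.1802 | 1.23604 | 153.588 | 0.00E+00 |
| input*ML | 10 | 1.1245 | 0.11245 | 13.972 | 1.31E-17 |
| Residuals | 162 | 1.3037 | 0.008048 | | |
| Total | 179 | 9.6059 | 0.053664 | | |
| CV | | 39.16% | | | |

MAE

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 25555 | 12777.5 | 25.518 | 2.32E-10 |
| ML | 5 | 395271 | 79054.2 | 157.881 | 0.00E+00 |
| input*ML | 10 | 83790 | 8379 | 16.734 | 0.00E+00 |
| Residuals | 162 | 81117 | 500.7222 | | |
| Total | 179 | 585733 | 3272.251 | | |
| CV | | 8.78% | | | |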

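Daidzin:

r

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 0.2879 | 0.14395 | 23.548 | 1.10E-09 |
| ML | 5 | 3.6022 | 0.72044 | 117.836 | 0.00E+00 |
| input*ML | 10 | 0.3012 | 0.03012 | 4.927 | 3.22E-06 |
| Residuals | 162 | 0.9905 | 0.006114 | | |
| Total | 179 | 5.1819 | 0.028949 | | |
| CV | | 42.37% | | | |

MAE

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 894117 | 447058.5 | 26.153 | 0.00E+00 |
| ML | 5 | 5432174 | 1086435 | 63.556 | 0 |
| input*ML | 10 | 480491 | 48049.1 | 2.811 | 0.0030696 |
| Residuals | 162 | 2769240 | 17094.07 | | |
| Total | 179 | 9576023 | 53497.34 | | |
| CV | | 8.03% | | | |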

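Genistin:

r

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 0.472 | 0.236 | 35.611 | 1.52E-13 |
| ML | 5 | 6.8074 | 1.36148 | 205.417 | 0.00E+00 |
| input*ML | 10 | 0.6276 | 0.06276 | 9.47 | 2.62E-12 |
| Residuals | 162 | 1.0737 | 0.006628 | | |
| Total | 179 | 8.9808 | 0.050172 | | |
| CV | | 34.44% | | | |

MAE

| | DF | SS | MS | Fc | Pr>Fc |
|---|---|---|---|---|---|
| input | 2 | 175125 | 87562.5 | 3.672 | 0.027558 |
| ML | 5 | 6934556 | 1386911 | 58.165 | 0 |
| input*ML | 10 | 566126 | 56612.6 | 2.374 | 0.011936 |
| Residuals | 162 | 3862785 | 23844.35 | | |
| Total | 179 | 11538592 | 64461.41 | | |
| CV | | 13.8% | | | |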

  1. Please explain the presence of Pearson correlation coefficients (r) 1.24, 1.36 and 1.67 in Table 2. It is well known that Pearson correlation coefficients (r) cannot be greater than 1. The presence of such results casts doubt on all numerical data presented in the manuscript.

A: We would also like to clarify that there are no Pearson correlation coefficients (r) greater than 1 in the ANOVA tables. The numbers presented there correspond exclusively to mean squares, and not to the coefficient r itself. The labels “r” and “MAE” used in these tables of the manuscript refer to the mean square (MS) values for each variable analyzed (such as ratio, acidity, Brix and flavonoids). The correlation coefficients and mean absolute error (MAE) values are duly presented in the tables intended for comparison of means.

We hope that these clarifications address the observation made.
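The three values the reviewer queries (1.24, 1.36 and 1.67) can be checked against the raw ANOVA figures in the response letter with a short sketch. It uses only the standard identities MS = SS / DF and F = MS_effect / MS_residual; the sums of squares and degrees of freedom are copied from the tables above, and the function names are hypothetical.

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """Mean square of an ANOVA source: SS divided by its DF."""
    return sum_of_squares / degrees_of_freedom


def f_statistic(ms_effect, ms_residual):
    """F value of an effect: its mean square over the residual mean square."""
    return ms_effect / ms_residual


# (SS, DF) pairs taken from the "r" ANOVA tables in the response letter.
rows = {
    "Ratio, input": (3.3465, 2),
    "Daidzein, ML": (6.1802, 5),
    "Genistin, ML": (6.8074, 5),
}

for label, (ss, df) in rows.items():
    print(f"{label}: MS = {mean_square(ss, df):.2f}")

# F value for the input effect on r (Ratio), using the residual SS and DF.
ratio_f = f_statistic(mean_square(3.3465, 2), mean_square(0.9279, 162))
print(f"Ratio, input: F = {ratio_f:.3f}")
```

The rounded mean squares reproduce the three numbers in the reviewer's comment, which supports reading them as mean squares of the ANOVA on r rather than as correlation coefficients.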

 

  2. Despite what you write about the presence of ‘Sum of Squares (SumSq); Mean Square (MeanSq)’ in Table 2, unfortunately I did not find these statistics in the table.

 

A: We appreciate the comment and would like to clarify that Table 2 presents the main information from the analysis of variance (ANOVA), including the mean squares, degrees of freedom and the statistical significance of the F test. The authors consider that these data are sufficient for the interpretation of the results, according to the objectives of the study, which is why we chose not to include the table with the raw ANOVA in the main body of the article, since only the sum of squares is missing, which does not interfere with the interpretation of the results. However, its presence is stated in the response letter together with the raw ANOVA of all the variables evaluated in the manuscript.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

In Section 2.5, although the word was changed from figures to tables, the description of ggplot2 package for plotting still exists. Please be careful, I do not know the relationship between the table and the ggplot2 package.

 

Author Response

C: In Section 2.5, although the word was changed from figures to tables, the description of ggplot2 package for plotting still exists. Please be careful, I do not know the relationship between the table and the ggplot2 package.

A: We appreciate the correction. The only package used was ExpDes, for the comparison of means, and this has been adjusted in the text. Please see page 5, line 180. Thank you.

Round 4

Reviewer 1 Report

Comments and Suggestions for Authors

Please provide responses to my comments regarding Table 2.

Author Response

C: Please provide responses to my comments regarding Table 2.

A: Dear reviewer, thank you again for your question. Table 2 contains the summary of the analysis of variance for two variables: the Pearson correlation between the observed values and those predicted by the ML models (abbreviated r in Table 2) and the mean absolute error between the observed values and those predicted by the ML models (abbreviated MAE in Table 2), for each variable evaluated (Acidity, Brix, Ratio, Daidzein, Daidzin and Genistin).

Therefore, the values below r do not refer to the Pearson correlation coefficient itself but to the mean square of its ANOVA. The same is true for the values below MAE in Table 2 (they are mean square values).

The values of r and MAE obtained are presented in the other tables of the manuscript and not in Table 2. Therefore, there are values for r greater than 1 in this Table, since they are mean square values of the ANOVA.

This ANOVA Summary presentation model has already been used by us in dozens of publications. Below is the link to the most recent one, which we published in 2025:

https://doi.org/10.1016/j.rsase.2025.101522

Note that in this manuscript the Table with the ANOVA summary has a value greater than 1.0 for the variable r, because as already explained, in this Table the value refers to the mean square.

We hope we have been clear regarding your questions and we are available for any further clarifications.
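The distinction drawn in this response can be illustrated with a toy example: every correlation coefficient lies in [-1, 1], yet the mean square from an ANOVA run on a set of such coefficients is not bounded by 1. The sketch below is a minimal one-factor case with made-up values; the group names and the function are hypothetical and unrelated to the manuscript's data.

```python
def between_group_mean_square(groups):
    """Between-group mean square of a one-factor ANOVA (SS_between / DF)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / (len(groups) - 1)


# Hypothetical r values from 10 runs of a weak model and 10 runs of a strong one.
weak_model_r = [0.1] * 10
strong_model_r = [0.9] * 10

ms = between_group_mean_square([weak_model_r, strong_model_r])
print("all r values within [-1, 1]:",
      all(-1 <= r <= 1 for r in weak_model_r + strong_model_r))
print("between-group mean square:", ms)
```

Here the mean square is about 3.2 even though no individual r exceeds 0.9, because the sum of squares scales with the number of observations per group.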

Author Response File: Author Response.pdf

Round 5

Reviewer 1 Report

Comments and Suggestions for Authors

Dear Authors!

Thank you for answering my questions, I have no more questions about the manuscript.
