
Open Access Article

Peer-Review Record

Machine Learning Classification of Roasted Arabic Coffee: Integrating Color, Chemical Compositions, and Antioxidants

Sustainability 2023, 15(15), 11561; https://doi.org/10.3390/su151511561

by Eman S. Alamri^1,*, Ghada A. Altarawneh^2, Hala M. Bayomy^1,3 and Ahmad B. Hassanat^4,*

Reviewer 1: Alex Khang PH

Reviewer 2: Inam Ullah Khan

Reviewer 3: Aidel Salih

Reviewer 4: Seyed Masoud Rezaeijo


Submission received: 3 June 2023 / Revised: 20 July 2023 / Accepted: 23 July 2023 / Published: 26 July 2023

(This article belongs to the Special Issue Latest Applications of Computer Vision and Machine Learning Techniques for Smart Sustainability)

Round 1

Reviewer 1 Report

It is a good chapter and it is ready to publish.

Author Response

Authors Response:

Thank you for your positive feedback on the paper. We appreciate your assessment that it is ready for publication. We have put considerable effort into ensuring the quality and readiness of the chapter, addressing the reviewer's comments and incorporating relevant information. We believe that it presents valuable insights and contributes to the existing literature on the topic. We will proceed with the necessary steps to submit it for publication based on your recommendation. Thank you for your support and guidance throughout the review process.

Reviewer 2 Report

1. The title of the paper needs to be modified.

2. Add at least 3 to 4 lines about the proposed solution in the abstract.

3. Write the major contribution points in the introduction section.

4. In the related work section, at least 8 new, updated papers need to be added.

5. Tables 1 and 2 need to be modified, and the limitations of previous work must be incorporated.

6. Figure 1 seems blurry; try to improve it.

7. Add two more samples in Table 4.

8. Figure 2 needs to be well explained.

9. Figures 4 and 5 must be updated and explained well.

10. The conclusion section needs to be summarized.

Overall, the paper needs to be proofread properly; there are many typographical errors.

Author Response

The title of the paper needs to be modified

Authors Response: We appreciate the reviewer's feedback regarding the title of the paper. After considering the suggestion, we have modified the title to better reflect the scope and focus of our research. The revised title is now:

" Machine Learning Classification of Roasted Arabic Coffee: Integrating Color, Chemical Compositions, and Antioxidants "

Add at least 3 to 4 lines about the proposed solution in the abstract

Authors Response: Thank you for the valuable feedback from the reviewer. In response to their comment, we have made the following additions to the abstract to provide a clearer overview of our proposed solution:

This study investigates the classification of Arabic coffee into three major variations (light, medium, and dark) using simulated data generated from actual measurements of color information, antioxidant laboratory testing, and chemical composition tests. The goal is to overcome the restrictions of limited real-world data availability and the high cost of laboratory testing. The Monte Carlo approach is used to generate new samples for each type of Arabic coffee from the mean values and standard deviations of publicly available data. Multiple machine learning algorithms are then applied to this simulated data to classify Arabic coffee, and the importance of each feature is examined to identify the key chemical components. The findings emphasize the importance of color information in accurately recognizing Arabic coffee types. However, relying solely on antioxidant information results in poor classification accuracy due to increased data complexity and classifier variability. The chemical composition information, on the other hand, has exceptional discriminatory power, allowing flawless classification on its own. Notably, particular characteristics such as crude protein and crude fiber show strong correlations with coffee type and play an important role in classification. Based on these findings, we suggest developing a mobile application that uses image recognition to examine coffee color while also providing chemical composition information, enabling end users, especially consumers, to make informed decisions about their coffee preferences.

Write major contribution points in the introduction section

Authors Response: We appreciate the reviewer's feedback on highlighting the major contribution points in the introduction section. Taking into account the provided information, we have incorporated the major findings into the introduction section as follows:

The contributions of this study can be summarized as follows:

1- According to the literature, the majority of coffee sample classification methods presented are based on shape, size, color, infrared spectroscopy, and/or aroma. To the best of our knowledge, none of them have addressed the chemical component classification of Arabic coffee samples.

2- Our study made some notable discoveries after conducting a large number of experiments:

Color information alone was critical in correctly classifying Arabic coffee into three categories. The wide range of CIE color values and their strong association with coffee classes contributed to perfect classification results.
Antioxidant information alone was found to be insufficient for accurate coffee classification; using only antioxidant information resulted in a considerable decrease in classification performance.
Chemical composition data exhibited significant discriminatory power in identifying Arabic coffee types. Using chemical composition features alone resulted in flawless classification, emphasizing their importance and the separability of the coffee categories. Specific characteristics, such as crude protein and crude fiber, showed strong correlations with coffee type, emphasizing their importance in the classification process.
As data complexity increased, the classification performance achieved with antioxidant information became increasingly sensitive to the choice of classifier. The overlapping curves and complex decision tree rules highlighted the challenges of dealing with complex data and underscored the need to select proper classifiers.

These findings contribute to a better understanding of Arabic coffee classification and emphasize the importance of color information and chemical composition, as well as the difficulties of relying on antioxidant data, in achieving correct classification results.

In the related work section at least 8 new updated papers need to be added

Authors Response: Thank you for your valuable feedback. We appreciate your suggestion to update the related work section with recent studies. In response to your comment, we have revised the related work section and included eight new studies that were published in 2022 or 2023. These studies contribute to the current understanding of coffee classification, encompassing various aspects such as shape, size, color, infrared spectroscopy, and aroma. The addition of these recent papers enhances the comprehensiveness of our literature review and provides up-to-date insights into the field. We believe that incorporating these new studies strengthens the overall validity and relevance of our research.

Tables 1 and 2 need to be modified, and the limitations of previous work must be incorporated

Authors Response: Thank you for your valuable feedback. We would like to address your concerns regarding the modification of Tables 1 and 2, as well as the limitations of previous work.

Firstly, Tables 1 and 2 present the original public data reported in a cited study conducted by the first and third authors. As such, we are unable to modify the contents of these tables. However, we have revised their captions to make this clear, and we have ensured that the data are accurately represented in our paper; we believe they provide a reliable foundation for our research.

Regarding the limitations of previous work, we have expanded this discussion and added it at the end of the related work section as follows:

Based on the previous literature review, most classification methods for coffee samples rely on shape, size, color, infrared spectroscopy, and/or aroma. To the best of our knowledge, none of these studies has specifically addressed the classification of Arabic coffee samples based on their chemical components. This study aims to bridge that gap by focusing on the chemical composition of Arabic coffee and its classification potential. By investigating the specific role of chemical components in classifying Arabic coffee varieties, we hope to improve understanding of this key aspect of coffee classification.

Figure 1 seems blurry; try to improve it

Authors Response: Thank you for your comment regarding Figure 1. We appreciate your feedback and have taken steps to address the concern.

In response to your suggestion, we have enhanced the resolution of Figure 1 to 320 dpi and converted it to PDF format to ensure optimal quality. These modifications have significantly improved the clarity and readability of the figure.

We understand that Figure 1 contains multiple subfigures, which may appear unclear when viewed at its original size. However, we would like to emphasize that when the figure is resized or zoomed in by the reader, it becomes completely readable and provides the necessary details for understanding the presented information.

Add two more samples in Table 4

Authors Response: Thank you for your feedback regarding Table 4. We appreciate your suggestion to add two more samples to enhance the representation of the simulated data.

After carefully considering your comment, we conducted additional experiments in which we added two additional samples for each coffee class, resulting in a total of 1005 samples (1002 simulated samples and 3 actual samples representing Light, Medium, and Dark coffee). However, upon analyzing the statistical characteristics of the new simulated data, we found no significant changes compared to the original data presented in Table 4.

Given this outcome, and to avoid unnecessary repetition of all the experiments, we believe it is more appropriate to maintain the original simulated data, as the statistics reported in the original Table 4 accurately reflect the characteristics of the simulated samples. By doing so, we ensure consistency in our analysis and maintain the integrity of our findings.

We appreciate your understanding in this matter, and we believe that the decision to retain the original simulated data does not affect the validity and reliability of our results.

Figure 2 needs to be well explained

Authors Response: Thank you for your valuable feedback regarding Figure 2. We appreciate your suggestion to provide further insight into the correlation values and their strength, and we have added the following:

Figure 2 depicts the Pearson correlation analysis results between every feature and the Class variable, which represents the three coffee varieties. The correlation values provide essential information about the relationship between the features and coffee type classification.

When analyzing the correlation values, it is critical to consider the strength of the correlation. A strong correlation indicates a substantial relationship between the feature and the coffee types, whereas a moderate correlation indicates a weaker relationship. The correlation coefficient (r) ranges from -1 to 1: values closer to -1 or 1 indicate stronger correlations, and values closer to 0 indicate weaker correlations.
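As a minimal sketch of the coefficient described above, the Pearson r between one feature and the Class variable can be computed as r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). This assumes the Class variable is encoded as ordinal integers; the encoding used here (Light = 0, Medium = 1, Dark = 2) and the feature values are hypothetical, since the paper's encoding is not stated in this record.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical ordinal encoding of the Class variable (not taken from the paper).
CLASS_CODE = {"Light": 0, "Medium": 1, "Dark": 2}

# Illustrative (made-up) feature values for six samples.
labels = ["Light", "Light", "Medium", "Medium", "Dark", "Dark"]
feature = [13.1, 12.9, 12.2, 12.4, 11.5, 11.6]

y = [CLASS_CODE[c] for c in labels]
r = pearson_r(feature, y)
print(round(r, 2))
```

Because r depends on the numeric codes assigned to the classes, its sign only has a meaning relative to that ordering.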

In Figure 2, Crude Protein has a strong negative correlation (-0.91) with the coffee class. This strong inverse relationship means that higher Crude Protein values are associated with lower values of the Class variable, that is, with the coffee types placed earlier in the class ordering. Crude Fiber, on the other hand, has a positive correlation of 0.80, indicating that higher Crude Fiber values are associated with higher values of the Class variable.

Similarly, color-related measures such as a*, Browning Index, and L* are strongly correlated with the coffee types (0.85, 0.84, and -0.81, respectively). These correlations imply that color-related features play an important role in the classification process, as they capture the specific color characteristics of each coffee type.
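The record does not state how the Browning Index was computed. A commonly used definition derives it from CIE L*a*b* coordinates as BI = 100(x − 0.31)/0.17, with x = (a* + 1.75L*)/(5.645L* + a* − 3.012b*). The sketch below assumes that definition; the input readings are illustrative only and are not the paper's measurements.

```python
def browning_index(L: float, a: float, b: float) -> float:
    """Browning Index from CIE L*a*b* values (assumed common definition)."""
    x = (a + 1.75 * L) / (5.645 * L + a - 3.012 * b)
    return 100.0 * (x - 0.31) / 0.17


# Illustrative readings only: the second sample has a lower L* (darker).
light = browning_index(L=45.0, a=10.0, b=25.0)
dark = browning_index(L=30.0, a=12.0, b=18.0)
print(round(light, 1), round(dark, 1))
```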

In conclusion, the correlation values in Figure 2 reveal the strength and direction of the relationship between the features and the coffee types. Strong correlations indicate features that are highly influential in determining the coffee type, whereas weaker correlations indicate less informative features. Taking these correlation values into account improves our understanding of the importance of each feature in the classification process.

Figures 4 and 5 must be updated and explained well

Authors Response: Thank you for your valuable feedback regarding Figure 4 and Figure 5. We appreciate your suggestion to update and provide a better explanation for these figures. We have taken your feedback into consideration and made the necessary improvements to both Figure 4 and Figure 5. We have doubled the resolution of the figures to enhance their clarity and converted them to PDF format for better quality.

Moreover, we have improved the explanation of both figures as follows:

Figure 4 depicts the relationship between crude fiber (%) and crude protein for all 1STD simulated data points, showing how both features are distributed among the different coffee types. The patterns and separability of the data points in Figure 4 reveal the discriminative power of crude fiber (%) and crude protein when classifying the coffee types. Figure 5, on the other hand, depicts the same features (crude fiber (%) and crude protein) for the 2STD simulated data, which exhibit greater complexity and more overlap among the data points. The overlapping data in Figure 5 demonstrate the difficulties of dealing with more complex data. This underscores the importance of proper classifier selection for such data, as the performance of different classifiers may vary as data complexity increases.
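To illustrate why the 2STD data overlap more, the sketch below models one feature within each class as a normal distribution whose spread is k times the class standard deviation (k = 1 for "1STD", k = 2 for "2STD"). This reading of 1STD/2STD is an assumption, and the class means and standard deviations are hypothetical. It uses Python's `statistics.NormalDist.overlap` (Python 3.8+), which returns the overlapping coefficient of two normal distributions, between 0 and 1.

```python
from statistics import NormalDist

# Hypothetical per-class (mean, std) for one feature, e.g. crude protein.
CLASS_STATS = {
    "Light": (13.0, 0.3),
    "Medium": (12.3, 0.3),
    "Dark": (11.6, 0.3),
}


def pairwise_overlap(stats, k):
    """Overlapping coefficient between every pair of class distributions
    when each class's std is scaled by k (k=1 ~ '1STD', k=2 ~ '2STD')."""
    names = sorted(stats)
    result = {}
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            dp = NormalDist(stats[p][0], k * stats[p][1])
            dq = NormalDist(stats[q][0], k * stats[q][1])
            result[(p, q)] = dp.overlap(dq)
    return result


for k in (1, 2):
    ov = pairwise_overlap(CLASS_STATS, k)
    print(k, {pair: round(v, 3) for pair, v in ov.items()})
```

Doubling the spread while keeping the means fixed raises every pairwise overlap, which is the kind of growing class overlap the text attributes to the 2STD data.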

The conclusion section needs to be summarized

Authors Response: We appreciate the reviewer's feedback regarding the conclusion section. Based on the reviewer's suggestion, we have added a summary at the end of the conclusion section as follows:

Future studies should look into factors other than color, antioxidants, and chemical composition. Incorporating odor profiles, origin information, and processing methods may increase classification accuracy and provide a more complete understanding of coffee differences. To address the current shortcomings, future studies might use more diverse and representative datasets, experiment with other classifiers, and extend beyond Arabic coffee types to improve generalizability, as well as develop the mobile application for practical use. Overall, this research adds to the field of coffee classification, provides practical insights, and identifies areas for further research and advancement.

Comments on the Quality of English Language

Overall, the paper needs to be proofread properly; there are many typographical errors

Authors Response:

Thank you for bringing this concern to our attention. We apologize for any typos or errors that may have been present in the paper. To address this issue, we have taken the necessary steps to ensure proper proofreading and editing. We engaged a native English speaker with expertise in scientific writing to thoroughly review the paper and correct any grammatical, typographical, or stylistic errors that were identified. We acknowledge the importance of presenting a well-polished manuscript and appreciate the reviewer's feedback in this regard. By employing the services of a professional proofreader, we have made every effort to improve the overall quality of the paper and ensure that it adheres to high standards of written English.

Reviewer 3 Report

This paper evaluates the classification of Arabic coffee in its three primary variations on the basis of simulated data on several characteristics as well as chemical composition tests. Multiple classifiers were used for the recommended classification. Two types of data were simulated using the Monte Carlo approach to determine standard deviations from real measurements.
The use of classifiers consists of prioritizing information on antioxidants and its effects on this classification, which accentuates the relevance of this classification with variable performance.
The authors used a Monte Carlo simulation based on standard deviations of real measurements, with classifiers recognized for obtaining important information such as color or antioxidants.
The results enabled identification of the type of coffee, based on characteristics such as color required for classification and antioxidant levels.
The references are rich and adapted to the requirements imposed by the pertinence of the subject.
The authors themselves acknowledge that, as a matter of scientific ethics, they were confronted with multiple problems in carrying out their research, such as the limitation of certain parameters influencing the accuracy of classification. As a result, the parameters studied are limited, and it is necessary for the authors to study other parameters to flesh out their research and enrich their results.
Another problem that weighs on the relevance of the article is the use of certain ready-to-use classifiers, although different algorithms or other ensemble approaches may produce better or at least different results. I think the authors would be well advised to look for other ways of obtaining more substantial results, so as to enrich the article with other, more reliable methods.

Author Response

This paper evaluates the classification of Arabic coffee in its three primary variations on the basis of simulated data on several characteristics as well as chemical composition tests. Multiple classifiers were used for the recommended classification. Two types of data were simulated using the Monte Carlo approach to determine standard deviations from real measurements.
The use of classifiers consists of prioritizing information on antioxidants and its effects on this classification, which accentuates the relevance of this classification with variable performance.
The authors used a Monte Carlo simulation based on standard deviations of real measurements, with classifiers recognized for obtaining important information such as color or antioxidants.
The results enabled identification of the type of coffee, based on characteristics such as color required for classification and antioxidant levels.
The references are rich and adapted to the requirements imposed by the pertinence of the subject.

Authors' response: We sincerely appreciate your positive feedback and comments regarding our research on the classification of Arabic coffee. We are glad that you found our approach and the use of multiple classifiers relevant and informative.

We are delighted that our results successfully identified the type of coffee based on key characteristics, including color and antioxidant levels. These findings contribute to a better understanding of the classification process and highlight the importance of these factors in distinguishing different types of Arabic coffee. Furthermore, we appreciate your acknowledgement of the richness and appropriateness of the references cited in our paper. We made every effort to ensure that our references meet the requirements and provide valuable insights into the subject matter. Thank you once again for your valuable feedback and positive assessment of our research. Your comments are encouraging and greatly appreciated.

The authors themselves acknowledge that, as a matter of scientific ethics, they were confronted with multiple problems in carrying out their research, such as the limitation of certain parameters influencing the accuracy of classification. As a result, the parameters studied are limited, and it is necessary for the authors to study other parameters to flesh out their research and enrich their results.

Authors' response: We sincerely appreciate your insightful comments and concerns regarding the limitations of our research. As you correctly mentioned, we focused on a specific set of parameters based on the recent studies conducted by the first and third authors. We acknowledge that this limitation restricts the scope of our work and we agree that exploring additional parameters would further enrich our research and enhance the accuracy of the classification.

One of the limitations we acknowledged in our study was its focus on color, antioxidant, and chemical composition information. While these parameters have proven valuable in coffee classification, we recognize the potential benefits of investigating other factors such as scent profiles, geographical origin, and processing methods. By incorporating a wider range of characteristics, we can gain a more comprehensive understanding of the various factors that influence coffee classification.

We appreciate your suggestion to target more parameters in future research, and we fully intend to address this limitation in our future investigations. By considering a broader set of features, we aim to improve the accuracy and reliability of our classification models, thus advancing the knowledge and understanding of coffee classification.

Another problem that weighs on the relevance of the article is the use of certain ready-to-use classifiers, although different algorithms or other ensemble approaches may produce better or at least different results. I think the authors would be well advised to look for other ways of obtaining more substantial results, so as to enrich the article with other, more reliable methods.

Authors response: Thank you for your valuable feedback and suggestions regarding our article. We appreciate your attention to the relevance and robustness of our research. In response to your comment, we conducted additional experiments using the AdaBoost ensemble classifier and the Fuzzy Lattice Reasoning (FLR) classifier, as suggested. However, the results obtained from these classifiers were either modest or comparable to those achieved by the classifiers we initially employed, given the features used in our study.

We acknowledge the importance of exploring various methods and algorithms to enhance the comprehensiveness and reliability of our findings. However, based on our extensive experimentation, the classifiers we selected initially demonstrated superior performance in accurately classifying Arabic coffee samples. We carefully considered the strengths and weaknesses of different classifiers, and we chose the ones that provided the most reliable and consistent results for our specific dataset and classification task.

Nonetheless, we appreciate your suggestion and we reported the new results at the end of Tables 5-8, and 10-1. This addition will help provide a comprehensive view of our research process and the considerations we made when choosing the best classifiers.
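For readers unfamiliar with the boosting approach mentioned above, the following is a minimal AdaBoost.M1 sketch in pure Python. It is not the Weka implementation the authors ran: the record states only the iteration count (10), so the weak learner (a single-feature decision stump), the tie-breaking, and the stopping rule here are assumptions, and the sample values are hypothetical.

```python
import math
from collections import defaultdict


def fit_stump(X, y, w):
    """Best single-feature threshold split under sample weights w.

    Each side of the split predicts its weighted-majority class.
    Returns (weighted_error, feature_index, threshold, left_label, right_label),
    or None if no feature has two distinct values.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left, right = defaultdict(float), defaultdict(float)
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= t else right)[label] += wi
            l_lab = max(left, key=left.get)
            r_lab = max(right, key=right.get)
            err = sum(w) - left[l_lab] - right[r_lab]
            if best is None or err < best[0]:
                best = (err, j, t, l_lab, r_lab)
    return best


def stump_predict(stump, row):
    _, j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab


def adaboost_m1(X, y, n_iter=10):
    """AdaBoost.M1 with decision stumps; returns a list of (vote_weight, stump)."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_iter):
        stump = fit_stump(X, y, w)
        if stump is None:
            break
        err = stump[0] / sum(w)
        if err <= 1e-12 or err >= 0.5:
            # A perfect stump, or one no better than the M1 bound, ends boosting;
            # it is kept only if the ensemble is still empty.
            if not ensemble:
                ensemble.append((1.0, stump))
            break
        beta = err / (1.0 - err)
        ensemble.append((math.log(1.0 / beta), stump))
        # Down-weight correctly classified samples, then renormalise.
        w = [wi * beta if stump_predict(stump, row) == label else wi
             for wi, row, label in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Weighted vote of the stumps in the ensemble."""
    votes = defaultdict(float)
    for alpha, stump in ensemble:
        votes[stump_predict(stump, row)] += alpha
    return max(votes, key=votes.get)


# Hypothetical (crude protein, crude fiber) samples for illustration only.
X = [(13.0, 18.0), (12.9, 18.3), (12.3, 19.5), (12.2, 19.6), (11.6, 21.0), (11.5, 20.8)]
y = ["Light", "Light", "Medium", "Medium", "Dark", "Dark"]
model = adaboost_m1(X, y, n_iter=10)
print([adaboost_predict(model, row) for row in X])
```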

Reviewer 4 Report

Question 1: How were the simulated data generated and what was the purpose of using simulated data in this research?

Question 2: Can you provide more details about the classifiers that were used for the classification of Arabic coffee? How were these classifiers selected and what were their performance results?

Question 3: Could you explain why the use of antioxidant information resulted in lower classification performance compared to color information? Were there any specific factors contributing to the variability among classifiers in handling the antioxidant data?

Question 4: Could you explain the significance of categorizing Arabic coffee into different roasting degrees (light, medium, or dark) for human consumption? How does each roasting degree affect the coffee's chemical composition and potential health effects?

Question 5: In the research mentioned, the authors used various methodologies to analyze the chemical composition of roasted Arabic coffee. Could you provide more details about the specific techniques used, such as the UV-visible spectrophotometer, gas chromatography, and the 1,1-Diphenyl-2-picrylhydrazyl (DPPH) technique? How do these techniques contribute to understanding the coffee's chemical components?

Question 6: The paper mentions that previous studies have classified Arabic coffee and coffee in general using different sources of information and technologies. Could you provide some examples of these studies and the methods they employed? How does the current research differ from those previous studies in terms of its focus on chemical classification of Arabic coffee?

Question 7: How were the morphological characteristics of coffee beans, such as area, perimeter, equivalent diameter, and roundness percentage, extracted from the images in Arboleda et al.'s study? Were any preprocessing steps performed on the images before extracting these characteristics? Add suitable references to cite in this section [PMID: 36183055 and PMID: 36185056].

Question 8: In Arboleda's study, which specific data mining algorithms were used for the classification of green coffee beans from different species? How were these classifiers selected, and what criteria were considered to determine the best accuracy achieved by the Coarse Tree Algorithm?

Question 9: Could you provide more details about the Multilayer Perceptron (MLP) Neural Network used in Pizzaia et al.'s computer vision approach for coffee bean classification? How were shape, size, and color utilized as classification criteria, and what were the performance results of this approach?

Question 10: Could you explain how near and mid-infrared spectroscopy techniques are used in assessing coffee quality features? What specific information about the chemical composition and related aspects of coffee can be obtained through these techniques?

Question 11: In the study conducted by Calvinia et al., which sparse classification approaches (sPCA with KNN and sPLS-DA) were compared? How were these approaches utilized to classify Arabica and Robusta coffee species based on near-infrared hyperspectral imaging? Did the sparse methods offer any advantages over conventional methods in terms of interpretability and model complexity? Add a suitable reference to cite in this section [PMID: 37238180].

Question 12: In the study by Makimori and Bona, what were the chemometric methods used (ComDim and LDA) for the classification of different commercial instant coffees using an electronic nose (E-nose) equipped with metal-oxide-semiconductor (MOS) sensors? How did the results of the chemometric analysis contribute to the accurate classification of the coffee samples?

Comment 1: The authors mentioned that the coffee beans were acquired from supermarkets in Tabuk City, Saudi Arabia, and roasted using drum roasters. It would be helpful to provide additional details regarding the specific brand or type of drum roasters used in the study. This information can contribute to the transparency and replicability of the research.

Comment 2: The authors justified the use of simulated data due to the high cost of laboratory testing for a large number of coffee samples. While simulation data can be a reasonable alternative, it is important to acknowledge the limitations and potential biases associated with this approach. It would be beneficial if the authors discussed any potential deviations or uncertainties introduced by using simulated data instead of real-world samples, particularly regarding the accuracy of reflecting the underlying distribution of the coffee samples and their chemical features.

Comment 3: The authors briefly mentioned that Table 4 presents the statistical characteristics of the simulated data and Figure 1 depicts the distribution of each simulated feature. However, it would be useful to provide a summary or key insights from Table 4 to understand the statistical properties of the simulated data. Additionally, a more detailed explanation of Figure 1, such as the axes labels and any observed patterns or variations in the feature distributions, would enhance the understanding of the data used in the study.

Besides the language, the paper needs improvement.

Author Response

Question 1: How were the simulated data generated and what was the purpose of using simulated data in this research?

Authors Response: Thank you for your question regarding the generation of simulated data in our research and the purpose behind its use. We apologize for the lack of detailed explanation in the initial version of the paper; in response to your feedback, we have provided additional clarification in the revised manuscript.

To generate the simulated data, we employed the Monte Carlo approach. This method allowed us to create new samples for each type of Arabic coffee using the mean values and standard deviations obtained from publicly available data. Specifically, the means and standard deviations used in the Monte Carlo simulation are reported in Tables 1, 2, and 3 of the revised paper. The data used for generating the simulated samples were originally obtained from a previous study conducted by the first and third authors of this paper. We used these data to simulate new samples and expand our dataset for the classification task.

The purpose of using simulated data was to overcome the limitation of having a very small real dataset for each type of Arabic coffee; in fact, only one sample was available for each type. By generating simulated samples, we aimed to augment the dataset and enhance the robustness of our classification models. This allowed us to explore the performance of various classifiers and evaluate the impact of different features on classification accuracy. The revised version of the paper provides a more comprehensive explanation of the data generation process using the Monte Carlo approach.
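The Monte Carlo step described above can be sketched as follows. This is a minimal illustration, assuming each feature is drawn independently from a normal distribution with the class mean and (optionally scaled) class standard deviation; the feature names, numbers, sample count, and seed are hypothetical, and the real means and standard deviations are those reported in Tables 1, 2, and 3 of the paper.

```python
import random

# Hypothetical per-class (mean, std) pairs; the real values are in Tables 1-3 of the paper.
CLASS_STATS = {
    "Light": {"crude_protein": (13.0, 0.3), "crude_fiber": (18.0, 0.5)},
    "Medium": {"crude_protein": (12.3, 0.3), "crude_fiber": (19.5, 0.5)},
    "Dark": {"crude_protein": (11.6, 0.3), "crude_fiber": (21.0, 0.5)},
}


def simulate(class_stats, n_per_class, k=1.0, seed=0):
    """Draw n_per_class synthetic samples per class.

    Each feature is sampled independently from a normal distribution with the
    class mean and k times the class standard deviation (an assumption).
    Returns a list of (feature_dict, class_label) tuples.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    samples = []
    for label, features in class_stats.items():
        for _ in range(n_per_class):
            row = {name: rng.gauss(mu, k * sd) for name, (mu, sd) in features.items()}
            samples.append((row, label))
    return samples


data = simulate(CLASS_STATS, n_per_class=5)
print(len(data), data[0])
```

Sampling features independently ignores any correlation between them (for example, between crude protein and crude fiber); a multivariate normal draw would be the natural extension if those correlations matter. Normal draws can also fall outside physically meaningful ranges, so clipping may be needed in practice.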

Question 2: Can you provide more details about the classifiers that were used for the classification of Arabic coffee? How were these classifiers selected and what were their performance results?

Authors Response: Thank you for your inquiry regarding the classifiers used in this study and their performance results. Based on your feedback, we have provided additional details in the revised manuscript.

In our study, we evaluated several widely used classifiers to determine the most suitable option for the proposed Arabic coffee classification system: Random Forests (RF), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Naive Bayes (NB), and the C4.5 decision tree algorithm. In this revised version, we also included two additional classifiers: Fuzzy Lattice Reasoning (FLR) and AdaBoost (AB). These classifiers were selected for their popularity in machine learning and their potential suitability for our classification task. We used the Weka 3 implementations of these classifiers, which are efficient and well established.

To ensure a fair and consistent comparison, we used the default parameters for each classifier: KNN used K = 1 with the Euclidean distance function; SVM used the libsvm library with a radial basis function kernel; RF used 100 trees with the number of randomly selected features = int(log2(#features + 1)); AB used 10 iterations; and FLR used ρ = 0.5. The performance of each classifier, in terms of accuracy, precision, recall, and F1-score, was obtained through experimental evaluation on the dataset, assessing each classifier's ability to classify Arabic coffee from the provided features. The revised paper includes more detailed performance results, including comparative analyses and statistical measures to evaluate the significance of the findings.
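To make the KNN setting above (K = 1, Euclidean distance) concrete, the sketch below implements plain 1-nearest-neighbour classification in pure Python with a simple accuracy function. It is not the Weka code the authors used: Weka's default Euclidean distance normalizes attribute ranges, a step omitted here, so features on very different scales should be rescaled first. The training and test points are hypothetical.

```python
from math import dist  # Python 3.8+


def predict_1nn(train_X, train_y, query):
    """Return the label of the training sample closest to `query` (Euclidean)."""
    best_i = min(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    return train_y[best_i]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical 2-feature samples (crude protein, crude fiber).
train_X = [(13.0, 18.0), (12.3, 19.5), (11.6, 21.0)]
train_y = ["Light", "Medium", "Dark"]
test_X = [(12.9, 18.2), (12.2, 19.8), (11.7, 20.6)]
test_y = ["Light", "Medium", "Dark"]

pred = [predict_1nn(train_X, train_y, q) for q in test_X]
print(pred, accuracy(test_y, pred))
```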

We appreciate your valuable input, and we have taken your comments into account to enhance the clarity and comprehensiveness of our paper.

Authors Response: Thank you for your question regarding the lower classification performance achieved when using antioxidant information compared to color information in the classification of Arabic coffee.

In our original manuscript, we already provided an explanation for this observation. We stated that relying solely on antioxidant information is inadequate for accurate Arabic coffee classification. Our correlation analysis, as depicted in Figures 2 and 3, supports this finding. The analysis showed that variables such as caffeine percentage and acrylamide concentration are less important in the classification process and exhibit weaker correlations with the dependent variable (coffee class).
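As an illustration of the kind of correlation referred to above, the following sketch computes the Pearson correlation between one feature and the roast class, assuming an ordinal encoding light = 0, medium = 1, dark = 2. Both the encoding and the toy values are hypothetical; the analysis behind Figures 2 and 3 may have used a different method.

```python
import math

# Hypothetical ordinal encoding of the roasting degree.
CLASS_CODE = {"light": 0, "medium": 1, "dark": 2}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def feature_class_correlation(values, labels):
    """Correlation between a feature and the encoded roast class."""
    return pearson(values, [CLASS_CODE[label] for label in labels])


labels = ["light", "light", "medium", "medium", "dark", "dark"]
# A toy feature that rises with roasting degree versus one that does not track it.
print(feature_class_correlation([1.0, 1.1, 2.0, 2.1, 3.0, 3.1], labels))  # close to 1
print(feature_class_correlation([1.0, 3.0, 1.0, 3.0, 1.0, 3.0], labels))  # 0.0
```

A coefficient near zero, as in the second toy feature, is the situation described for caffeine and acrylamide: the feature carries little linear information about the class.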

Moreover, when examining Figure 1, it becomes evident that some of the curves representing these features for the three types of Arabic coffee overlap significantly. This overlap indicates that the values of these features for different coffee types are not distinct enough to establish clear boundaries or distinctions between the classes.
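The degree of overlap can be quantified in many ways; one simple, hypothetical measure is the ratio between the intersection and the union of two classes' [min, max] ranges for a feature. The sketch below computes it for every pair of classes. It is an illustrative choice, not the measure used in the paper, and it ignores how densely each range is populated.

```python
from itertools import combinations


def range_overlap(a, b):
    """Intersection-over-union of the [min, max] ranges of two samples
    (0.0 = disjoint ranges, 1.0 = identical ranges)."""
    lo, hi = max(min(a), min(b)), min(max(a), max(b))
    union = max(max(a), max(b)) - min(min(a), min(b))
    if union == 0:
        return 1.0  # both samples collapse to the same single value
    return max(0.0, hi - lo) / union


def pairwise_overlap(values_by_class):
    """Range overlap for every pair of classes, keyed by sorted class names."""
    return {
        (c1, c2): range_overlap(values_by_class[c1], values_by_class[c2])
        for c1, c2 in combinations(sorted(values_by_class), 2)
    }
```

Values close to 1 for a feature indicate that its class ranges largely coincide, so that feature alone cannot draw clear boundaries between the classes.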

In the revised version of our paper, we have further emphasized how data complexity and the choice of classifier affect classification performance when antioxidant information is used. As the complexity of the data increases, it becomes increasingly important to consider both the nature of the antioxidant data and the selection of an appropriate classifier. The overlapping feature distributions and the complex decision tree rules highlight the challenges of dealing with intricate data and underscore the need to choose suitable classifiers.

Additionally, we explicitly stated that relying solely on antioxidant information led to a notable decrease in classification performance. This highlights the insufficiency of using antioxidant information alone for accurate coffee classification.

Authors Response: Thank you for your question regarding the significance of categorizing Arabic coffee into different roasting degrees (light, medium, or dark) for human consumption and the impact of each roasting degree on the coffee's chemical composition and potential health effects.

In the revised version, we have stated and highlighted the importance of the study in the Introduction section, based on information from the literature and with proper citations, as follows:

a) Coffee consumption has been associated with various beneficial effects on human health due to its biochemical properties. Numerous studies have shown that drinking coffee can improve cognitive functions, including memory, mood, and overall cognitive performance. These effects are particularly relevant for non-communicable diseases.
b) It is important to note that coffee consumption has also been linked to potential health risks. For example, it has been found to increase LDL-C (low-density lipoprotein cholesterol) and total cholesterol levels in the body, which can elevate the risk of cardiovascular disease. Additionally, a study conducted among Saudi females over the age of 40 revealed a significant association between drinking Arabic coffee and an increased risk of osteoporosis.

In the context of our study, the focus was primarily on the classification of Arabic coffee based on its chemical composition and certain characteristics such as color and antioxidant information. Although different roasting degrees (light, medium, or dark) have varying levels of antioxidant components and chemical compositions, the specific impact of each roasting degree on the coffee's chemical composition and potential health effects was beyond the scope of our research.

However, we would like to mention that previous work conducted by the first and third authors, as referenced in our paper, highlighted the benefits of medium Arabic coffee. According to their findings, medium roasting was associated with a higher concentration of chemicals with antioxidant properties and other biologically advantageous effects.

We appreciate your interest in the broader implications of Arabic coffee roasting degrees on human health. While our study did not directly investigate this aspect, we encourage future research to dig deeper into the relationship between roasting degrees, coffee's chemical composition, and their potential health impacts.

Authors Response: Thank you for your question. In this work, we did not conduct any laboratory tests. The detailed laboratory analysis of the chemical composition of roasted Arabic coffee was conducted in a previous study by the first and last authors. As our research relied on the data obtained from that study, we refer readers to the paper by Alamri et al. for a comprehensive account of the specific techniques used and the associated results.

Therefore, we added the following paragraph to the revised version:

“For a deeper understanding of the coffee's chemical components analysis, it is important to refer readers to the recent work of Alamri et al. \cite{iman}, which provides extensive details on the chemical analysis of roasted Arabic coffee, including caffeine determination using a UV-visible spectrophotometer, acrylamide measurement, free radical scavenging capacity assessment using the DPPH technique, browning index calculation, color measurements, and volatile compound estimation using gas chromatography-mass spectrometry.”

We appreciate your understanding regarding the omission of extensive descriptions of the laboratory techniques from our manuscript. By referring readers to the paper by Alamri et al., we ensure that they have access to comprehensive information on the methodologies used for the chemical analysis of roasted Arabic coffee.

Authors Response: Thank you for your comment. In both the original and the revised versions, we have dedicated a section to reviewing previous studies on the classification of Arabic coffee and coffee in general. We have organized these studies into three categories: computer vision, infrared spectroscopy, and electronic nose. In each category, we provide examples of studies that employed different sources of information and technologies to classify coffee samples.

For example, in the computer vision category, studies have utilized image processing techniques to analyze the shape, size, and color of coffee beans or ground coffee samples. In the infrared spectroscopy category, studies have focused on using infrared spectroscopic analysis to identify characteristic spectral patterns related to the chemical composition of coffee samples. In the electronic nose category, studies have employed electronic nose devices to detect and classify coffee aroma profiles.

However, despite the existing research on coffee classification, we have identified a gap in the literature, which we now highlight in the revised version. While previous studies have explored various sources of information and technologies for coffee classification, they have primarily focused on factors such as shape, size, color, infrared spectra, and aroma. To our knowledge, none of these studies has specifically addressed the classification of Arabic coffee samples based on their chemical components.

Therefore, our current research aims to bridge this gap by focusing on the chemical composition of Arabic coffee and its potential for classification. We believe that understanding the specific role of chemical components in classifying Arabic coffee varietals is crucial for advancing the knowledge in this field.

Thank you for your valuable feedback, which has helped us highlight the novelty and significance of our research in relation to previous studies.

Authors Response: Thank you for your comment. In Arboleda et al.'s study, the morphological characteristics of the coffee beans, such as area, perimeter, equivalent diameter, and roundness percentage, were extracted from the images as hand-crafted features using the MATLAB Image Processing Toolbox. A computer routine was developed to preprocess the images of the coffee samples and extract these features.
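The shape descriptors mentioned above can be written down directly. The sketch below computes area and perimeter from a closed boundary polygon (shoelace formula), the equivalent diameter as the diameter of a circle with the same area (the definition of MATLAB's regionprops EquivDiameter), and a roundness percentage. The roundness formula, 100 · 4πA/P², is our assumption, since the exact definition used by Arboleda et al. is not given here; on real images, pixel-based perimeter estimates will also differ somewhat from the polygon perimeter.

```python
import math


def polygon_area_perimeter(points):
    """Area (shoelace formula) and perimeter of a closed polygon of (x, y) vertices."""
    area2 = 0.0
    perimeter = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter


def equivalent_diameter(area):
    """Diameter of the circle with the same area: sqrt(4 * A / pi)."""
    return math.sqrt(4.0 * area / math.pi)


def roundness_percent(area, perimeter):
    """Assumed roundness: 100 * 4 * pi * A / P**2 (100 for a perfect circle)."""
    return 100.0 * 4.0 * math.pi * area / perimeter ** 2


area, perimeter = polygon_area_perimeter([(0, 0), (2, 0), (2, 2), (0, 2)])
print(area, perimeter)                               # 4.0 8.0
print(round(roundness_percent(area, perimeter), 2))  # 78.54
```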

However, it is worth noting that recent advancements in computer vision have demonstrated that deep learning-based features tend to outperform hand-crafted features in various tasks, including image analysis, where they have shown superior capabilities in capturing complex patterns and representations in images. These deep learning approaches have been successfully applied in a wide range of computer vision applications, including object recognition and classification.

We acknowledge the potential benefits of incorporating deep features into coffee classification methodologies, as demonstrated in recent studies \cite{PMID:36183055,PMID:36185056}. Thank you for bringing up this point, and we appreciate your suggestion to reference these studies. In the revised version of our manuscript, we will include a suitable reference to these articles to further support the importance of considering deep features in image-based coffee classification.

Authors Response: Thank you for your inquiry. In Arboleda's study, a variety of data mining algorithms were employed for the classification of green coffee beans from different species. Specifically, 22 classifiers from five classifier families were utilized, including Decision Trees, Discriminant Analysis, Support Vector Machines, K-Nearest Neighbor, and Ensemble Classifiers. Among these classifiers, the Coarse Tree Algorithm demonstrated the highest accuracy of 94.1% in the classification task. Additionally, the Coarse Tree Algorithm exhibited the fastest training time compared to the other classifiers considered.

The selection of the Coarse Tree Algorithm was based on its superior performance in terms of accuracy, which is a key criterion for evaluating the effectiveness of a classification model. By achieving the highest accuracy rate, the Coarse Tree Algorithm demonstrated its capability to effectively distinguish between different species of green coffee beans. Thank you for raising this question, and we appreciate the opportunity to provide further clarification on the data mining algorithms used and the selection process of the Coarse Tree Algorithm in Arboleda's study.

Authors Response: Thank you for your question. In Pizzaia et al.'s computer vision approach for coffee bean classification, a Multilayer Perceptron (MLP) Neural Network was employed. The MLP utilized in the study consisted of an input layer, a hidden layer with 100 sigmoid-type neurons, and a binary output layer. The input layer of the MLP consisted of five inputs, which were derived from the shape, size, and color characteristics of the coffee beans. Specifically, the inputs included the area and roundness of each coffee bean, as well as the averages of the red, green, and blue (RGB) color channels.

The MLP was trained using the Levenberg-Marquardt algorithm, a widely used optimization technique for training neural networks. The network was trained to classify coffee beans as either "good" or "defective," with a binary output of "1" representing a good grain and "0" representing a defective grain. In terms of performance, the classification accuracy achieved by the MLP approach was reported to be 94.10%. This indicates that the MLP Neural Network was able to effectively distinguish between good and defective coffee beans based on the provided shape, size, and color criteria.
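To make the described architecture concrete, the sketch below implements only the forward pass of a 5-100-1 network with sigmoid hidden units and randomly initialised weights. The Levenberg-Marquardt training step is omitted, and the sigmoid output unit thresholded at 0.5 is our assumption for producing the binary good (1) / defective (0) decision. Inputs are assumed to be scaled to comparable ranges beforehand, since raw pixel areas would otherwise dominate the weighted sums.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_mlp(n_inputs=5, n_hidden=100, seed=0):
    """Random initial weights for an n_inputs-n_hidden-1 network."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = rng.uniform(-0.5, 0.5)
    return w_hidden, b_hidden, w_out, b_out


def predict_good(params, features, threshold=0.5):
    """Forward pass; returns 1 (good bean) or 0 (defective bean).

    features: [area, roundness, mean_R, mean_G, mean_B], assumed pre-scaled.
    """
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, features)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return 1 if output >= threshold else 0
```

In practice the weights would be fitted (for example with Levenberg-Marquardt, as in the study) before `predict_good` is meaningful.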

We appreciate your interest in the details of the MLP Neural Network used in Pizzaia et al.'s study and the performance results obtained from this computer vision approach.

Authors Response: Thank you for your question. Near and mid-infrared spectroscopy techniques are widely used in assessing various quality features of coffee. These techniques involve the analysis of the absorption and reflection of light in the near and mid-infrared regions of the electromagnetic spectrum. Through these techniques, specific information about the chemical composition and related aspects of coffee can be obtained. In the related work section, we reviewed several methods that utilize near and mid-infrared spectroscopy techniques for coffee classification. These approaches include the collection of spectra using Fourier Transform Infrared (FTIR) spectroscopy, near-infrared hyperspectral imaging, and portable near-infrared spectrometers. These methods allow researchers to capture the unique spectral fingerprints of coffee samples, which are influenced by their chemical composition. By analyzing these spectral patterns, various quality features of coffee can be assessed. For example, near and mid-infrared spectroscopy techniques can provide information about the levels of compounds such as caffeine, organic acids, sugars, and other chemical components that contribute to the flavor, aroma, and overall quality of coffee. The collected spectra are then processed and analyzed using specific classifiers or statistical models to classify coffee samples based on their quality attributes. The choice of classifier or model depends on the specific study and its objectives.

Authors Response: Thank you for your question. In the study conducted by Calvini et al., two sparse classification approaches, namely sparse Principal Component Analysis with K-Nearest Neighbors (sPCA with KNN) and sparse Partial Least Squares Discriminant Analysis (sPLS-DA), were compared for the classification of Arabica and Robusta coffee species based on near-infrared hyperspectral imaging. Near-infrared hyperspectral imaging was used to capture the spectral information of green coffee samples, and the average spectrum of each hyperspectral image was then used to build the training and test sets. The two sparse approaches were employed to classify the coffee species from these hyperspectral data. Sparse methods offer advantages in terms of interpretability and model complexity: they aim to identify the subset of spectral regions or features that contributes most to the classification task, resulting in more interpretable and parsimonious models. In Calvini et al.'s study, the sparse methods yielded results similar to those of the classical methods; their advantage was the ability to provide more interpretable models by selecting the spectral regions that are most informative for the classification task. Moreover, the feature selection in both sparse methods converged to the same spectral regions, indicating their consistency and relevance in distinguishing between Arabica and Robusta coffee species.

For more detailed information and a comprehensive understanding of the feature selection methods, we refer the reader to the study [PMID: 37238180].

Authors Response: Thank you for your question. In the study conducted by Makimori and Bona, the classification of different commercial instant coffees using an electronic nose (E-nose) equipped with metal-oxide-semiconductor (MOS) sensors was performed using chemometric methods, specifically common dimension analysis (ComDim) and linear discriminant analysis (LDA). The E-nose, equipped with seven MOS sensors, was utilized to analyze 53 samples of six distinct commercial instant coffees produced by the same industry. ComDim, which is an unsupervised multiblock analysis method, was applied to reduce the dimensionality of the E-nose sensor data. The first derivative of the transitory signal was used to construct each sensor's block, and through ComDim analysis, four common dimensions (CDs) were identified. These four CDs captured 99.86% of the total variation in the E-nose data. Salience tables were generated to reveal the relationships between the sensors within each CD. The tables indicated the connections between sensors S1, S3, S5, S6, and S8 within CD1, while sensors S7 and S9 had a higher influence on CD2. The scores obtained from the first four CDs were used as input for building LDA classifiers. The LDA models demonstrated excellent performance, achieving 100% sensitivity and specificity using leave-one-out cross-validation. This indicates that the chemometric analysis, incorporating ComDim and LDA, accurately classified the different coffee samples studied. By applying ComDim to reduce the dimensionality of the E-nose sensor data and utilizing LDA for classification, the study achieved reliable and accurate classification of the commercial instant coffees. The chemometric methods helped extract the essential information from the sensor data and identify relevant features for discriminating between different coffee samples.
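Leave-one-out cross-validation, as used to validate the LDA models, can be sketched as follows. To stay short and dependency-free, the sketch swaps LDA for a simpler nearest-centroid classifier (a plainly different model) and reports one-vs-rest sensitivity and specificity for a chosen positive class; the toy data in the usage lines are hypothetical.

```python
import math


def nearest_centroid_fit(X, y):
    """Mean feature vector of each class (a simple stand-in for LDA)."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def leave_one_out(X, y):
    """Predict each sample with a model fitted on all the other samples."""
    predictions = []
    for i in range(len(X)):
        model = nearest_centroid_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predictions.append(nearest_centroid_predict(model, X[i]))
    return predictions


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
y = ["A", "A", "A", "B", "B", "B"]
predictions = leave_one_out(X, y)
print(sensitivity_specificity(y, predictions, "A"))  # (1.0, 1.0)
```

In the study, the inputs would be the scores on the first four common dimensions rather than raw sensor readings.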

Authors Response: Thank you for your comment. We appreciate your suggestion to provide additional details about the specific brand or type of drum roaster used for roasting the Arabic coffee, for the sake of transparency and replicability, and we apologize for any ambiguity in our previous description. The coffee was roasted in the study by Alamri et al., which used commercially available drum roasters that are widely recognized and commonly used in the coffee industry. Unfortunately, that study did not specify the brand or model of the drum roasters employed.

Furthermore, we would like to clarify that, after roasting, the coffee beans were thoroughly ground using a Krups coffee grinder (model GVX212; Essen, Germany) with a screen size of 0.30 mm, as reported by Alamri et al.

Authors Response: In response to the reviewer's insightful comment, we acknowledge the limitation of using simulated data instead of real-world samples in our study. We recognize that there may be deviations and uncertainties introduced by this approach, particularly in terms of accurately reflecting the underlying distribution of the coffee samples and their chemical features.

While simulated data can provide a reasonable alternative when laboratory testing for a large number of coffee samples is cost-prohibitive, we understand the importance of addressing potential biases associated with this approach. In our future work, we will make an effort to discuss these limitations in more detail and explore methods to mitigate any deviations or uncertainties introduced by using simulated data. This will help to improve the transparency and reliability of our research.

Authors Response: Thank you for your comment. We appreciate your suggestion regarding Table 4 and Figure 1, both of which we have summarized and enhanced in the revised version. Table 4 summarizes the statistical characteristics of the simulated data used in the study. It includes information for each class (Light coffee, Medium coffee, and Dark coffee) and their respective features: MoistureContent, EtherExtract, CrudeProtein, CrudeFiber, AshContent, NFE, CaffeineContent, Acrylamide, DPPH, Browning Index, Lcolor, aColor, and bColor. For each feature, we provide the minimum, maximum, and mean values for each class.
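The per-class minimum, maximum, and mean reported in Table 4 can be computed from row-wise data with a few lines of code. The sketch assumes each row is a dictionary of feature values plus a "class" key (a hypothetical layout), and the toy rows below are illustrative, not data from the study.

```python
from collections import defaultdict


def summarize(rows, features, label_key="class"):
    """Return {class: {feature: (min, max, mean)}} from a list of row dicts."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    summary = {}
    for label, members in groups.items():
        summary[label] = {}
        for feature in features:
            values = [member[feature] for member in members]
            summary[label][feature] = (min(values), max(values),
                                       sum(values) / len(values))
    return summary


rows = [
    {"class": "light", "DPPH": 1.0},
    {"class": "light", "DPPH": 3.0},
    {"class": "dark", "DPPH": 5.0},
]
print(summarize(rows, ["DPPH"]))
# {'light': {'DPPH': (1.0, 3.0, 2.0)}, 'dark': {'DPPH': (5.0, 5.0, 5.0)}}
```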

Regarding Figure 1, we have enhanced its resolution for better clarity. The graph illustrates the distribution of each simulated feature: the x-axis represents the values of each feature, and the y-axis represents the frequency of occurrence for each class. The graph reveals patterns and variations in the distributions of the features, which provide insights into the characteristics of the simulated data, and we refer to it to justify the results where needed.
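Per-class frequency counts of the kind plotted in Figure 1 can be computed with shared bin edges, so that the classes are directly comparable. The sketch below uses equal-width bins over the pooled range of a feature; the bin count is a free choice and is not taken from the paper.

```python
def histogram_counts(values, lo, hi, n_bins):
    """Counts of values in n_bins equal-width bins over [lo, hi]; hi falls in the last bin."""
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def per_class_histograms(values_by_class, n_bins=10):
    """Histogram of each class over the pooled [min, max] range of all classes."""
    pooled = [v for vals in values_by_class.values() for v in vals]
    lo, hi = min(pooled), max(pooled)
    return {label: histogram_counts(vals, lo, hi, n_bins)
            for label, vals in values_by_class.items()}
```

Classes whose counts fall into the same bins are exactly the overlapping curves visible in such a plot.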

Comments on the Quality of English Language

Besides the language, the paper needs improvement.

Authors Response: Thank you for bringing this concern to our attention. We apologize for any typos or errors that may have been present in the paper. To address this issue, we have taken the necessary steps to ensure proper proofreading and editing. We engaged a native English speaker with expertise in scientific writing to thoroughly review the paper and correct any grammatical, typographical, or stylistic errors identified. We acknowledge the importance of presenting a well-polished manuscript and appreciate the reviewer's feedback in this regard. By employing the services of a professional proofreader, we have made every effort to improve the overall quality of the paper and ensure that it adheres to high standards of written English.

Round 2

Reviewer 3 Report

The following points of my expertise have not been adequately addressed by the authors. I therefore reject the article:

The authors themselves acknowledge that, as a matter of scientific ethics, they were confronted with multiple problems in carrying out their research, such as the limitation of certain parameters influencing the accuracy of classification. As a result, the parameters studied are limited, and it is necessary for the authors to study other parameters to flesh out their research and enrich their results.
Another problem that weighs on the relevance of the article is the use of certain ready-to-use classifiers, although different algorithms or other ensemble approaches may produce better or at least different results. I think the authors would be well advised to look for other ways of obtaining more substantial results, so as to enrich the article with other, more reliable methods.

Author Response

Reviewer 3:

The following points of my expertise have not been adequately addressed by the authors. I therefore reject the article:

1. The authors themselves acknowledge that, as a matter of scientific ethics, they were confronted with multiple problems in carrying out their research, such as the limitation of certain parameters influencing the accuracy of classification. As a result, the parameters studied are limited, and it is necessary for the authors to study other parameters to flesh out their research and enrich their results.

Authors' Response: Thank you for your valuable feedback and evaluation of our article. We appreciate your thorough examination of the research and your expertise in the field. We understand your concerns regarding the limitations in our research and the need to explore other parameters to enrich our results.

Indeed, during our investigation, we encountered various challenges related to scientific ethics, especially when dealing with real-world samples for Arabic coffee. As we acknowledged in the paper, we utilized simulated data based on the mean and standard deviation of real laboratory tests. While this approach allowed us to work with the available data and explore classification techniques, we recognize that it may introduce variations and shortcomings compared to directly using real-world samples.
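A minimal sketch of this kind of simulation follows, assuming each feature is drawn independently from a normal distribution with the class's laboratory mean and standard deviation. The feature names follow Table 4, but the numbers are placeholders, not the values from the laboratory study, and the independence and normality assumptions are ours. Because features are sampled independently, such a simulation does not reproduce correlations between features, which is one reason simulated data may not perfectly capture the underlying distribution.

```python
import random

# Placeholder (mean, SD) pairs per class; NOT the published laboratory values.
CLASS_STATS = {
    "light": {"CaffeineContent": (10.0, 1.0), "DPPH": (20.0, 2.0)},
    "medium": {"CaffeineContent": (11.0, 1.0), "DPPH": (22.0, 2.0)},
    "dark": {"CaffeineContent": (12.0, 1.0), "DPPH": (18.0, 2.0)},
}


def simulate(class_stats, n_per_class, seed=42):
    """Draw n_per_class rows per class, sampling each feature independently
    from a normal distribution N(mean, SD)."""
    rng = random.Random(seed)
    rows = []
    for label, stats in class_stats.items():
        for _ in range(n_per_class):
            row = {feature: rng.gauss(mu, sd) for feature, (mu, sd) in stats.items()}
            row["class"] = label
            rows.append(row)
    return rows


rows = simulate(CLASS_STATS, n_per_class=100)
print(len(rows))  # 300
```

Note that a normal draw can produce physically impossible values (for example, negative concentrations) when the SD is large relative to the mean; a real simulation would need to decide how to handle them.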

The limitation of using simulated data is highlighted in the conclusion section of our paper. We explicitly state that the use of simulated data may not perfectly capture the underlying distribution of coffee samples and their chemical properties. To address this limitation, we express our commitment to future research that will delve deeper into these constraints and explore techniques to mitigate any deviations or uncertainties arising from using simulated data.

Furthermore, we fully acknowledge your suggestion to investigate other parameters beyond color, antioxidants, and chemical composition. Incorporating additional factors such as odor profiles, origin information, and processing methods is undoubtedly crucial for gaining a more comprehensive understanding of the differences in coffee varieties. These factors may also enhance classification accuracy and contribute to a more robust and meaningful analysis of Arabic coffee.

Your feedback is invaluable to us, and we appreciate your thorough review. We will take your suggestions into account for future studies, and we are committed to addressing the limitations of our research to ensure the accuracy and validity of our findings.

Once again, we thank you for your time and valuable input. Your constructive comments will undoubtedly contribute to the improvement of our research and its future directions.

2. Another problem that weighs on the relevance of the article is the use of certain ready-to-use classifiers, although different algorithms or other ensemble approaches may produce better or at least different results. I think the authors would be well advised to look for other ways of obtaining more substantial results, so as to enrich the article with other, more reliable methods.

Authors' Response: Thank you for your insightful comments and suggestions regarding the use of different classifiers in our research. We greatly appreciate your expertise and your emphasis on exploring other methods to enrich the article with more reliable results.

In response to your recommendation, we conducted additional experiments using the AdaBoost ensemble classifier and the Fuzzy Lattice Reasoning (FLR) classifier. We were keen to evaluate the potential benefits of these classifiers and their ability to provide more substantial outcomes. However, after careful analysis, we found that the results obtained from these classifiers were either modest or comparable to those achieved by the classifiers we initially employed, given the specific features used in our study.

Despite this, we acknowledge the significance of exploring various methods and algorithms to improve the comprehensiveness and reliability of our findings. As you rightly pointed out, the diversity of approaches plays a pivotal role in ensuring the robustness of research outcomes.

In light of this, we explicitly mentioned in the limitations of our study that future research should consider utilizing more diverse and representative datasets. Additionally, we recognize the importance of experimenting with other classifiers beyond the ones explored in this study. Such endeavors will undoubtedly contribute to further improving the generalizability and validity of our results.

Once again, we are grateful for your thoughtful evaluation and recommendations. Your feedback has undoubtedly contributed to the refinement and enhancement of our research. We are committed to addressing the limitations and exploring new avenues to ensure the reliability and significance of our work.
