A Study of the Physical Characteristics and Defects of Green Coffee Beans That Influence the Sensory Notes Using Machine Learning Models

Gonzalez-Sanchez, Blanca; Sandoval-Gonzalez, Oscar; Flores-Cuautle, Jose de Jesus Agustin; Landeta-Escamilla, Ofelia; Portillo-Rodriguez, Otniel; Aguila-Rodriguez, Gerardo

doi:10.3390/pr12010018

¹ Tecnologico Nacional de Mexico, Instituto Tecnologico de Orizaba, Orizaba 94320, Mexico
² Programa Investigadoras e Investigadores por Mexico del CONACYT, Ciudad de Mexico 03940, Mexico
³ Facultad de Ingeniería, Universidad Autonoma del Estado de Mexico, Toluca de Lerdo 50000, Mexico
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Processes 2024, 12(1), 18; https://doi.org/10.3390/pr12010018

Submission received: 7 November 2023 / Revised: 12 December 2023 / Accepted: 18 December 2023 / Published: 20 December 2023

(This article belongs to the Special Issue Applications of Artificial Intelligence in Food Processing and Food Industries)

Abstract

This paper presents a detailed analysis of the relationship between the physical characteristics and defects of green coffee beans and the sensory notes of the fragrance, aroma, flavor, and aftertaste of coffee. Machine learning models were used to identify the variables of importance and the ways in which these variables affect the sensory notes of coffee, and to determine which algorithm and hyperparameters estimate most precisely the sensory values of coffee, such as floral, fruity, herbal, nutty, caramel, chocolate, spicy, resinous, pyrolytic, earthy, fermented, and phenolic. The results indicate the relationship between, and the importance of, the physical variables, defects, and size of the green coffee bean with respect to each sensory note. The data of the proposed system demonstrate that by combining different machine learning models, a precision can be achieved that is analogous to that obtained by combining the scores of several cupping experts; therefore, the possibility of errors induced by human factors such as fatigue or subjectivity is reduced.

Keywords:

sensory notes; green coffee; machine learning; physical properties

1. Introduction

The physical characteristics of green coffee are essential to determine its quality. Physical properties of green coffee such as volume, density, mass, and porosity have been reported to show highly significant differences [1]. The discrimination of defective and non-defective coffee significantly impacts its quality. A poor-quality product is hard to sell, which is why products must be evaluated with a selection technique before going to market [2]; a methodology has been presented that uses specific physical and chemical attributes to differentiate defective from non-defective beans. Arabica and Robusta coffees differ in physical properties such as size, density, mass, and resistance; in particular, the resistance of the berries to crushing varied between one coffee and the other. Another characteristic to highlight was humidity, which interfered with the results for both coffee variants [3]. The physical characteristics of coffee differ according to its type; the technique of color separation according to roasting was only effective for Robusta coffee, for which it identified a better quality [4].

Processing methods adapted to the unique geographical conditions and climate of each region are necessary to control coffee quality. A study comparing the physical quality of Gayo green coffee beans processed with the Indonesian semi-washed method against the fully washed (traditional) method gave preliminary results for evaluating the processing method, the degree of roasting, and the quality of the assayed coffee [5]. Another study found that the roasting process has a direct impact on the physical properties of Sidikalang Robusta coffee from cherries of different colors: relative to the initial condition of the beans, roasting changed the water content, mass, porosity value, and bulk density [6].

Sensory characteristics are among the most relevant components for determining food quality because these characteristics, such as flavor, defects, and texture, determine how consumers perceive products [7]. Coffee is one of the most consumed beverages in the world, with a global consumption of 166 million 60 kg bags in 2020 [8]. Its quality has been assessed by "expert cuppers" who determine its aspect, texture, flavor, aroma, etc. [9], but situations like pandemics highlight the need for alternative systems to determine whether these sensory characteristics meet the legislated standards.

Therefore, research has been performed to establish methodologies that determine different characteristics by implementing Artificial Intelligence models. Some studies differentiate organic coffee from non-organic coffee by determining the trace elements present, with accurate results (98.2%) using the Naïve Bayes algorithm for eight minerals [10]; others distinguish washed from unwashed raw coffee beans, whether sold locally or exported, achieving an accuracy of 89.1% [11], or Luwak green coffee beans from non-Luwak ones, with a 97% validation accuracy [12].

Other authors have used color to determine defects [13,14], roasting degree [15], and moisture [16] using different Artificial Intelligence models, achieving accuracy comparable to that of the studies above. Finally, the most frequently evaluated characteristics are the sensory attributes of roasted coffee [17,18], roasted coffee-flavored sterilized drink [19], civet coffee [20], Nespresso [21], and specialty coffee [22], which have been assessed with methodologies such as Partial Least Squares, Bayesian regularization, fuzzy expert systems, decision trees, and convolutional neural networks.

Additional studies have included more parameters to describe variety and quality. For example, ref. [23] developed a system in which mid-infrared transmittance spectroscopy was combined with a pattern recognition algorithm to identify four varieties of coffee from China, concluding that six classification models had 95% precision. Ref. [24] combined near infrared spectroscopy (NIRS) and a feed-forward back propagation artificial neural network (FFBPANN) to determine coffee quality in Robusta beans and civet coffee, finding results of 99%, with a 98% precision in distinguishing civet from non-civet coffee beans. Finally, ref. [25] combined objective features with subjective ones to evaluate optimal features by implementing three different models, namely Random Forest (RFC), Support Vector Regression (SVR), and multilayer perceptron (MLP), concluding that Random Forest was the best model for the system.

This research focuses on green coffee beans and on how their variety and type, physical properties, and defects can be used to estimate the sensory properties of coffee, which could otherwise only be obtained by performing the roasting and cupping process.

2. Materials and Methods

This research has two main study themes: the analysis and determination of the variables of importance of coffee, and the estimation of the sensory notes from the physical properties and defects of green coffee beans. Figure 1 shows a schematic diagram indicating how Machine Learning (ML) algorithms use data on the physical properties of green coffee beans, such as humidity, density, bean defects, size, and shape, to determine different properties. The results of the cuppings of these coffee samples were also used to analyze how these variables influence the sensory notes of coffee and to create a model based on machine learning algorithms that estimates the sensory properties of coffee from the physical properties and deformations of the coffee beans.

2.1. Coffee Samples

Physical variables of coffee (type, variety, physical properties, and defects) and sensory information about coffee (fragrance, aroma, flavor, and aftertaste) were collected from 185 samples of different varieties of specialty coffee from different locations in Mexico. The varieties, types, and information about the sensory profile were obtained from certified experts from a specialized coffee organization (Centro Agroecológico del Café A.C. (CAFECOL)) and are described as follows:

Type: parchment, green, ball, honey;
Variety: Tekisik, marago, sudan rome, mundo novo, sarchimor, cr95, criollo, garnica, typica, bourbon, costa rica, catuai, arabica, colombia, arainema, caturra, catimor, Marseillaise, geisha, pacamara, oro azteca, jilotepec, black honey, catuai red, catuai yellow;
Physical properties: natural, demucilaging, honey, wash, humid, dried, mix, sieve, planilla, guardiola dryer, humidity, density, black, bitter, dried cherry;
Defects: fungus, foreign matter, severe berry borer, black partial, bitter partial, parchment, float, immature, wrinkled, shells, split, husk, light berry borer;
Size: Sieve 19, sieve 18, sieve 17, sieve 16, sieve 15, low sieve 15;
Shape: Flat bean, peaberry, triangle, monster, shell, ball, performance, stain;
Fragrance, aroma, flavor, and aftertaste evaluation: Floral, fruity, herbal, nuts, caramel, chocolate, spicy, resinous, pyrolytic, earthy, fermented;
Evaluation Results: Fragrance, flavor, residual, acidity, body, balance, barista score, and total score.

Figure 2 shows nine samples of green coffee grown in the high mountains of Veracruz, Mexico. The varieties are (a) Bourbon, (b) Typica, (c) Colombia, (d) Caturra, (e) Costa Rica, (f) Mundo Novo, (g) Garnica, (h) Catuai, and (i) Marseillaise. Each of these varieties has different physical properties, defects, and sizes.

2.2. Coffee Cupping Methodology

Cupping is a standardized process to evaluate the aroma, flavor, and texture of a coffee sample. To perform a complete evaluation of coffee, tasters focus on three relevant aspects: smell (aroma, fragrance, and residual flavor), taste (flavor, acidity, and sweetness), and tactile sensation or texture (body). Figure 3 shows the process of preparing a cup of coffee.

The tasting follows the Mexican standard NMX-F-177-SCFI-2009 [26] and involves the following steps:

An amount of 8.25 g of roasted coffee, ground at most 5 min beforehand, is placed in the cup for each 150 mL of water; the loose gases (dry aroma or fragrance) are inhaled immediately.
Hot water at a 92 °C temperature is placed inside each cup; immediately inhale the vapors (wet aroma).
Let the infusion stand for 3 to 5 min to allow for proper extraction and dilution.
A layer or crust forms on the surface of the cup that allows for measurement of the aromatic character.
The layer or crust that forms is broken with a round spoon while deeply inhaling the vapors coming from the cup.
All foam and particles are cleaned and removed from the surface. After eight minutes, a spoonful of the beverage is placed near the mouth and aspirated.
Aspiration introduces steam into the nasal cavity and evenly spreads the liquid over the entire tongue.
The beverage should be held in the mouth for three to five seconds to perceive the intensity and quality of the taste characteristics: flavor, acidity, sweetness, cleanliness, and balance.
The beverage is then expelled after this time, into a container intended for this purpose, evaluating the sensation that remains in the mouth after tasting to determine the residual taste. The tongue is gently slid across the palate to determine texture, fat content, and intensity.
The taste evaluation should be carried out in three stages: hot, warm, and cold, to assess the consistency and uniformity of the beverage.
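The dosing rule in the first step is a fixed ratio, so it can be scaled to other cup volumes. The following is a minimal sketch of that arithmetic only; the function name and the example volumes are illustrative assumptions and are not taken from the standard.

```python
# Dose stated in the procedure: 8.25 g of ground coffee per 150 mL of water.
GRAMS_PER_REFERENCE = 8.25
REFERENCE_VOLUME_ML = 150.0


def coffee_dose_grams(water_ml: float) -> float:
    """Return the ground-coffee mass (g) that keeps the 8.25 g / 150 mL ratio."""
    if water_ml <= 0:
        raise ValueError("water volume must be positive")
    return GRAMS_PER_REFERENCE * water_ml / REFERENCE_VOLUME_ML


# Hypothetical cup volumes, chosen only to illustrate the scaling.
for volume in (150, 200, 250):
    print(f"{volume} mL -> {coffee_dose_grams(volume):.2f} g")
```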

3. Machine Learning Methodology

The methodology implemented to obtain the sensory parameters of coffee is presented. The model training section focuses on generating a model that is trained via a supervised learning methodology using the variables of type, variety, physical properties, and defects to estimate each sensory note of aroma, fragrance, flavor, and aftertaste as shown in Figure 4.

The objective of the research is to obtain information about which input variables in the database affect the selected target variable. Therefore, the models based on artificial intelligence methodologies that can achieve the estimation of sensory parameters and coffee quality were designed. The research was divided into database creation, data preprocessing, variable selection, cross-validation, feature importance, machine learning models, and hyperparameter tuning to achieve this.

Database creation: Information to be used includes data on physical and sensory characteristics obtained from the coffee evaluation. The Pandas library was used in Python to manage everything related to loading, analyzing, and storing information in the database.
Pre-processing of the database: The database was analyzed using an algorithm that finds data that could be corrupted, such as empty fields and special characters. When the algorithm finds data that present problems, an automatic elimination is performed to ensure that the machine learning models do not have problems during their execution due to these data types.
Variable Selection: The user selects the variables to be analyzed; that is, they determine which variables are the inputs and which is the target. The program allows the user to write the names of the variables to be analyzed so that the system automatically stores, analyzes, and groups the corresponding information.
Cross-validation: To obtain results that are as close to reality as possible, the data used to train and test the models must be partitioned correctly, which reduces the possibility of overfitting the model. Therefore, the cross-validation technique is implemented so that the algorithm runs different tests over different combinations of the K folds established by the user. Figure 5 shows the scheme of the cross-validation segment.
Feature Importance: The objective was to obtain the relevant variables that positively or negatively impact the target variable to be analyzed. To achieve this, the following models were used: Recursive Feature Elimination (RFE), CHI Square, least absolute shrinkage and selection operator (LASSO), CatBoost (CBC), Decision Tree (DTC), Random Forest (RFC), k-nearest neighbors (KNN), Linear Regression (LR), and Logistic Regression (LogR). They all provide the most important variables that influence the selected target variable. To reduce the overtraining of the models and to guarantee that the data obtained are as accurate as possible, the cross-validation technique was used.
Machine Learning models: This segment uses the data from the database section and executes the Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Linear Regression, Decision Tree, Random Forest, CatBoost, Naïve Bayes (NB), and Logistic Regression models to predict coffee quality and sensory variables. Cross validation was performed and the hyperparameters were adjusted to obtain the best possible accuracy (Figure 6).
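The pre-processing and cross-validation steps described above can be sketched without any external library. This is an illustrative approximation, not the authors' implementation: the paper used Pandas, whereas this sketch uses plain Python lists and dictionaries, and the record fields, the set of accepted characters, and the value of K are all assumptions.

```python
import random


def clean_rows(rows, allowed_extra=" .-_"):
    """Drop records that contain an empty field or unexpected special characters."""
    def is_valid(value):
        text = str(value).strip()
        if not text:
            return False
        return all(ch.isalnum() or ch in allowed_extra for ch in text)

    return [row for row in rows if all(is_valid(v) for v in row.values())]


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical records; field names and values are illustrative only.
samples = [
    {"variety": "bourbon", "humidity": 11.2, "density": 0.71},
    {"variety": "typica", "humidity": "", "density": 0.69},       # empty field
    {"variety": "catur@@ra", "humidity": 10.8, "density": 0.70},  # stray characters
    {"variety": "geisha", "humidity": 11.5, "density": 0.72},
    {"variety": "colombia", "humidity": 10.9, "density": 0.70},
    {"variety": "garnica", "humidity": 11.0, "density": 0.68},
]
cleaned = clean_rows(samples)
folds = list(k_fold_indices(len(cleaned), k=2))
print(len(cleaned), [len(test) for _, test in folds])
```

Each model would then be fitted on `train_idx` and scored on `test_idx` for every fold, and the fold scores averaged; repeating this for each candidate hyperparameter setting gives a simple form of tuning.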

3.1. Analysis of Important Variables

An in-depth study was carried out to determine the variables of coffee that impact the variables of sensory notes related to fragrance, aroma, flavor, and aftertaste; the experiments carried out were as follows.

(1) Determination of the most important variables related to the sensory notes of green coffee via the variety and type of coffee beans. One hundred and eighty-five green coffee samples from the high mountain zone of Veracruz, Mexico, were used, each with the sensory analysis of its cupping and its variety and type, and a database with these characteristics was constructed for analysis. The objective of this experiment was to analyze the relationship between the cupping sensory variables and the variety and type of coffee, and thereby to identify which sensory notes of aroma, fragrance, flavor, and aftertaste are influenced by the type and variety of coffee.

(2) Determination of the most important variables related to the sensory notes of green coffee via physical properties (humidity, density, defects, size, and shape). One hundred and eighty-five green coffee samples from the highlands of Veracruz were used, each with the sensory analysis of its cupping and the physical properties and defects of its beans, which directly impact the sensory notes of aroma, fragrance, flavor, and aftertaste.

3.2. Sensory Notes’ Prediction

The prediction of sensory notes of coffee in aroma, fragrance, flavor, and aftertaste was performed using information on the variety, type, physical properties, and defects of green coffee beans. In this experiment, 185 green coffee samples were used. Seven machine learning models were used to predict the floral, fruity, herbal, nutty, caramel, chocolate, spicy, resinous, pyrolytic, earthy, and fermented sensory notes in aroma, fragrance, flavor, and aftertaste. During coffee cupping, the expert cupper assigns a rating from 0 to 5 to each of the sensory notes detected in the four stages of sensory analysis (aroma, fragrance, flavor, and aftertaste). This part of the research focuses on the development of an algorithm that predicts the sensory notes of coffee via supervised Machine Learning models configured for classification, using the variety and type, physical properties, and defects of green coffee beans as inputs. This results in categorized values of the sensory notes.

4. Results and Discussion

4.1. Results of Physical Analysis

The following are the results of the physical analysis of some of the varieties such as Bourbon, Typica, Colombia, Caturra, Costa Rica, and Geisha. The results indicate the average number of green coffee beans with some defect in the coffee bean.

It can be seen in Figure 7 that split is one of the most common defects in all varieties. Typica was one of the samples with the most characteristic defects, namely immature, wrinkled, split, berry borer, and bitter partial. Bourbon concentrates its defects in black partial, fungus, immature, and wrinkled.

Figure 8 shows the average results of the coffee varieties Bourbon, Typica, Colombia, Caturra, Costa Rica, and Geisha, indicating the number of green coffee beans according to their size contained in each variety. The size of the bean is an important factor in the roasting process, since the aim is to guarantee a homogeneous roasting which can be affected if there are many beans of different sizes.

In [27], a method is described to correlate the physical properties of Java Arabica green coffee beans with their physical defects by using boxplots to determine the distribution of physical defects under different post-harvest processing and drying methods. It is concluded that fewer defects are observed in the wet process compared to the dry process, and mechanical drying gives a better quality of green coffee beans and minimizes losses.

4.2. Results of Coffee Cupping

The following are the results obtained from the cupping: sensory notes such as fruity, herbal, nuts, caramel, chocolate, spicy, resinous, pyrolytic, earthy, fermented, and phenolics are analyzed.

The results of coffee cupping are shown in Figure 9 and Figure 10, where heat maps show the level of presence of a given sensory note (blue: low presence; red: high presence). The results are divided into four sections: aroma, fragrance, flavor, and aftertaste.

According to the results of the aroma tasting (Figure 9), the sensory notes of spicy, chocolate, caramel, nutty, fruity, and floral are the notes most present in Marseillaise, Caturra, Arabica, Colombia, Costa Rica, Bourbon, and Typica coffees. Herbal notes are very present, especially in the Typica, Bourbon, Costa Rica, and Caturra coffees.

Regarding the aroma results (Figure 9), the fruity, nutty, caramel, and chocolate notes were the most characteristic of the coffees studied. Resinous, pyrolytic, earthy, and fermented notes were found in the Typica, Bourbon, and Caturra coffees.

The taste results (Figure 10) showed that the fruity, nutty, caramel, chocolate, and spicy notes were the most characteristic of the coffees studied. Pyrolytic, resinous, and earthy notes were more present in the Typica, Bourbon, Colombia, and Caturra coffees.

The aftertaste results (Figure 10) show that all coffees have fruity, herbal, nutty, caramel, chocolate, and spicy notes. Earthy and fermented notes are present in Typica, Bourbon, Colombia, and Caturra.

In [27], a spider graph is used to determine the influence of the drying process on the average obtained from the evaluation of five judges on the sensory notes of the aroma of Java Arabica green coffee beans. They found that mechanically dried samples are more nutty, earthy, and grassy than sundried samples. Instead of using spider graphs, heat maps help us quickly and consistently determine the presence of sensory notes in seven coffee varieties in the four cases presented above. This task would be unfeasible using spider graphs.

4.3. Results of Feature Importance of Sensorial Analysis

Figure 11 shows the number of concordances of the variables of importance obtained by the methods used (RFE, LASSO, KNN, CHI2, CATBOOST, DECISION TREE, and RANDOM FOREST). The result indicates the relationship and importance that exists between coffee varieties and their respective sensory notes, such as floral, fruity, herbal, nutty, caramel, chocolate, spicy, resinous, pyrolytic, earthy, fermented, and phenolic, which were analyzed in the aroma, fragrance, flavor, and aftertaste. The varieties and types that were most present in all sensory notes in aroma, fragrance, flavor, and aftertaste tastings were the Bourbon, Costa Rica, Colombia, Typica, and Maragón varieties.

Specifically in the aroma tasting, Costa Rica, Caturra, and Maragon were the varieties with the most significant impact on all aroma sensory notes. In the fragrance, the varieties Bourbon, Costa Rica, and Typica had the most significant influence on the sensory notes of the fragrance. In taste, the Colombia, Caturra, and Bourbon varieties were the most relevant. Finally, in the aftertaste, the Costa Rica, Typica, Criollo, and Geisha varieties obtained the highest number of concordances.

Figure 12 shows the number of concordances of the variables of importance of the physical variables, defects, and size obtained by the methods used (RFE, LASSO, KNN, CHI2, CATBOOST, DTC, and RFC). The result indicates the relationship and importance that exist between the physical variables, defects, and size of the green coffee bean with respect to the sensory notes such as floral, fruity, herbal, nutty, caramel, chocolate, spicy, resinous, pyrolytic, earthy, fermented, and phenolic, which were analyzed in the aroma, fragrance, flavor, and aftertaste.

The physical variables, defects, and sizes that were the most present in all sensory notes in the aroma, fragrance, flavor, and aftertaste tastings were moisture, density, low Z15, flat bean, peaberry, and bitter. Specifically, in the aroma tasting, the most important characteristics were honey, moisture, density, bitter, fungus, shells, Z15, flat bean, and peaberry. For the fragrance tasting, they were shells, moisture, density, bitter, unripe, Z19, low Z15, flat bean, peaberry, and ball. For the flavor, they were humidity, density, bitter, float, shells, Z19, low Z15, flat bean, peaberry, and monster. For the aftertaste, they were moisture, density, bitter, husk, Z18, Z17, low Z15, flat bean, and peaberry.

Several studies in the literature have correlated the physical and sensory properties of coffee beans using Pearson correlations to test whether the correlation between each pair of physical and sensory properties is statistically significant [4,28]. The conclusions of those studies indicate that physical properties and sensory notes are related. In this research, seven methods were therefore used to select properties; each method finds which physical characteristics most influence each sensory note. Not all methods select the same properties, so by adding up the number of times each physical property was ranked as important across all methods, we arrived at a more robust consensus than Pearson correlations alone provide. Moreover, heat maps make the most important physical properties of each sensory note easier to visualize.
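The consensus count described above, in which each method contributes one vote per property it selects, can be sketched as follows. The selections below are hypothetical placeholders for a single sensory note, not the results reported in Figure 11 or Figure 12, and the function name is an assumption.

```python
from collections import Counter


def consensus_counts(selected_by_method):
    """Count, for each feature, how many methods flagged it as important."""
    counts = Counter()
    for features in selected_by_method.values():
        counts.update(set(features))  # one vote per method, even if listed twice
    return counts


# Hypothetical selections for a single sensory note; not the paper's results.
selected = {
    "RFE": ["humidity", "density", "peaberry"],
    "LASSO": ["humidity", "density"],
    "CHI2": ["density", "flat bean"],
    "RFC": ["humidity", "density", "flat bean"],
}
ranking = consensus_counts(selected).most_common()
for feature, votes in ranking:
    print(f"{feature}: selected by {votes} of {len(selected)} methods")
```

Repeating the count for every sensory note gives one row per note, which is the kind of matrix a heat map can display.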

4.4. Results of Sensorial Notes’ Prediction

One of the significant challenges in machine learning is the algorithm selection for the desired application; in most cases, the algorithm selection is made according to the programmer’s experience, which limits the efficiency of machine learning.

Automated machine learning improves algorithm selection and reduces the programmer’s tendency to use a particular algorithm or technique.

The evaluation of the sensory characteristics of coffee is subjective, even when certified tasters perform it; the subjectivity is apparent in the individual evaluation of a single sample by a single taster. Averaging the scores of several tasters reduces the subjectivity of coffee tasting, although it is difficult to obtain the scores of several tasters on the same sample on a commercial scale.

The results show that combining different machine learning models makes it possible to obtain sensory coffee parameters with a precision comparable to that obtained by combining the scores of several experts. This methodology reduces the subjectivity of the tasters.
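The text does not state how the outputs of the different models are combined. One common option, shown here purely as an assumption, is a per-sample majority vote over the class (the 0 to 5 rating) predicted by each model; the model names and predictions below are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_by_model):
    """Combine per-model class predictions sample by sample.

    Ties are resolved in favour of the lowest class label.
    """
    models = list(predictions_by_model.values())
    if not models:
        raise ValueError("at least one model is required")
    n_samples = len(models[0])
    if any(len(preds) != n_samples for preds in models):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for votes in zip(*models):
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(min(label for label, c in counts.items() if c == best))
    return combined


# Hypothetical ratings (0 to 5) predicted for four samples of one sensory note.
predictions = {
    "RFC": [3, 0, 2, 5],
    "CBC": [3, 1, 2, 4],
    "SVC": [2, 1, 2, 4],
}
print(majority_vote(predictions))
```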

The results of the seven machine learning models used to predict floral, fruity, herbal, nutty, caramel, chocolate, spicy, resinous, pyrolytic, earthy, fermented in aroma, fragrance, flavor, and aftertaste sensory notes are shown below.

Figure 13, Figure 14, Figure 15 and Figure 16 show the estimation accuracy for the sensory notes of the aroma, fragrance, flavor, and aftertaste, respectively. The RFC, CBC, and SVC models obtained the best results, and the resinous, pyrolytic, earthy, and fermented sensory notes were predicted with the highest accuracy.

Table 1 and Table 2 show the results of the evaluation metrics used to assess the prediction of each of the sensory notes. The results shown belong to the Machine Learning models that obtained the best results in their prediction. Precision was used to measure the proportion of positive predictions that were correct. Recall was used to measure the sensitivity, that is, the proportion of all positive cases in the data that the classifier correctly predicted. The F1 score was also used because it combines precision and recall by calculating the harmonic mean of the two. The macro average is the arithmetic mean of the per-class F1 scores. Finally, the weighted average is the mean of the per-class scores weighted by the actual number of occurrences of each class in the data.
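The metrics named above follow standard definitions and can be computed directly from the true and predicted classes. The sketch below is a library-free illustration; the function name and the example labels are assumptions and do not reproduce the values in Table 1 or Table 2.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Per-class precision, recall, and F1, plus accuracy and averaged F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # actual number of occurrences of each class
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    n = len(pairs)
    return {
        "per_class": per_class,
        "accuracy": sum(t == p for t, p in pairs) / n,
        "macro_f1": sum(m["f1"] for m in per_class.values()) / len(labels),
        "weighted_f1": sum(per_class[l]["f1"] * support[l] for l in labels) / n,
    }


# Hypothetical ratings for six samples of one sensory note.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 0]
report = classification_metrics(y_true, y_pred)
print(round(report["accuracy"], 3), round(report["macro_f1"], 3), round(report["weighted_f1"], 3))
```

The macro average treats every class equally, so a rarely occurring rating that is never predicted lowers it sharply, whereas the weighted average is dominated by the most frequent ratings.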

5. Conclusions

One of the major contributions of this work is based on the results of the experiments carried out, which indicate that it is possible to know how the physical variables and deformations of green coffee beans impact each of their sensory notes. Therefore, it was possible to obtain a detailed map with this information.

The development of an automated machine learning model allows for the determination of sensory notes, making it possible to compare them to those obtained by experts, and, at the same time, opens the possibility of developing more complex automated algorithms. Using the hyperparameter tuning stage, the accuracy of the prediction of sensory properties is improved, which is close to that obtained via the combined scores of trained tasters.

Additionally, by obtaining a sensorial profile estimation of green coffee beans with the proposed Machine Learning model, producers and traders will be able to analyze data before the coffee roasting process, leading to a fair-trade model. By understanding the relationship between physical properties and the sensory profile, producers can improve the quality of the product.

The results of the research demonstrate that by combining different machine learning models, a precision can be achieved analogous to that of cupping experts; therefore, the possibility of errors induced by human concerns like tiredness or subjectivity is reduced.

Author Contributions

Methodology, B.G.-S., O.L.-E. and O.P.-R.; Validation, J.d.J.A.F.-C. and O.L.-E.; Formal analysis, B.G.-S., O.L.-E. and G.A.-R.; Investigation, B.G.-S., O.S.-G., J.d.J.A.F.-C., O.L.-E., O.P.-R. and G.A.-R.; Resources, J.d.J.A.F.-C.; Data curation, O.P.-R.; Writing—original draft, J.d.J.A.F.-C.; Writing—review & editing, O.S.-G.; Supervision, O.S.-G.; Project administration, O.S.-G.; Funding acquisition, O.S.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Consejo Veracruzano de Investigación Científica y Desarrollo Tecnológico (COVEICYDET) [grant number 15 2243/2021].

Data Availability Statement

Data are contained within the article.

Acknowledgments

The facilities provided by the Orizaba campus of Tecnológico Nacional de México and Centro Agroecológico del Café A.C. (CAFECOL) were highly appreciated, as were the sources of funding for the project.

Conflicts of Interest

The authors declared no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

NIRS	Near Infrared Spectroscopy
FFBPANN	Feed-Forward Back Propagation Artificial Neural Network
CAFECOL	Centro Agroecológico del Café
NMX	Norma Mexicana
RFE	Recursive Feature Elimination
LASSO	Least Absolute Shrinkage and Selection Operator
KNN	K-Nearest Neighbors
SVC	Support Vector Classifier
ML	Machine Learning
RFC	Random Forest Classifier
SVR	Support Vector Regression
MLP	Multilayer Perceptron
CBC	CatBoost
DTC	Decision Tree
LogR	Logistic Regression
NB	Naïve Bayes

Figure 1. Schematic diagram of the methodology implemented.

Figure 2. Samples of different varieties of green coffee beans. Varieties: (a) Bourbon, (b) Typica, (c) Colombia, (d) Caturra, (e) Costa Rica, (f) Mundo Novo, (g) Garnica, (h) Catuai, and (i) Marseillaise.

Figure 3. (a) Analysis of defects in green coffee beans, (b) preparation of the roasted coffee bean sample for cupping, and (c) coffee cup preparation and cupping process.

Figure 4. Schematic diagram of the Machine Learning methodology implemented and used for the analysis of important variables and the estimation of sensory properties.

Figure 5. Schematic diagram representing the machine learning methodology using cross validation for the determination of important variables.

Figure 6. Diagram representing the machine learning methodology for the generation of prediction models of coffee sensory notes.

Figure 7. Results indicating the number of green coffee beans according to their type of defect.

Figure 8. Results indicating the number of green coffee beans according to their bean size.

Figure 9. Results of the sensory notes of fragrance and aroma of coffee cupping.

Figure 10. Results of the sensory notes of the taste and aftertaste of coffee cupping.

Figure 11. Importance of coffee variety properties in sensory notes.

Figure 12. Importance of physical properties in sensory notes.

Figure 13. Accuracy results of the ML models used in the prediction of aroma sensory notes.

Figure 14. Accuracy results of the ML models used in the prediction of fragrance sensory notes.

Figure 15. Accuracy results of the ML models used in the prediction of flavor sensory notes.

Figure 16. Accuracy results of the ML models used in the prediction of aftertaste sensory notes.

Table 1. Evaluation metrics for the prediction of aroma and fragrance sensory notes.

Sensory Note	ML Model	Metric	Precision	Recall	F1 Score
Aroma—Floral	SVC	Accuracy			0.9
		Macro avg	0.78	0.78	0.78
		Weighted avg	0.9	0.9	0.9
Aroma—Fruity	SVC	Accuracy			0.81
		Macro avg	0.87	0.86	0.86
		Weighted avg	0.82	0.81	0.81
Aroma—Herbal	SVC	Accuracy			0.93
		Macro avg	0.58	0.39	0.41
		Weighted avg	0.92	0.92	0.92
Aroma—Nuts	RFC	Accuracy			0.86
		Macro avg	0.81	0.70	0.76
		Weighted avg	0.86	0.86	0.86
Aroma—Caramel	CB	Accuracy			0.83
		Macro avg	0.85	0.81	0.83
		Weighted avg	0.85	0.82	0.83
Aroma—Chocolates	CB	Accuracy			0.81
		Macro avg	0.87	0.81	0.82
		Weighted avg	0.82	0.81	0.81
Aroma—Spicy	RFC	Accuracy			0.92
		Macro avg	0.61	0.59	0.6
		Weighted avg	0.9	0.92	0.91
Fragrance—Floral	SVC	Accuracy			0.85
		Macro avg	0.87	0.88	0.88
		Weighted avg	0.85	0.86	0.85
Fragrance—Fruity	RFC	Accuracy			0.82
		Macro avg	0.75	0.72	0.73
		Weighted avg	0.84	0.81	0.82
Fragrance—Herbal	SVC	Accuracy			0.96
		Macro avg	0.81	0.74	0.77
		Weighted avg	0.94	0.95	0.94
Fragrance—Nuts	SVC	Accuracy			0.83
		Macro avg	0.86	0.78	0.81
		Weighted avg	0.83	0.82	0.82
Fragrance—Caramel	RFC	Accuracy			0.83
		Macro avg	0.86	0.86	0.86
		Weighted avg	0.84	0.84	0.84
Fragrance—Chocolates	CB	Accuracy			0.82
		Macro avg	0.86	0.76	0.81
		Weighted avg	0.81	0.77	0.78
Fragrance—Spicy	CB	Accuracy			0.88
		Macro avg	0.89	0.82	0.85
		Weighted avg	0.89	0.88	0.88

Table 2. Evaluation metrics for the prediction of flavor and aftertaste sensory notes.

Sensory Note	ML Model	Metric	Precision	Recall	F1 Score
Flavor—Floral	SVC	Accuracy			0.8
		Macro avg	0.86	0.93	0.88
		Weighted avg	0.89	0.88	0.88
Flavor—Fruity	SVC	Accuracy			0.82
		Macro avg	0.73	0.64	0.68
		Weighted avg	0.81	0.8	0.8
Flavor—Herbal	SVC	Accuracy			0.87
		Macro avg	0.62	0.52	0.55
		Weighted avg	0.86	0.88	0.86
Flavor—Nuts	RFC	Accuracy			0.85
		Macro avg	0.69	0.61	0.64
		Weighted avg	0.84	0.83	0.84
Flavor—Caramel	RFC	Accuracy			0.85
		Macro avg	0.9	0.87	0.88
		Weighted avg	0.85	0.84	0.85
Flavor—Chocolates	RFC	Accuracy			0.83
		Macro avg	0.82	0.71	0.75
		Weighted avg	0.83	0.82	0.82
Flavor—Spicy	RFC	Accuracy			0.88
		Macro avg	0.91	0.91	0.91
		Weighted avg	0.88	0.87	0.88
Aftertaste—Floral	SVC	Accuracy			0.88
		Macro avg	0.87	0.91	0.88
		Weighted avg	0.9	0.89	0.88
Aftertaste—Fruity	SVC	Accuracy			0.82
		Macro avg	0.85	0.81	0.83
		Weighted avg	0.87	0.82	0.84
Aftertaste—Herbal	SVC	Accuracy			0.87
		Macro avg	0.87	0.66	0.72
		Weighted avg	0.87	0.86	0.87
Aftertaste—Nuts	SVC	Accuracy			0.85
		Macro avg	0.86	0.89	0.87
		Weighted avg	0.85	0.85	0.85
Aftertaste—Caramel	RFC	Accuracy			0.85
		Macro avg	0.84	0.77	0.8
		Weighted avg	0.84	0.84	0.84
Aftertaste—Chocolates	RFC	Accuracy			0.83
		Macro avg	0.68	0.7	0.69
		Weighted avg	0.81	0.81	0.82
Aftertaste—Spicy	CB	Accuracy			0.88
		Macro avg	0.88	0.74	0.78
		Weighted avg	0.89	0.88	0.88
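
The macro and weighted averages in Tables 1 and 2 can diverge sharply when a sensory note is rare, as with the herbal aroma note in Table 1 (accuracy 0.93, macro F1 score 0.41). The following minimal Python sketch shows how the three summary rows are conventionally computed, assuming the usual definitions of these metrics. It is not the authors' code; the function name `classification_summary` and the toy labels are hypothetical and serve only to illustrate the arithmetic.

```python
from collections import Counter


def classification_summary(y_true, y_pred):
    """Return accuracy plus macro and weighted (precision, recall, F1) averages."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    per_class = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined ratios (zero denominators) are set to 0.0 by convention here.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = (precision, recall, f1)

    n = len(y_true)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    # Macro average: every class counts equally, however rare it is.
    macro = tuple(
        sum(m[i] for m in per_class.values()) / len(labels) for i in range(3)
    )
    # Weighted average: each class counts in proportion to its support.
    weighted = tuple(
        sum(per_class[l][i] * support[l] for l in labels) / n for i in range(3)
    )
    return {"accuracy": accuracy, "macro avg": macro, "weighted avg": weighted}


# Hypothetical imbalanced data: 9 samples without the note (0), 1 with it (1).
y_true = [0] * 9 + [1]
y_pred = [0] * 10  # a model that never predicts the rare class
summary = classification_summary(y_true, y_pred)
```

On this toy data the accuracy is 0.9 while the macro-averaged recall is only 0.5, because the single rare-class sample is never recovered. A large gap between the accuracy and the macro averages in the tables may therefore indicate class imbalance rather than uniformly good performance.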

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
