Application of Machine Learning to Assess the Quality of Food Products—Case Study: Coffee Bean

Przybył, Krzysztof; Gawrysiak-Witulska, Marzena; Bielska, Paulina; Rusinek, Robert; Gancarz, Marek; Dobrzański, Bohdan; Siger, Aleksander

doi:10.3390/app131910786

Open AccessArticle

Application of Machine Learning to Assess the Quality of Food Products—Case Study: Coffee Bean

by

Krzysztof Przybył

^1,*

,

Marzena Gawrysiak-Witulska

¹,

Paulina Bielska

¹

,

Robert Rusinek

²

,

Marek Gancarz

^2,3

,

Bohdan Dobrzański, Jr.

⁴

and

Aleksander Siger

⁵

¹

Department of Dairy and Process Engineering, Faculty of Food Science and Nutrition, Poznań University of Life Sciences, ul. Wojska Polskiego 31/33, 60-624 Poznan, Poland

²

Institute of Agrophysics, Polish Academy of Sciences, Doświadczalna 4, 20-290 Lublin, Poland

³

Faculty of Production and Power Engineering, University of Agriculture in Krakow, Balicka 116B, 30-149 Krakow, Poland

⁴

Pomology, Nursery and Enology Department, University of Life Sciences in Lublin, Głęboka 28, 20-400 Lublin, Poland

⁵

Department of Food Biochemistry and Analysis, Poznań University of Life Sciences, ul. Mazowiecka 48, 60-623 Poznan, Poland

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(19), 10786; https://doi.org/10.3390/app131910786

Submission received: 25 August 2023 / Revised: 22 September 2023 / Accepted: 25 September 2023 / Published: 28 September 2023

(This article belongs to the Special Issue Approaches to Machine and Deep Learning, Big Data or Modern Analytical Methods in the Agri-Food Industry)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

Modern machine learning methods were used to automate and improve the determination of an effective quality index for coffee beans. Machine learning algorithms can effectively recognize various anomalies, among others factors, occurring in a food product. The procedure for preparing the machine learning algorithm depends on the correct preparation and preprocessing of the learning set. The set contained coded information (i.e., selected quality coefficients) based on digital photos (input data) and a specific class of coffee bean (output data). Because of training and data tuning, an adequate convolutional neural network (CNN) was obtained, which was characterized by a high recognition rate of these coffee beans at the level of 0.81 for the test set. Statistical analysis was performed on the color data in the RGB color space model, which made it possible to accurately distinguish three distinct categories of coffee beans. However, using the Lab* color model, it became apparent that distinguishing between the quality categories of under-roasted and properly roasted coffee beans was a major challenge. Nevertheless, the Lab* model successfully distinguished the category of over-roasted coffee beans.

Keywords:

artificial intelligence; deep learning; convolutional neural networks (CNNs); coffee bean

1. Introduction

Arabica coffee is a species of coffee tree with small beans and belongs to the Rubiaceae family. This unique species characterized by the features of an evergreen tree [1], which, through proper breeding in the zone of tropical areas, allows it to maintain its green leafy state almost throughout the year. Coffee beans are used in the preparation of coffee beverage, which is the second most consumed beverage in the world [2]. However, among the 100 species of coffee, it is Arabica coffee that has the greatest economic importance [2] due to its flavor, aroma, low caffeine content, cultivation requirements and product price, among other reasons.

Arabica coffee is one of many group of food products that exhibit beneficial effects on the human body [3,4,5]. This is related to its content of key bioactive compounds, such as caffeine, lipids, polyphenols, vitamins, minerals and aroma compounds [3,4,6]. In the literature, the consumption of Arabica coffee has been shown to help neutralize free radicals in the body, reducing the risk of chronic diseases including cardiovascular disease [7,8], Alzheimer’s and Parkinson’s disease, and liver disease [9,10,11,12].

The coffee bean also represents a valuable dietary source due to, among other things, the chlorogenic acids (CGAs) they contain [13]. This is a group of polyphenols found naturally in coffee beans. They are also a powerful antioxidant, which, as mentioned, in addition to inhibiting certain diseases, can promote weight loss by, among other things, affecting metabolism and glucose absorption [14,15]. It seems crucial to assess the quality of coffee beans due to the impact of sourcing the right beans during the coffee roasting process, where the quantity and quality of the CGA they contain can degrade rapidly. Therefore, the process of sourcing and selecting the right beans is important in analyzing the health properties of coffee. The interest in health properties has attracted the attention of scientists as well as healthy eaters [3,4,11,12,14,16,17,18].

Artificial intelligence methods have gained great importance in the food production and distribution process through process optimization, crop monitoring and management, quality assessment [19,20] and food prediction, among others [21].

One of the artificial intelligence methods includes deep learning, which is a technique that uses artificial neural network models containing, among other things, two or more layers of hidden neurons. Deep learning models allow for data analysis without the researcher knowing the exact rules. This network topology processes information and is a tool that offers the mapping of the complex relationships occurring between the analyzed data [20,22]. Actions such as extracting the shape, color or texture of an analyzed object in a bitmap image requires a large amount of data, which translates into longer processing time and statistical analysis of that object. The application of deep learning methods makes it possible to automate some of the activities, among others, related to image data processing at various stages of statistical analysis [22,23,24].

The convolutional networks, also called convolutional neural networks (CNNs), are a case of deep learning that analyze individual areas of the input dataset and generate new tensors based on them [25]. By means of convolution operations, CNNs use a kernel function and involve increasing the ordinality of copies of the tensor with which operations are performed without increasing their dimensionality [26,27,28]. Convolutional networks use an algorithm to classify images due to the presence of repeated, recognizable and similar shapes in the image. The input tensor shape of the network accounts for the sample (pattern), image height, image width, and depth as the number of dimensions of the RGB color space [29,30,31].

In the literature, from the point of view of the problem posed, many researchers have tried to apply CNNs using digital images to assess the quality of green coffee beans during harvesting, for example, by the team of Huang et al. (2020) [18], among others, who achieved a coffee grading rate of 93%. On the other hand, the team of Pinto et al. (2020) [32], based on green coffee beans, similarly made a quality assessment according to defects, obtaining a grading rate of min. 72.4%. Research was also conducted by Wang et al. (2021) [16] to effectively build an intelligent system based on deep learning and computer vision to assist in the detection of defects, including mold, fermentation, insect bites and crushed coffee beans, obtaining neural models with recognition quality rates of 78% and 93% [16].

In view of a number of research works aimed at improving the evaluation of the quality of harvested coffee beans, the current technological solutions in the process of coffee harvesting and roasting are still being sought to be made more effective.

In this work, as mentioned above, the focus was on recognizing three quality classes of coffee beans because of the coffee roasting process. The assumption of the study was to assess the authenticity based on the observation of the color properties of selected Arabica coffee samples obtained through the processing of digital images. The utilitarian purpose was to develop convolutional networks to assist in the identification of quality classes of Arabica coffee beans. The application of modern methods using the Python 3 environment seems to be a key aspect. This is due to the fact that it is easier to implement this type of solution in the food industry than traditional statistical methods. The designed convolutional models, according to the machine learning criterion, have high scalability, which makes it possible to adapt to applications in the food industry.

2. Materials and Methods

2.1. Samples

The research material was Arabica coffee beans, which came from the San Pedro Necta area located in the Huehuetenango region of western Guatemala [33]. Specific information about the research material was included in the materials of the research paper by Rusinek et al. (2022) [4,33]. In order to conduct the experiment, 1 kg of each type of coffee was obtained, i.e., heavily roasted coffee (overdeveloped class), weakly roasted coffee (underdeveloped class) and properly roasted coffee (standard class), which was obtained by roasting coffee beans at the Rovigo Coffee roaster (Lublin, Poland). The beans were roasted in a Coffed SR 5 roaster procedure according to Rusinek et al. (2022) [4].

The coffee beans of each test sample (class) were subjected to an image acquisition process. Series of digital images were taken for the tested classes, i.e., coffee beans. Examples of test samples (research classes) describing each type of coffee bean are shown in Figure 1, where Figure 1a shows the equivalent of a properly roasted coffee bean (standard), Figure 1b defines an over-roasted coffee bean, and Figure 1c shows the pattern of an under-roasted coffee bean.

2.2. Color Analysis

In this research, a color analysis was performed by determining the L, a, b component parameters of the L*a*b* color space model [34,35,36] using an X-Rite SP-60 spectrophotometer (X-Rite, Grand Rapids, MI, USA). In order to perform the experiment, coffee beans were placed on plastic Petri dishes. This experiment was performed precisely by carrying out 15 repetitions for each test class of coffee. In analytical methods as well as machine learning, it is important to obtain a reliable number of repetitions to obtain reliable results.

The second step involved performing color analysis with the RGB (Red Green Blue) model, which is more popular in machine learning due to its digital solutions [37,38]. Data on the R, G, B components for the RGB model were extracted from digital images of each coffee research class. In order to extract the necessary information about the RGB color space model contained in the image, the available Python environment supported by the open source libraries Pandas, numPy and Python Image Library (PIL) [39] was used. The PIL library enabled processing and at the same time influenced image modification by acquiring the necessary image pattern, i.e., the extracted object in the form of coffee beans. In order to extract numerical data from the image, the mentioned numPy library was used to determine measures and statistics of the RGB model. The Pandas library allowed for the analysis of the aforementioned data extracted from the image showing them in a clear (readable, tabular order) manner, which significantly proper representation of numerical data emerges as a key aspect in data preprocessing.

2.3. Image Collection by Camera

In order to acquire digital images, a NIKON D5100 (Nikon Corp., Bangkok, Thailand) camera was used, which is characterized by high performance due to its resolution, the speed of capturing images and precise focusing. This camera was equipped with a 16.2-megapixel sensor, which fully ensured the acquisition of sufficient detail when taking digital photos with a coffee bean subject. In addition, this camera takes digital photos at a speed of about 4 frames/s, which also meets the expected criterion for using this model in the study. This camera, the NIKON D5100, was equipped with an AF system that allows for precise focusing. The application of vision devices in various aspects of research requires key image parameters, such as proper focus setting, matching ISO sensitivity value, exposure time selection, and maintaining conditions of the same environment within the repeatability of results. It is very common to encounter the problem of incorrect preparation of image acquisition parameters, which nowadays for machine learning can be a key aspect in the process of image recognition.

Image acquisition using the research and measurement station in the study was performed according to the procedure of the author’s solution (see Przybył K. et al. (2023)) [40,41,42]. In the process of image acquisition, the NIKON D5100 camera was equipped with a Nikkor Lens 28 mm. The selected lens is a wide-angle lenses, which gives the camera to make a wider field of view without losing the quality of the acquired image. The main criterion for the selection of the lens was to be able to take digital images at close range while preserving the quality of the coffee bean structure as well as its proportions. A series of 160 digital images depicting the selected type of coffee were taken during the study. Each of the 160 images contained a single object representing a coffee bean. Each coffee bean, in the process of image acquisition, needed to be rotated to image the top as well as the bottom part. These parts of the coffee bean are significantly different. This resulted in the preparation of 80 coffee beans for each research class.

With the purpose of maintaining the reproducibility of the results, the same image acquisition conditions were established for the coffee beans according to the exposure criterion and the established environment. Before taking digital images, the device was calibrated, and the image parameters were set, including setting the ISO sensitivity at 200, the aperture was set at f/6.3, the exposure time was 160 s, and the focal length was 28 mm. During the process of image acquisition, flash was not used due to the preservation of the conditions regarding the uniformity of illumination established by the measuring and research station according to Przybył K. et al. (2023) [40]. When taking the images, the white balance was set in manual mode in order to achieve the most accurate match of color to the designated environment.

As a result of image acquisition, 240 coffee beans were randomly sampled, i.e., 80 under-roasted coffee beans, 80 over-roasted coffee beans and 80 standard coffee beans (of appropriate quality). In effect, 480 digital images with a resolution of 4928 × 3264 (300 dpi) and 24-bit image depth saved in .JPG format were obtained.

2.4. Preprocessing Data

This step, in machine and deep learning, required preparing the data accordingly. Acquired digital photos with a resolution of 4928 × 3264 required cutting the background from the image. Using the Python environment, a catalog of unprocessed (original) photos was loaded in the coffee bean contained in each image. The rembg library in the Python environment was used. Nowadays, machine learning with Python using the rembg library allows for more precise background removal than previously used methods known to the author Przybył K., e.g., in C# programming. The rembg library is open source and supports various image file formats. In the research activity, 480 digital images in .JPG format were subjected to the process of cutting out the background and saving the files to .PNG format. The transformation of the image to .PNG file format allows for the lossless compression of the data without losing information during compression. Finally, .PNG files also support a transparency channel, allowing images with transparent areas (such as backgrounds) to be extracted, and are supported on multiple platforms.

In the next step, an image cropping process was carried out to create a square image with a lower resolution, i.e., 678 × 678 (secondary image). The reason for changing the image resolution in machine learning became limited computing power for the CNN learning process. Two sets of secondary images were prepared and divided into a learning set and a test set. The testing set defined 3 classes, i.e., 3 types of coffee beans (over-roasted, under-roasted and standard). In total, the testing set contained 480 digital images. In accordance with the principle of machine learning in the learning set, it was required to obtain images that would differ from the data contained in the testing set [43,44,45]. For this purpose, a point transformation of the image was carried out with steps of 90, 180 and 270 degrees. The result was the acquisition of 1440 digital images for the learning set. Remembering the criterion of over-fitting the model, the following ratio of sets was adopted: 75:25.

2.5. Building Neural Model by Python

This step required the design of a CNN model. A diagram of the model development is shown in Figure 2. A CNN was prepared in the Python environment, consisting of input layers, such as convolution layers (Conv2D), max pooling layers (MaxPooling) and a global averaging layer (GlobalaAveragePolling2D), and an output layer (Dense) with 3 neurons, representing the selected class of coffee beans. The Conv2D layers, as the base layers in the CNN, determined the patterns using filters to move the window (convolution window) by a defined step (stride) across the coffee bean image. The MaxPooling layers significantly extracted the most important representative features from the image, affecting the change in the dimension of the objects as well as the computational complexity. The GlobalAveragePolling2D layer was used to reduce the dimensionality of the data contained in the image while still maintaining relevant information. A digital image in the RGB color space model was reduced by calculating the average of all pixel values on a given channel. The reduction in the image information on each channel of the RGB color model, compared to fully connected, prevented, among other things, the overfitting of the model (overfitting) and so-called gradient fading, resulting in further deepening of the CNN for better performance. Dense layer (Dense) usually used for multiclassification, i.e., for a dataset such as coffee beans, where there are a minimum of 3 categorical classes, in this case, overdeveloped, underdeveloped and standard. The output neuron of the output layer is responsible for each class representing the mentioned type of coffee. In the classification issue, the Dense layer transforms the information compacted in the image into a set to predict their membership in certain classes.

2.6. Statistical Analysis

In the study, the homogeneity of coffee beans by research class was carried out using analysis of variance (ANOVA), and Tukey’s test [46] was applied to more accurately depict mean values between coffee classes using Statistica version 13.3 (Statsoft, Tulsa, OK, USA). Averages for each coffee class were calculated. The critical value of the Tukey test was set at a significance level of α = 0.05. The ranges of differences for each class of coffee bean were calculated and compared. As a result, statistical analysis made it possible to accurately assess significant differences between the coffee classes studied. This translated into the extraction of significant characteristics of each coffee class and the establishment of detailed homogeneity for these classes.

3. Results and Discussion

3.1. Color Analysis

In Table 1 and Table 2, the results of the L*a*b* and RGB color space models for each class of coffee are summarized. As a result of analysis by the Tukey test, the existence of subclasses for coffee beans was confirmed. Indeed, the test classes differ from each other [34]. In Table 1 on L*a*b* color, it was observed that statistically, the greatest difference in color was shown by the burnt coffee class against the other classes. Analyzing the L*, a* and b* components of the L*a*b* model [47,48], the under-roasted and standard classes showed a statistically significant similarity between them. In fact, these classes have similar values in the color attributes, which translates into difficulty in distinguishing them from each other. In the case of the over-roasted coffee class, even organoleptic differences in brightness (L*), red–green (a*) and yellow–blue (b*) can be effectively discerned [34].

On the other hand, Table 2, in which the results of the RGB color space model are presented [37], shows that the under-roasted coffee class shows a statistical significance in the greatest differences between the other classes. In the case of this model, there are clearly statistically three different groups that show differences between each other [34]. The knowledge that the RGB model is based on the light reflected in a fixed environment may prompt an appropriate representation of these colors for a food product in reality. In having an accurate representation of these colors using the RGB model for a food product, there is greater quality control and it leads to maintaining the authenticity of that product. Currently, techniques are being used to adulterate food products due to their color to meet consumer expectations and at the same time extend their shelf life [49,50]. The RGB color space model, together with machine learning, may soon become a key aspect for determining the quality grade and checking the authenticity of food products. As a result of image acquisition, the dependence on illumination has been established, and this allows for the adjustment of the visual presentation with the help of the RGB model, minimizing the differences between the digital image and the actual visual impression. Nowadays, through increasing consumer awareness, it is also required that a food product contain information as described for authenticity in the production and the distribution of products [21,51]. By the same means, the RGB color space model, by distinguishing more precisely between different classes of coffee beans than the L*a*b* model, can be of greater benefit in assessing the quality of these products.

3.2. The Design and Learning Process of Convolutional Networks

While designing the CNN model, the architecture of the network was determined, with which it was possible to perform the intended task, i.e., the classification of coffee beans, taking into account the mentioned categorical classes. The CNN used a sequential model (Sequential model), with which the number of individual layers was determined. The advantage of the chosen model is its linear flow between layers in convolutional networks [52,53]. Similarly, to the Multilayer Perceptron model, it is the most common and relatively simple type of neural model in machine learning [52], in which there is no feedback, and there is a relatively simple action for adding layers in the CNN structure.

The designed Sequential model contained eight input layers, including four convolution and four MaxPooling layers, one global average layer and one output layer. The convolution window in the convolutional layers was shifted by two units instead of one, which resulted in a faster dimensionality reduction in large tensors in the network layers. After a series of convolution and reduction operations, the feature tensor was transformed to flat link to the dense layer for the final classifier [29,54]. For the output layer, a sigmoid function was used, as it gave better results than the more commonly used softmax function in convolutional networks [39]. For the input layers, the Rectified Linear Unit (ReLU) function was used, which has emerged as the more effective activation function in this issue as in other studies [55]. This is due to the fact that the ReLU algorithm relative to other functions, including sigmoidal and hyperbolic tangent (tanh), prevents gradient from fading when learning convolutional models as well as other types of neural networks.

In the problem of minimizing the loss function during model learning, the Adam algorithm was used with a loss minimization of 0.001. The Adam learning algorithm as well as others [56,57], including RMSprop [58,59], Adadelta and Nadam [60,61], are based on performing network parameter updates [62]. The process of updating the network parameters is based on scaling by the square roots of the squares of the moving averages of the previously quoted gradient values at the point under study. These algorithms are constantly being improved by, among other things, giving them “long-term memory” against the gradient values that characterized the trained model in previous learning epochs [63].

The final step required determining model fit using epoch selection. The most efficient learning of the CNN model finally determined 55 epochs. The results of the learning are shown in Table 3.

In the classification process of various research aspects based on bitmap images, the use of convolutional artificial neural networks made it possible to achieve a matching power coefficient in the range above 0.7. A matching power coefficient value below 0.7 would mean that there were discrepancies between the results of the designed convolutional networks and the pattern results for these networks (targets), i.e., they were not fully satisfactory at the specified stage of research. A quality matching coefficient of less than 0.6 would mean that the matching of the convolutional networks is insufficient for the CNNs to be able to independently decide coffee class recognition with sufficient accuracy. In Table 3, the model obtained was a CNN with a good fitting ability of R² = 0.81 (Figure 3a). It means that this model satisfied the criterion of strength and the quality of neural network fitting for the indicated research question. In comparison, Mean Square Error (MSE) and Mean Absolute Error (MAE) were determined to assess the quality of the model in machine learning. In Figure 3b, the decreasing value of MSE indicates a more accurate prediction of the CNN model, which translates into optimal model fitting accuracy. The determined MAE value shown in Figure 3c for the training and test sets, respectively, showed greater robustness to outliers than the MSE.

Currently, it was accepted that the usage of deep learning models with multiple hidden layers provides more capabilities than a comparable model composed of applied sequentially added neurons in a single hidden layer in a shallow learning network would. The ability to learn representations and transfer learning to other tasks becomes a key aspect in deep learning, which speeds up the process of learning new sets. In machine learning, the criterion of preventing overfitting is important, which translates into the ability to generalize using methods such as dropout or batch normalization, among others [56,57]. In order to reduce (overfitting), the designed CNN model was also enriched with a dropout layer. A step of 0.5 was set for the Dropout layer, which translated into improved efficiency to generalize the model, giving a better result on the test set (Figure 3a–c).

In machine learning and deep learning, it was also important to use weight regularization, by means of which it becomes possible to reduce the complexity of the artificial neural network model by taking only low values of modulating variables [64,65]. This is obtained by adding a certain value called cost to the loss function of the network [64]. In this study, an L2 regularization was used for the CNN model, for which the cost value is proportional to the square of the values of the weighting coefficients. The penalty coefficient, for which a value of 0.01 was assumed, allowed for the countering the strength of the regularization of the weights of the learned model, resulting in better performance on the test set for the CNN model (Figure 4).

4. Conclusions

The statistical analysis by ANOVA for color helped distinguish exactly three classes of coffee beans in the RGB color space model, which translated into the efficient recognition of coffee defects obtained during the coffee roasting process. The L*a*b* color model demonstrated the difficulty in distinguishing between the quality classes of under-roasted and properly roasted coffee beans. Nevertheless, the L*a*b* model successfully distinguished between the class of over-roasted coffee. Analyzing the properties of coffee color parameters as an indicator of authenticity can make valuable contributions to food quality and safety.

More challenging was the classification based on digital images with a resolution of 678 × 678 by, among other things, the multidimensionality and complexity of the information contained in the acquired image as a result of image acquisition. The application of deep artificial neural networks made it possible to classify coffee bean samples using computer vision based on image pixels. As a result, an adequate CNN model was acquired, which was characterized by a high fit factor for the test set of 0.81.

The quality assessment of coffee on the basis of representative characteristics (in this case, color) significantly affects both producers and consumers. The effective evaluation of coffee quality will maintain a high product standard in the industrial coffee-roasting process. Consumers will be inclined to select and consume the right kind of coffee through taste and satisfaction with the obtained high-quality food product.

In the future, it is planned to use artificial intelligence methods to help effectively optimize the coffee quality evaluation process. Effectively modeling, operating a series of simulations, making predictions, optimizing, conducting quality control, detecting patterns and diagnosing through machine learning is becoming increasingly important in the food industry. The use of machine learning algorithms will help in the production of food products, leading to greater speed and efficiency, which will translate into minimizing losses and, at the same time, increasing sales.

Author Contributions

Conceptualization, M.G.-W. and K.P.; methodology, K.P. and M.G.-W.; software, K.P.; validation, K.P; formal analysis, P.B., R.R. and B.D.J.; resources, K.P.; data curation, K.P., M.G.-W. and M.G.; writing—original draft preparation, K.P.; writing—review and editing, K.P., M.G.-W., R.R. and A.S.; visualization, K.P.; supervision, M.G.-W. and K.P.; project administration, M.G.-W.; funding acquisition, M.G.-W. and K.P. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Ministry of Education and Science (Poznań, Poland), MEN-UPP 506.784.03.00/KMIP.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in Repository for Open Data—RepOD at DOI: https://doi.org/10.18150/0B46MT.

Conflicts of Interest

The authors declare no conflict of interest.

References

Yu, Y.; Cao, Y.; Lai, Q.; Zhao, Q.; Sun, Z.; Zhou, S.; Song, D. Design and Operation Parameters of Vibrating Harvester for Coffea arabica L. Agriculture 2023, 13, 700. [Google Scholar] [CrossRef]
Batista, L.R.; Chalfoun de Souza, S.M.; Silva e Batista, C.F.; Schwan, R.F. Coffee: Types and Production. In Encyclopedia of Food and Health; Elsevier Ltd.: Amsterdam, The Netherlands, 2015. [Google Scholar]
Bolka, M.; Emire, S. Effects of Coffee Roasting Technologies on Cup Quality and Bioactive Compounds of Specialty Coffee Beans. Food Sci. Nutr. 2020, 8, 6120–6130. [Google Scholar] [CrossRef] [PubMed]
Rusinek, R.; Dobrzański, B.; Oniszczuk, A.; Gawrysiak-Witulska, M.; Siger, A.; Karami, H.; Ptaszyńska, A.A.; Żytek, A.; Kapela, K.; Gancarz, M. How to Identify Roast Defects in Coffee Beans Based on the Volatile Compound Profile. Molecules 2022, 27, 8530. [Google Scholar] [CrossRef] [PubMed]
Liczbiński, P.; Bukowska, B. Tea and Coffee Polyphenols and Their Biological Properties Based on the Latest in Vitro Investigations. Ind. Crops Prod. 2022, 175, 114265. [Google Scholar] [CrossRef]
Yesil, A.; Yilmaz, Y. Review Article: Coffee Consumption, the Metabolic Syndrome and Non-Alcoholic Fatty Liver Disease. Aliment. Pharmacol. Ther. 2013, 38, 1038–1044. [Google Scholar] [CrossRef]
Tsai, C.F.; Jioe, I.P.J. The Analysis of Chlorogenic Acid and Caffeine Content and Its Correlation with Coffee Bean Color under Different Roasting Degree and Sources of Coffee (Coffea arabica Typica). Processes 2021, 9, 2040. [Google Scholar] [CrossRef]
Mitiku, H.; Kim, T.Y.; Kang, H.; Apostolidis, E.; Lee, J.Y.; Kwon, Y.I. Selected Coffee (Coffea arabica L.) Extracts Inhibit Intestinal α-Glucosidases Activities in-Vitro and Postprandial Hyperglycemia in SD Rats. BMC Complement. Med. Ther. 2022, 22, 249. [Google Scholar] [CrossRef]
Ascherio, A.; Zhang, S.M.; Hernán, M.A.; Kawachi, I.; Colditz, G.A.; Speizer, F.E.; Willett, W.C. Prospective Study of Caffeine Consumption and Risk of Parkinson’s Disease in Men and Women. Ann. Neurol. 2001, 50, 56–63. [Google Scholar] [CrossRef]
Camandola, S.; Plick, N.; Mattson, M.P. Impact of Coffee and Cacao Purine Metabolites on Neuroplasticity and Neurodegenerative Disease. Neurochem. Res. 2019, 44, 214–227. [Google Scholar] [CrossRef]
Qi, H.; Li, S. Dose-Response Meta-Analysis on Coffee, Tea and Caffeine Consumption with Risk of Parkinson’s Disease. Geriatr. Gerontol. Int. 2014, 14, 430–439. [Google Scholar] [CrossRef]
Hu, G.; Bidel, S.; Jousilahti, P.; Antikainen, R.; Tuomilehto, J. Coffee and Tea Consumption and the Risk of Parkinson’s Disease. Mov. Disord. 2007, 22, 2242–2248. [Google Scholar] [CrossRef] [PubMed]
Król, K.; Gantner, M.; Tatarak, A.; Hallmann, E. The Content of Polyphenols in Coffee Beans as Roasting, Origin and Storage Effect. Eur. Food Res. Technol. 2020, 246, 33–39. [Google Scholar] [CrossRef]
Tajik, N.; Tajik, M.; Mack, I.; Enck, P. The Potential Effects of Chlorogenic Acid, the Main Phenolic Components in Coffee, on Health: A Comprehensive Review of the Literature. Eur. J. Nutr. 2017, 56, 2215–2244. [Google Scholar] [CrossRef] [PubMed]
Reis, C.E.G.; Dórea, J.G.; da Costa, T.H.M. Effects of Coffee Consumption on Glucose Metabolism: A Systematic Review of Clinical Trials. J. Tradit. Complement. Med. 2019, 9, 184–191. [Google Scholar] [CrossRef]
Wang, P.; Tseng, H.W.; Chen, T.C.; Hsia, C.H. Deep Convolutional Neural Network for Coffee Bean Inspection. Sens. Mater. 2021, 33, 2299–2310. [Google Scholar] [CrossRef]
Wesołowski, P.; Gawałek, J. Effect of the Conditions of Cereal Instant Coffee Granulation on the Product Yield and Quality. Przem. Chem. 2008, 87, 311–314. [Google Scholar]
Huang, N.F.; Chou, D.L.; Lee, C.A.; Wu, F.P.; Chuang, A.C.; Chen, Y.H.; Tsai, Y.C. Smart Agriculture: Real-Time Classification of Green Coffee Beans by Using a Convolutional Neural Network. IET Smart Cities 2020, 2, 167–172. [Google Scholar] [CrossRef]
Przybył, K.; Samborska, K.; Koszela, K.; Masewicz, L.; Pawlak, T. Artificial Neural Networks in the Evaluation of the Influence of the Type and Content of Carrier on Selected Quality Parameters of Spray Dried Raspberry Powders. Measurement 2021, 186, 110014. [Google Scholar] [CrossRef]
Przybył, K.; Walkowiak, K.; Jedlińska, A.; Samborska, K.; Masewicz, Ł.; Biegalski, J.; Pawlak, T.; Koszela, K. Fruit Powder Analysis Using Machine Learning Based on Color and FTIR-ATR Spectroscopy—Case Study: Blackcurrant Powders. Appl. Sci. 2023, 13, 9098. [Google Scholar] [CrossRef]
Dwiecki, K.; Przybył, K.; Dezor, D.; Bąkowska, E.; Rocha, S.M. Interactions of Oleanolic Acid, Apigenin, Rutin, Resveratrol and Ferulic Acid with Phosphatidylcholine Lipid Membranes—A Spectroscopic and Machine Learning Study. Appl. Sci. 2023, 13, 9362. [Google Scholar] [CrossRef]
Pandey, V.K.; Srivastava, S.; Dash, K.K.; Singh, R.; Mukarram, S.A.; Kovács, B.; Harsányi, E. Machine Learning Algorithms and Fundamentals as Emerging Safety Tools in Preservation of Fruits and Vegetables: A Review. Processes 2023, 11, 1720. [Google Scholar] [CrossRef]
Przybył, K.; Wawrzyniak, J.; Koszela, K.; Adamski, F.; Gawrysiak-Witulska, M. Application of Deep and Machine Learning Using Image Analysis to Detect Fungal Contamination of Rapeseed. Sensors 2020, 20, 7305. [Google Scholar] [CrossRef] [PubMed]
Sarker, I.H. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Comput. Sci. 2021, 2, 420. [Google Scholar] [CrossRef] [PubMed]
Das, S. CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet and More. Medium, 16 November 2017. [Google Scholar]
Raschka, S.; Mirjalili, V. Python. Machine Learning. 2nd edition. book, e-book. Helion IT Bookstore. Available online: https://helion.pl/ksiazki/python-uczenie-maszynowe-wydanie-ii-sebastian-raschka-vahid-mirjalili,pythu2.htm#format/e (accessed on 11 July 2021).
Nelli, F. Deep Learning with TensorFlow. In Python Data Analytics; Apress: Berkeley, CA, USA, 2018; pp. 349–407. [Google Scholar]
Patterson, J.; Gibson, A. Deep Learning A Practitioner’s Approach; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2017. [Google Scholar]
Chollet, F. Deep Learning with Python, 2nd ed.; Simon and Schuster: New York, NY, USA, 2021. [Google Scholar]
Chollet, F. Deep Learning with Python. Manning Publications Co.: Shelter Island, NY, USA, 2018. [Google Scholar]
Chollet, F. Keras: The Python Deep Learning Library. 2015. Available online: https://keras.io/ (accessed on 24 August 2023).
Pinto, C.; Furukawa, J.; Fukai, H.; Tamura, S. Classification of Green Coffee Bean Images Basec on Defect Types Using Convolutional Neural Network (CNN). In Proceedings of the 2017 International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA), Denpasar, Indonesia, 16–17 August 2017; pp. 1–5. [Google Scholar]
World Coffee Research. This Catalog Aims to Present Information for Varieties. Available online: https://varieties.worldcoffeeresearch.org/arabica/varieties (accessed on 21 September 2023).
Kulapichitr, F.; Borompichaichartkul, C.; Fang, M.; Suppavorasatit, I.; Cadwallader, K.R. Effect of Post-Harvest Drying Process on Chlorogenic Acids, Antioxidant Activities and CIE-Lab Color of Thai Arabica Green Coffee Beans. Food Chem. 2022, 366, 130504. [Google Scholar] [CrossRef]
Brühl, L.; Unbehend, G. Precise Color Communication by Determination of the Color of Vegetable Oils and Fats in the CIELAB 1976 (L*a*b*) Color Space. Eur. J. Lipid Sci. Technol. 2021, 123, 2000329. [Google Scholar] [CrossRef]
Price, T.D. Sensory Drive, Color, and Color Vision. Am. Nat. 2017, 190, 157–170. [Google Scholar] [CrossRef] [PubMed]
Wu, Y.; Mao, Y.; Feng, K.; Wei, D.; Song, L. Decoding of the Neural Representation of the Visual RGB Color Model. PeerJ Comput. Sci. 2023, 9, e1376. [Google Scholar] [CrossRef]
Kamiyama, M.; Taguchi, A. Color Conversion Formula with Saturation Correction from HSI Color Space to RGB Color Space. IEICE Trans. Fundam.Electron. Commun. Comput. Sci. 2021, 104, 1000–1005. [Google Scholar] [CrossRef]
Van der Walt, S.; Schönberger, J.L.; Nunez-Iglesias, J.; Boulogne, F.; Warner, J.D.; Yager, N.; Gouillart, E.; Yu, T. Scikit-Image: Image Processing in Python. PeerJ 2014, 2, e453. [Google Scholar] [CrossRef]
Przybył, K.; Gawałek, J.; Koszela, K. Application of Artificial Neural Network for the Quality-Based Classification of Spray-Dried Rhubarb Juice Powders. J. Food Sci. Technol. 2023, 60, 809–819. [Google Scholar] [CrossRef]
Al-Sammarraie, M.A.J.; Gierz, Ł.; Przybył, K.; Koszela, K.; Szychta, M.; Brzykcy, J.; Baranowska, H.M. Predicting Fruit’s Sweetness Using Artificial Intelligence—Case Study: Orange. Appl. Sci. 2022, 12, 8233. [Google Scholar] [CrossRef]
Przybył, K.; Ryniecki, A.; Niedbała, G.; Mueller, W.; Boniecki, P.; Zaborowicz, M.; Koszela, K.; Kujawa, S.; Kozłowski, R.J. Software Supporting Definition and Extraction of the Quality Parameters of Potatoes by Using Image Analysis. In Proceedings of the Eighth International Conference on Digital Image Processing (ICDIP 2016), Chengdu, China, 20–22 May 2016; Falco, C.M., Jiang, X., Eds.; 2016; Volume 10033, p. 100332L. [Google Scholar] [CrossRef]
Vithu, P.; Moses, J.A. Machine Vision System for Food Grain Quality Evaluation: A Review. Trends Food Sci. Technol. 2016, 56, 13–20. [Google Scholar] [CrossRef]
Carracedo-Reboredo, P.; Liñares-Blanco, J.; Rodríguez-Fernández, N.; Cedrón, F.; Novoa, F.J.; Carballal, A.; Maojo, V.; Pazos, A.; Fernandez-Lozano, C. A Review on Machine Learning Approaches and Trends in Drug Discovery. Comput. Struct. Biotechnol. J. 2021, 19, 4538–4558. [Google Scholar] [CrossRef] [PubMed]
Dara, S.; Dhamercherla, S.; Jadav, S.S.; Babu, C.M.; Ahsan, M.J. Machine Learning in Drug Discovery: A Review. Artif. Intell. Rev. 2022, 55, 1947–1999. [Google Scholar] [CrossRef]
Bland, J.M.; Altman, D.G. Statistics Notes: Multiple Significance Tests: The Bonferroni Method. BMJ 1995, 310, 170. [Google Scholar] [CrossRef]
De Lasarte, M.; Vilaseca, M.; Pujol, J.; Arjona, M. Color Measurements with Colorimetric and Multispectral Imaging Systems; Rosen, M.R., Imai, F.H., Tominaga, S., Eds.; SPIE: Bellingham, WA, USA, 2006; p. 60620F. [Google Scholar]
Przybył, K.; Masewicz, Ł.; Koszela, K.; Duda, A.; Szychta, M.; Gierz, Ł. An MLP Artificial Neural Network for Detection of the Degree of Saccharification of Arabic Gum Used as a Carrier Agent of Raspberry Powders. In Proceedings of the Thirteenth International Conference on Digital Image Processing (ICDIP 2021); Jiang, X., Fujita, H., Eds.; SPIE: Bellingham, WA, USA, 2021; Volume 11878, p. 93. [Google Scholar]
Spence, C. Background Colour & Its Impact on Food Perception & Behaviour. Food Qual. Prefer. 2018, 68, 156–166. [Google Scholar]
Spence, C. On the Changing Colour of Food & Drink. Int. J. Gastron. Food Sci. 2019, 17, 100161. [Google Scholar]
Dooley, D.M.; Griffiths, E.J.; Gosal, G.S.; Buttigieg, P.L.; Hoehndorf, R.; Lange, M.C.; Schriml, L.M.; Brinkman, F.S.L.; Hsiao, W.W.L. Food on: A Harmonized Food Ontology to Increase Global Food Traceability, Quality Control and Data Integration. Npj Sci Food 2018, 2, 23. [Google Scholar] [CrossRef]
Xu, X.; Zhu, L.; Zhuang, W.; Lu, L.; Yuan, P. A Convolution Neural Network Implemented by Three 3 × 3 Photonic Integrated Reconfigurable Linear Processors. Photonics 2022, 9, 80. [Google Scholar] [CrossRef]
Yang, J.; Yang, G. Modified Convolutional Neural Network Based on Dropout and the Stochastic Gradient Descent Optimizer. Algorithms 2018, 11, 28. [Google Scholar] [CrossRef]
Krohn, J. Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence; Persion: London, UK, 2020. [Google Scholar]
Pan, S.Y.; Liu, M.X.; Forero-Romero, J.; Sabiu, C.G.; Li, Z.G.; Miao, H.T.; Li, X.D. Cosmological Parameter Estimation from Large-Scale Structure Deep Learning. Sci. China Phys. Mech. Astron. 2020, 63, 110412. [Google Scholar] [CrossRef]
Yang, G.; Yang, J.; Li, S.; Hu, J. Modified CNN Algorithm Based on Dropout and ADAM Optimizer. J. Huazhong Univ. Sci. Technol. (Nat. Sci. Ed.) 2018, 46, 122–127. [Google Scholar] [CrossRef]
Ogundokun, R.O.; Maskeliunas, R.; Misra, S.; Damaševičius, R. Improved CNN Based on Batch Normalization and Adam Optimizer. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer International Publishing: Cham, Switzerland, 2022; Volume 13381, pp. 593–604. [Google Scholar]
Liu, J.; Xu, D.; Zhang, H.; Mandic, D. On Hyper-Parameter Selection for Guaranteed Convergence of RMSProp. Cogn. Neurodynamics 2022. [Google Scholar] [CrossRef]
Zou, F.; Shen, L.; Jie, Z.; Zhang, W.; Liu, W. A Sufficient Condition for Convergences of Adam and RMSProp. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; Volume 2019, pp. 11119–11127. [Google Scholar]
Irfan, D.; Rosnelly, R.; Wahyuni, M.; Samudra, J.T.; Rangga, A. Perbandingan Optimasi Sgd, Adadelta, Dan Adam Dalam Klasifikasi Hydrangea Menggunakan Cnn. J. Sci. Soc. Res. 2022, 5, 244. [Google Scholar] [CrossRef]
Yang, J.; Bagavathiannan, M.; Wang, Y.; Chen, Y.; Yu, J. A Comparative Evaluation of Convolutional Neural Networks, Training Image Sizes, and Deep Learning Optimizers for Weed Detection in Alfalfa. Weed Technol. 2022, 36, 512–522. [Google Scholar] [CrossRef]
Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015—Conference Track Proceedings 2014, San Diego, CA, USA, 7–9 May 2015. [Google Scholar] [CrossRef]
Reddi, S.J.; Kale, S.; Kumar, S. On the Convergence of Adam and Beyond. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018—Conference Track Proceedings, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
An Overview of Regularization Techniques in Deep Learning (with Python Code). Available online: https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/ (accessed on 23 September 2023).
Michelucci, U. Applied Deep Learning: A Case-Based Approach to Understanding Deep Neural Networks; Apress Media, LLC: New York, NY, USA, 2018. [Google Scholar]

Figure 1. Examples of samples of selected coffee beans, considering the quality status: (a) standard coffee, (b) overdeveloped coffee, (c) underdeveloped coffee.

Figure 2. A procedure scheme for preparing convolutional neural networks.

Figure 3. Curves for evaluating the quality of the CNN model considering the accuracy index (a), MSE (b) and MAE (c) for the test and training set.

Figure 4. CNN model structure includes input data (obtained from image acquisition and processed as a result of image processing) and different types of layers, such as four convolutional layers (CONV2D), four layers of Max-pooling (MAXPOOLING), one dense layer with regularization (DENSE + regularization), one dropout layer (DROPOUT), one global average pooling layer, and describes the output size of the dense layer (DENSE).

Table 1. Mean of values of L*a*b* color of coffee bean.

Class of Coffee	L*	a*	b*
underdeveloped	18.07 ± 2.03 ^b	7.86 ± 0.51 ^b	10.83 ± 0.84 ^b
overdeveloped	15.38 ± 2.31 ^a	5.86 ± 0.58 ^a	6.44 ± 1.16 ^a
standard	20.13 ± 3.69 ^b	8.13 ± 0.75 ^b	11.81 ± 1.61 ^b

a,b: The differences between mean values with the same letter in columns were statistically insignificant (p < 0.05).

Table 2. Mean of values of RGB color of coffee bean.

Class of Coffee	R	G	B
underdeveloped	0.33 ± 0.06 ^a	0.24 ± 0.05 ^a	0.16 ± 0.03 ^c
overdeveloped	0.42 ± 0.09 ^c	0.27 ± 0.05 ^c	0.18 ± 0.04 ^b
standard	0.46 ± 0.08 ^b	0.30 ± 0.05 ^b	0.19 ± 0.03 ^a

a–c: The differences between mean values with the same letter in columns were statistically insignificant (p < 0.05).

Table 3. CNN’s learning results.

Type	Testing Set			Training Set
Type	MSE	MAE	R²	MSE	MAE	R²	Activation Hidden	Training Algorithm
CNN	0.1739	0.3238	0.8125	0.1941	0.3344	0.7312	ReLU	Adam

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Przybył, K.; Gawrysiak-Witulska, M.; Bielska, P.; Rusinek, R.; Gancarz, M.; Dobrzański, B., Jr.; Siger, A. Application of Machine Learning to Assess the Quality of Food Products—Case Study: Coffee Bean. Appl. Sci. 2023, 13, 10786. https://doi.org/10.3390/app131910786

AMA Style

Przybył K, Gawrysiak-Witulska M, Bielska P, Rusinek R, Gancarz M, Dobrzański B Jr., Siger A. Application of Machine Learning to Assess the Quality of Food Products—Case Study: Coffee Bean. Applied Sciences. 2023; 13(19):10786. https://doi.org/10.3390/app131910786

Chicago/Turabian Style

Przybył, Krzysztof, Marzena Gawrysiak-Witulska, Paulina Bielska, Robert Rusinek, Marek Gancarz, Bohdan Dobrzański, Jr., and Aleksander Siger. 2023. "Application of Machine Learning to Assess the Quality of Food Products—Case Study: Coffee Bean" Applied Sciences 13, no. 19: 10786. https://doi.org/10.3390/app131910786

APA Style

Przybył, K., Gawrysiak-Witulska, M., Bielska, P., Rusinek, R., Gancarz, M., Dobrzański, B., Jr., & Siger, A. (2023). Application of Machine Learning to Assess the Quality of Food Products—Case Study: Coffee Bean. Applied Sciences, 13(19), 10786. https://doi.org/10.3390/app131910786

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Application of Machine Learning to Assess the Quality of Food Products—Case Study: Coffee Bean

Abstract

1. Introduction

2. Materials and Methods

2.1. Samples

2.2. Color Analysis

2.3. Image Collection by Camera

2.4. Preprocessing Data

2.5. Building Neural Model by Python

2.6. Statistical Analysis

3. Results and Discussion

3.1. Color Analysis

3.2. The Design and Learning Process of Convolutional Networks

4. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI