1. Introduction
Coffee is a brewed beverage obtained by extracting in water the soluble components of roasted green coffee beans; it is not only flavorful but also contains various antioxidants and nutrients. Inoue et al. [
1] conducted a prospective study of 90,452 subjects (43,109 men and 47,343 women) and found that hepatitis B and C virus-positive patients who consumed one to two cups of coffee per day had a lower cirrhosis risk (relative risk = 50%) than those who almost never consumed coffee. Furthermore, hepatitis B and C virus-positive patients who consumed four cups of coffee per day had a lower cirrhosis and hepatitis risk (relative risk = 25%) than those who almost never consumed coffee. According to Loftfield et al. [
2], drinking coffee is beneficial for heart health, as caffeine can improve cellular processes in the blood vessels, especially the proteins in the cells of the elderly. When consumed in moderation, coffee can help prevent diseases such as liver cancer, heart disease, dementia, and stroke. Hence, coffee has become one of the most widely consumed beverages in the world.
After harvesting, coffee beans should be handled quickly; otherwise, they will develop a bad smell. Mature fruits sink to the bottom of the tank after sun exposure and washing, while immature or broken beans tend to float on top. Green coffee beans are then picked out of the coffee fruit. Regardless of previous processing and selection, defective beans must be removed by manual sorting; this selection process is difficult because of bruising, mildew, breakage, or worm damage incurred during the peeling of the beans. If defective beans remain before roasting, uneven heating may trigger chemical reactions that produce toxins harmful to health [
3]. Therefore, to reduce labor costs, improve coffee quality, and increase profits, artificial intelligence (AI) has been introduced to sort out defective beans. The accurate and fast selection of defective beans using AI is an important automatic detection technology.
Many different algorithms have been applied to detect the quality of green coffee beans. For example, Santos et al. [
4] employed near-infrared spectroscopy with a partial least squares (PLS) regression model to analyze the correlation between green coffee bean quality and spectral measurements and to predict defective beans. However, the detection instruments were too expensive for mass production. Oliveira et al. [
5] placed green coffee beans in a dark box shielded from external light, captured images with high-resolution RGB cameras, and converted the RGB colors to CIELAB space. They then used a Bayesian classifier to improve coffee quality prediction. However, the test required a special environment, and the instrument cost was high. Arboleda et al. [
6] extracted coffee bean features such as area, perimeter, equivalent diameter, and percentage of roundness, and employed an artificial neural network (ANN) and K-nearest neighbors (KNN) to automatically categorize the beans. The ANN achieved a classification score of 96.66%, while KNN achieved 84.12% [
7]. Image-processing techniques were used to control coffee bean quality by extracting RGB color components from 105 images of green coffee beans and 75 images of black beans with high accuracy. However, only 180 green coffee beans were tested [
6,
7]; such a limited test set may cause poor stability during mass production. Hence, scholars began to improve stability through deep learning (DL). Pinto et al. [
8] developed a convolutional neural network (CNN) that classified 13,000 green coffee bean images into six defect types. Their per-defect accuracies ranged from 72.5% (broken beans) to 98.7% (black beans), and this spread in detection accuracy indicated poor generalization of the model. Wang et al. [
9] used a lightweight model with knowledge distillation (KD) and an improved training method to reach 91% accuracy with only 256,779 model parameters. Huang et al. [
10] collected 1000 good coffee beans and 1000 defective coffee beans and applied image processing and data augmentation to the data. They then applied YOLOv3 to separate good and bad beans, achieving a recognition rate of 94.63%. Yang et al. [
11] employed a CNN model based on KD, a spatial-wise attention module (SAM), and SpinalNet [
12] to achieve an F1 score of 96.54%. Recent progress in AI technology enables the labeling of data and the design of neural networks that allow a machine to automatically learn from the data and make predictions and decisions based on the learned features. Although a deep convolutional neural network (DCNN) can accurately classify images, it cannot easily be deployed in embedded systems because of its large number of giga floating-point operations per second (GFLOPS).
Quantization [
13] and pruning are the most popular methods for model compression. Quantization converts floating-point weights into integer types to reduce the number of model parameters and the amount of computation while maintaining model accuracy. In [
14], a model architecture design is proposed that reduces the number of parameters and the amount of computation; combined with quantization, it achieves a good compression effect while maintaining model accuracy. There are many pruning-related methods, but they are rarely compared with one another, and their results often depend on the dataset. Therefore, in [
15], the effects of different pruning methods are measured on data from past research; the results show that pruning can sometimes improve accuracy, but the effect is usually less stable than using a better model architecture. Tang et al. [
16] proposed an automatic pruning method that does not require pruning criteria or parameters to be set. It is suitable for various neural network architectures and, compared with other pruning methods, can effectively improve the compression ratio and reduce FLOPs. In [
17], a network pruning method is proposed that identifies structural redundancy in CNNs and prunes filters in the layers with the most redundancy.
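The weight-to-integer conversion that quantization performs can be illustrated with a minimal sketch of symmetric 8-bit quantization. This is a simplified illustration only; the exact schemes used in [13,14] differ in detail:

```python
def quantize_int8(weights):
    """Symmetric post-training quantization: map float weights onto
    the integer range [-127, 127] with a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Recover approximate float weights for inference."""
    return [q * scale for q in q_weights]

w = [0.52, -1.27, 0.008, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error per weight is bounded by half the scale step.
assert all(abs(a - b) <= s / 2 + 1e-12 for a, b in zip(w, w_hat))
```

The integer weights occupy a quarter of the memory of 32-bit floats, which is why quantization reduces both model size and computation while keeping accuracy largely intact.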
Generally, accuracy and other evaluation indicators are central to assessing model efficiency when a DCNN is used for prediction and decision-making. However, explainability (explainable AI, XAI) is often lacking when DL technology is utilized: high-accuracy DL models may be highly complex, and accuracy alone cannot explain the correlations and hidden information they rely on. Scholars therefore have difficulty understanding the relationship between input and output and how the model achieves its purpose, and uncertainty remains if people blindly trust a model's predictions. To solve this problem, XAI should be introduced to determine whether the features on which the model bases its predictions and decisions are reasonable. Only in this way can the reliability of the model and of quality detection be ensured in the future.
To deal with the above issues, this paper proposes a lightweight deep convolutional neural network (LDCNN) to detect the quality of green coffee beans. First, the features of defective coffee beans were extracted from RGB images, and the model was employed to classify the beans. Next, rectified Adam (RA), lookahead (LA), and gradient centralization (GC) were utilized as training optimization methods to improve the accuracy of the model, enabling it to operate in embedded systems.
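Of the three training methods, gradient centralization is the simplest to state: before each optimizer step, the gradient of every filter is shifted to zero mean. A minimal sketch, assuming flattened per-filter gradient lists rather than the full weight tensors used in practice:

```python
def centralize_gradients(grad_filters):
    """Gradient centralization: subtract each filter's mean from its
    gradient so that every centralized gradient sums to zero."""
    out = []
    for g in grad_filters:
        mean = sum(g) / len(g)
        out.append([v - mean for v in g])
    return out

grads = [[0.2, -0.4, 0.8],   # gradient of filter 1 (flattened)
         [1.0, 1.0, 1.0]]    # gradient of filter 2
for g in centralize_gradients(grads):
    assert abs(sum(g)) < 1e-12   # zero mean after centralization
```

RA and LA, by contrast, modify the optimizer itself (warm-up rectification of Adam and slow/fast weight interpolation, respectively) and are typically applied as wrappers around the base optimizer.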
4. Experimental Results
In this section, we compare the proposed model with previous related models. The green coffee bean dataset provided by a small optical sorter [
29] included 4626 images that were 400 × 400 pixels. It consisted of 2149 good images and 2477 bad images; sample images are shown in
Figure 13. The authors collected the images of green coffee beans using high-speed cameras and conveyor belts and reduced the brightness of the shooting environment to minimize shadows and keep each bean centered in the image.
Figure 13a,b shows good and bad images, respectively, while
Figure 13c,d shows images with adjusted brightness. In addition, the computer used for the simulations had an AMD Ryzen™ 7 4800H CPU and a GeForce GTX 1660 Ti GPU with 16 GB of DDR4 RAM. We implemented this research using Python 3.8.11 with the PyTorch 1.7.1, torchvision 0.8.2, cudatoolkit 10.1, and OpenCV 4.5.4 libraries on Windows 10.
To achieve DA (including horizontal and vertical flipping and 180° rotation) without any changes in the shape, color, or background of the images, the number of images was expanded from 4626 to 18,504 to form the training dataset of this study. Feeding the augmented images into the model helped improve its accuracy and generalization.
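The fourfold expansion (4626 × 4 = 18,504) corresponds to keeping each original image plus its three flipped variants. A sketch on a toy image represented as a nested list (the paper does not specify its implementation):

```python
def augment(img):
    """Return the original image plus its horizontal flip, vertical flip,
    and 180-degree rotation (equivalent to both flips combined)."""
    h_flip = [row[::-1] for row in img]         # horizontal flip
    v_flip = img[::-1]                          # vertical flip
    rot180 = [row[::-1] for row in img[::-1]]   # 180-degree rotation
    return [img, h_flip, v_flip, rot180]

img = [[1, 2],
       [3, 4]]
variants = augment(img)
assert len(variants) == 4                 # 4626 images -> 4 x 4626 = 18,504
assert variants[3] == [[4, 3], [2, 1]]    # 180-degree rotation
```

Because flips and 180° rotations only permute pixels, the shape, color, and background of each image are preserved exactly, as required above.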
4.1. The Influence of Image Normalization on the Model
Image normalization scales the values of the original images into an interval to extract the features and avoid the influence of abnormal values on the training results of the model. In this paper, an Adam optimizer was used, the learning rate was set to 0.001, the batch size was 16, and 100 epochs were trained. Using the same validation method, we found that image normalization significantly improved the accuracy of LDCNN, as shown in
Table 2.
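As a concrete illustration of the scaling described above, min-max normalization rescales pixel values into [0, 1]. This is one common variant; the paper does not state exactly which normalization scheme was applied:

```python
def normalize(pixels):
    """Min-max normalization: rescale values into the interval [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    return [(p - lo) / (hi - lo) for p in pixels]

pixel_row = [0, 64, 128, 255]       # 8-bit grayscale values
normalized = normalize(pixel_row)
assert normalized[0] == 0.0 and normalized[-1] == 1.0
assert all(0.0 <= v <= 1.0 for v in normalized)
```

Mapping all channels onto the same interval keeps any single bright or dark outlier from dominating the gradient updates during training.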
4.2. Ablation Study of Training and Optimization in the Model
Cross-validation was employed to obtain evaluation indicators using the training dataset. As indicated in
Table 3, the accuracy of the model was 98.38%, the precision was 98.60%, the recall was 97.89%, and the F1 score was 98.24%. The model had 149,842 parameters, a size of 0.57 MB, 0.05 GFLOPS, and a computing time of 10.08 ms (see
Table 4). The computing time is the average time of image preprocessing and model prediction. As shown, the evaluation indicators of the model reached satisfactory accuracy while the model remained lightweight. Afterwards, the training and optimization methods were evaluated in an ablation study. Adam, the cross-entropy loss function, and a learning rate of 0.001 were used to evaluate and analyze the RA, LA, and GC methods.
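The reported F1 score is internally consistent with the reported precision and recall, since the F1 score is their harmonic mean:

```python
precision, recall = 0.9860, 0.9789            # values reported in Table 3
f1 = 2 * precision * recall / (precision + recall)
assert round(f1, 4) == 0.9824                 # matches the reported F1 of 98.24%
```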
Based on the experimental results, compared with the plain Adam optimizer, employing RA improved the accuracy by 0.14% and increased the precision to 99.09% but reduced the recall by 1.80%; hence, the generalization ability of the RA model was low. When GC was used, the evaluation indicators improved compared with those without optimization, indicating that GC effectively improved the detection rate of the model. In contrast, when LA was used, all the evaluation indicators were lower than those without optimization, and the indicators of LA-GC were lower than those of GC. When the three methods were used simultaneously, the accuracy reached 98.38% and the F1 score 98.24%. According to the results in
Table 3, the Adam optimizer was not suitable for training with LA alone; only when RA and LA were used together could the model's accuracy be improved. When GC was used alone, the recall was the highest. Therefore, training the model with the three optimization methods simultaneously yielded excellent stability and generalization.
Figure 14a,b shows the training process of LDCNN without and with the three optimization and training methods, respectively. No obvious underfitting or overfitting occurred in either training process. However, the accuracy sometimes dropped suddenly during training, as the model generated near-random predictions due to rapid convergence and an insufficient number of parameters. After the optimization methods were added, the training stability improved significantly. Hence, stable convergence is needed to improve accuracy, showing that combining the training methods significantly improved the stability and generalization of the model.
4.3. Evaluation Results of Interpretable Model
Figure 15a shows the original image of green coffee beans; the upper and lower parts are good and bad coffee beans, respectively.
Figure 15b,c visualizes the interpretation of the model's predictions after the optimized LDCNN was analyzed with LIME. The green blocks on the coffee beans are areas favorable to the prediction results, and the red blocks are unfavorable areas. Based on
Figure 15b, after XAI, green areas were also distributed around the coffee beans. This shows that the model took the background of the coffee beans as a basis for judgment when predicting bean quality, and there was no distinct area that could be considered the reference for judgment.
Figure 15c demonstrates that the favorable and unfavorable areas were located on the beans themselves, i.e., the prediction was based on the coffee bean regions.
In conclusion, the predictions of the optimized LDCNN were reliable, and the impact of image normalization on model training could also be understood once the images were visualized through LIME. Hence, the training could be optimized, or abnormal data could be screened out of the dataset, so that the model could achieve better accuracy during training.
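The core idea behind LIME, perturbing regions of the input and observing how the prediction changes, can be sketched with a toy occlusion test. This is a simplified stand-in for the actual LIME library; `score` below is a hypothetical classifier used only for illustration:

```python
def region_influence(image, score, regions, baseline=0):
    """For each region, replace its pixels with a baseline value and
    measure how much the prediction score drops; a positive drop means
    the region was favorable to the prediction."""
    full = score(image)
    influence = {}
    for name, idxs in regions.items():
        masked = list(image)
        for i in idxs:
            masked[i] = baseline
        influence[name] = full - score(masked)
    return influence

# Toy classifier: "quality" depends only on the bean pixels (0-2),
# never on the background pixels (3-4).
score = lambda img: sum(img[:3]) / 3.0
image = [0.9, 0.8, 0.7, 0.1, 0.2]
regions = {"bean": [0, 1, 2], "background": [3, 4]}
influence = region_influence(image, score, regions)
assert influence["bean"] > 0         # favorable ("green") region
assert influence["background"] == 0  # background does not drive the prediction
```

A trustworthy model behaves like this toy classifier: masking the background leaves the score unchanged, which is exactly the property Figure 15c confirms for the optimized LDCNN.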
4.4. Comparison of Model Efficiency and the Embedded System
To compare the models and training methods proposed in this paper, this experiment chose famous models, including ResNet [
18], MobileNetV3 [
19], EfficientNetV2 [
30], and ShuffleNetV2 [
31], for evaluation and comparison with LDCNN. In this experiment, the same dataset was used to train each model; the Adam optimizer was employed, the learning rate was set to 0.001, the batch size was 16, and 100 epochs were trained. This study used the public green coffee bean dataset because most related research uses private datasets that are not publicly available. In this work, the evaluation indicators of each model were tested, including accuracy, precision, recall, F1 score, parameters, model size, GFLOPS, and evaluation time, as shown in
Table 4. We used the F1 score divided by the evaluation time as an efficiency measure: the higher the value, the better the model performs on the green coffee bean identification task, as shown in
Figure 16. As suggested in
Table 4, in the quality detection of green coffee beans, the accuracy was better when LDCNN was used than with the other models. The accuracy was notably low for the ResNet models: ResNet18 achieved only 89.66%, and ResNet34 and ResNet50 had even lower accuracy due to overfitting as the number of convolutional layers (CL) increased.
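The efficiency indicator plotted in Figure 16 is straightforward to reproduce. The LDCNN values below come from Tables 3 and 4; the second model's values are hypothetical, for illustration only:

```python
def efficiency(f1_percent, eval_time_ms):
    """Combined indicator used in Figure 16: F1 score divided by
    evaluation time -- higher means a better accuracy/speed trade-off."""
    return f1_percent / eval_time_ms

ldcnn = efficiency(98.24, 10.08)   # F1 and computing time from Tables 3 and 4
heavier = efficiency(98.50, 45.0)  # hypothetical slower model
assert ldcnn > heavier             # LDCNN wins despite a slightly lower F1
```

Dividing by time rewards lightweight models, which is why LDCNN ranks first on this metric even against models with marginally higher F1 scores.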
In related research on the public dataset [
29], a lightweight model was proposed by Wang et al. [
9], in which a ResNet18 model was trained as the teacher and knowledge distillation (KD) was then used to train the lightweight model. The accuracy of the lightweight model reached up to 91% with 256,779 parameters. As illustrated in
Table 5, Yang et al. [
11] put forward DSC, SAM, SpinalNet, and KD methods to train the model, achieving an F1 score of 96.54%. Compared with LDCNN, the previous model [
9] had lower accuracy and more parameters, as it took ResNet18 as the teacher model for training; ResNet18 was not the optimal model in this experiment, resulting in the low accuracy of the lightweight model. In contrast to Yang et al. [
11], the precision of LDCNN increased by 2.12%, the recall by 0.36%, and the F1 score by 1.74%. Finally, LDCNN was deployed on a Raspberry Pi 4B to execute the green coffee bean quality detection system (see
Table 6). The evaluation time included model building, image preprocessing, and image estimation time, which showed that LDCNN could achieve real-time detection on the embedded system. The model was run using the Python programming language on the Raspberry Pi 4B. The experiment used a verification dataset of 3701 images, and the evaluation time was the time the model took to predict all 3701 images (including preprocessing time). A total of 1226.57 s was used when no hardware acceleration was applied. The images of the verification dataset had been converted to 224 × 224 × 3 NumPy arrays so that the model could be executed within the limited memory, enabling the implementation of the model on the embedded system.
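The embedded-system throughput reported in the conclusion follows directly from these measurements:

```python
# Verification set: 3701 images predicted in 1226.57 s on the
# Raspberry Pi 4B without hardware acceleration (values reported above).
fps = 3701 / 1226.57
assert round(fps, 2) == 3.02   # matches the reported average speed of 3.02 FPS
```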
5. Conclusions and Future Work
In this study, a new model for the quality detection of green coffee beans, an LDCNN, was proposed, which combined DSC, an SE block, a skip block, and other frameworks, as well as HS and ReLU activation functions, to make the model lightweight and efficient. To improve performance and training stability, RA, LA, and GC were combined to avoid the random predictions caused by the lightweight model. Based on the experimental results, compared with other state-of-the-art models, our model achieved a higher accuracy of 98.38% and an F1 score of 98.24% in the quality detection of green coffee beans, indicating excellent detection performance. When the model was deployed on the embedded system, the average speed reached 3.02 FPS. Finally, the LIME interpretable model was used to verify that the model in this work was reliable: once the images were made interpretable, the impact of image preprocessing on the model could be understood and used to optimize the training or screen abnormal data out of the dataset. Hence, the accuracy and generalization of the model could be improved during training.
This work used a public dataset for verification to predict and classify the quality of green coffee beans. However, there are more than ten types of defective green coffee beans, and these defect types could be classified in future work. Since coffee beans are sorted via different screening methods, their color, shape, and size will differ, which can also improve the generalization of the model. In addition, with AI research focusing on edge computing and XAI in recent years, XAI techniques have been developed to improve the explainability of models so that their output can be better understood. Such techniques can be introduced into different industries so that models can be judged reliably. Therefore, future work can focus on the development of efficient XAI algorithms to advance both academic research and industrial application.
Finally, ochratoxin A is a mycotoxin produced by mold. Past contamination cases found in coffee beans were all caused by mold arising during the drying process of harvested coffee beans [
37]. Therefore, governments should conduct market monitoring to strengthen food management. It is also necessary to check whether coffee-related products are required to be labeled with their caffeine category and whether they comply with limit standards for microorganisms, heavy metals, and pesticide residues. Global standards are based on the background values of coffee products in various countries and on risk assessments of people's dietary exposure, and they conclude that under normal coffee drinking, people will not suffer health hazards from excessive ochratoxin intake. Finally, this work proposes an AI computer vision detection system for coffee beans. In the future, integrating it with government policies and food management mechanisms can achieve an objective control mechanism for people's dietary health.