Article

Advanced Classification of Coffee Beans with Fatty Acids Profiling to Block Information Loss

1
Department of Mechatronic Engineering, Huafan University, New Taipei 223, Taiwan
2
Tea Science Department, Zhejiang University, Hangzhou 310058, China
3
Department of Biotechnology, Mingchuan University, Taoyuan 333, Taiwan
*
Author to whom correspondence should be addressed.
Symmetry 2018, 10(10), 529; https://doi.org/10.3390/sym10100529
Submission received: 7 September 2018 / Revised: 17 October 2018 / Accepted: 18 October 2018 / Published: 22 October 2018

Abstract

Classification is a core process in the standardization, grading, and sensory evaluation of products in the coffee industry. Chemometric data on fatty acids and crude fat are used here to characterize coffee varieties. Two classifiers were used to distinguish the species and the roasting degree of coffee beans. However, fatty acid profiling with normalized data discriminated poorly when species and roasting degree were classified together. This result of the predictive model conflicts with human cognition, since roasted coffee beans are easily distinguished visually from green coffee beans. By exploring the effects of error analysis and information processing technologies, the lost information was identified as a bias–variance tradeoff derived from the percentile normalization. The roasting degree, as extensive information, was attenuated by the percentile normalization, whereas the cultivar, as intensive information, was enhanced. An informational spiking technique is proposed to patch the dataset and block the information loss. This approach to blocking information loss could be applied to multidimensional classification systems based on chemometric data.

1. Introduction

Various classification techniques are widely used in the identification of cultivars or species, as well as in the standardization and the grading of products for commercial and agricultural production [1,2,3]. Classification is also a kernel process for accurate decision-making after measurements in observation, survey, clinical diagnosis, and industrial quality management [4,5,6,7].
Green coffee is one of the most traded agricultural commodities in the world. The species of commercial coffee consist almost entirely of Coffea arabica (Arabica) and Coffea canephora (Robusta). Arabica is generally more prominent and expensive in the market [8].
Green beans of both species can be distinguished by featured appearances and different compositions that affect the sensory qualities of coffee products [5]. However, most commercial roasted and ground coffees are actually blends of the two species. The molecular genetics approach was applied to differentiate two coffee species in green beans for the quantification of any adulteration of Arabica with Robusta beans [9]. After roasting and grinding, more advanced analytical methods are required as indicators of subtle differentiation between the coffee species [10] because these biological features would be diminished after roasting at high temperature (>200 °C) [11].
Within this realm, several works have successfully distinguished the coffee varieties by using their chemometric data, such as amino acids, metals, sucrose, organic acids, and sterols [6,12,13]. However, acquisition of measured data should be readily available, assured, and inexpensive for a good predictive model. Otherwise, a good predictor derived from measured data must be associated with sensory evaluations [14,15,16,17]. The sensory descriptors could also be established using regression approaches based on the chemometric data [3,10,18]. Fatty acid profiling is most often evaluated to achieve discrimination among the varieties of coffee beans because the sensory qualities of coffee are complicated and affected by multiple factors [19].
Different types of compositional data have been applied to characterize green beans (cultivars as a nominal variable) or investigate the roasting degree of coffees (in a ratio scale). The first two principal components of visible micro-Raman spectra reveal different chlorogenic acid and lipid compositions when comparing Arabica and Robusta green coffee [20]. Dong et al. reported the effect of different drying techniques on the molecular composition of green Robusta [21]. Wei et al. used an NMR-based prediction model to evaluate roasted coffee bean extracts [22]. Han et al. and Frank et al. used specific chemical compounds to assess the toxic risk [23] and bitter taste [24] in roasted coffees, respectively. Romano et al. used the specific fatty acids ratio to determine the relative amounts of Arabica and Robusta in a green coffee blend [1]. Martin et al. obtained a classification result with residual errors for green and roasted Arabica and Robusta coffees by using linear discriminant analysis [25]. Recently, Dias and Benassi proposed a two-step discrimination among coffee species and roasted degrees carried out using heat-labile compounds [11]. All of these studies demonstrate that multidimensional discrimination would be a challenging task in classification.
As shown in Figure 1, chemometric protocols applied to the fatty acid composition data of specimens provide an approach to extract information on coffee quality. In this study, a discriminant system was developed with a learning model to achieve predictive functions. Two linear classifiers, LCRG (roasted, green) and LCAR (Arabica, Robusta), are used to establish four independent groups. Thus, any specimen (Si) can be placed into one of the groups, as indicated by the logic expression Si ∈ {(Roasted ∪ Green) ∩ (Arabica ∪ Robusta)}. The performances of the classifiers with chemometric data were evaluated and validated by their correctness.
However, information loss causing mislabeling was found when the reliability of the data processing was evaluated. The LCRG classifier has poorer accuracy than LCAR, which conflicts with human cognition, since roasted coffee beans are easily distinguished from green ones by their brown color. A similar bias–variance dilemma was also observed in an earlier classification study [25]. Although the bias–variance tradeoff is primarily a problem in supervised learning, it has also been applied to explain the effectiveness of heuristics in human learning.
As technology progresses, classifications are used across every discipline, and the data structures are evolving into more complex forms [26,27]. In this study, the source of the information loss was identified through an obvious pattern of classification errors derived from percentile normalization. Further, the accuracy of the classification system was enhanced by patching the breach with other featured data that have the same properties as the lost information, as shown in Figure 2.
The use of regression analysis aims to find independent latent variables for advanced classification. Simultaneously, some leaks would be produced by the structural normalization of the dataset. Thus, the data integrity and quality must be considered in a preprocessing phase before extracting knowledge from raw data [28]. The preprocessing phase takes over half of the knowledge discovery process. Our study demonstrates informational extraction achieved based on the patching of data structures in a multimodal classification.

2. Materials and Methods

2.1. Sample Collection and Preparation

Green coffee beans of Arabica and Robusta cultivars were purchased from coffee suppliers who guaranteed the origins and were verified by our experts. Portions of green beans were roasted and collected for further analysis and cupping with reliable and traceable filing.
The roasting and grinding levels of these coffee beans were arbitrary and without specific requirements. We expect that the samples were similar to those obtained in daily life. All of the coffee beans, including green and roasted beans, were stored under steady conditions to avoid oxidation or compositional changes. Then, 200 g of each portion of ground coffee beans (powdered) was sampled and labelled as a specimen in this study.

2.2. Lipid Extraction and Crude Fat

The Soxhlet solid–liquid extraction method [29] (Association of Official Analytical Chemists (AOAC) Official Method 2003.05/920.39) was used to extract the lipid fraction from the ground coffee beans. All of the glass apparatus were rinsed using petroleum ether and dried in an oven at 102 °C. Ten grams of ground coffee sample were weighed and placed in the thimble. A quantity of 90 mL of petroleum ether was placed in a 150 mL round-bottom flask. We continued the extraction process for 5 h, and a defatted residue was obtained after distillation. Almost all the solvent was collected and placed in the oven and then removed using a desiccator. The weight of the sample was then noted. As a result, the crude fat (%) = (W − T)/S × 100% was calculated, where W, T, and S are the weights of the thimble with ether extract, the empty thimble, and the sample, respectively.
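As a worked illustration of the crude fat formula above, the following minimal Python sketch evaluates (W − T)/S × 100%. The function name and the example weights are hypothetical and are not measurements from this study.

```python
def crude_fat_percent(w_thimble_with_extract: float,
                      t_empty_thimble: float,
                      s_sample: float) -> float:
    """Crude fat (%) = (W - T) / S * 100, with all weights in the same unit."""
    if s_sample <= 0:
        raise ValueError("sample weight must be positive")
    return (w_thimble_with_extract - t_empty_thimble) / s_sample * 100.0


# Hypothetical weights in grams, chosen only to illustrate the arithmetic.
example_percent = crude_fat_percent(
    w_thimble_with_extract=26.5,  # W: thimble with ether extract
    t_empty_thimble=25.0,         # T: empty thimble
    s_sample=10.0,                # S: ground coffee sample
)
print(f"crude fat = {example_percent:.1f} %")
```

With these illustrative weights, 1.5 g of extract from a 10 g sample corresponds to 15% crude fat.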

2.3. Preparation of Fatty Acid Methyl Esters

Fatty acid methyl esters (FAMEs) were prepared by a method modified from the IUPAC standard method [30,31]. Briefly, 200 mg of crude fat (lipid extraction) in a screw-capped glass tube was hydrolyzed with 1 mL of 1 M KOH in 70% ethanol (Sigma–Aldrich, St. Louis, MO, USA) at 90 °C for 1 h. The reaction mixture was acidified with 0.2 mL of 6 M HCl, and then 1 mL of water was added. The free fatty acids (FAs) were extracted with n-hexane to be methylated with 1 mL of 10% BF3 in methanol at 37 °C for 20 min. A quantity of 3 mL of 6% potassium carbonate solution was added to the solution, and then FAMEs were extracted with 1 mL of hexane. Of the n-hexane top layer, 200 μL was transferred into a vial and crimped.

2.4. Fatty Acids Profile by GC–FID Analysis

The FAMEs were determined using gas chromatography (TRACE GC Ultra, Thermo Fisher Scientific, Rodano-Milan, Italy) equipped with a flame ionization detector (FID) and liquid auto-injector (AI-3000, Thermo Fisher Scientific, Rodano-Milan, Italy). Separation was carried out in an Rtx-WAX capillary column (60 m × 0.53 mm id × 1 μm, Restek Corporation, Bellefonte, PA, USA). Injection volume was 1 μL in split mode, and inlet temperature was 250 °C. Nitrogen was used as the carrier gas (flow rate of 1.2 mL/min), and the oven temperature was programmed as follows: initial temperature 50 °C, held for 2 min; then increased by 10 °C/min to 280 °C, where it was held for 5 min. All data of FAMEs were recorded and quantitatively integrated using Chrom-Card data system (version 2.3, Thermo Fisher Scientific, Rodano-Milan, Italy) with an external standards calibration curve.
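The oven program above implies a fixed run time per injection. The short Python sketch below adds up the hold and ramp segments; the function name is hypothetical, and the calculation assumes a single linear ramp with no cool-down or equilibration time.

```python
def gc_run_time_min(initial_c: float, initial_hold_min: float,
                    ramp_c_per_min: float, final_c: float,
                    final_hold_min: float) -> float:
    """Total oven program time for one linear temperature ramp."""
    ramp_min = (final_c - initial_c) / ramp_c_per_min
    return initial_hold_min + ramp_min + final_hold_min


# Values taken from the oven program described in the text.
run_time_min = gc_run_time_min(initial_c=50, initial_hold_min=2,
                               ramp_c_per_min=10, final_c=280,
                               final_hold_min=5)
print(run_time_min)  # 30.0
```

The ramp from 50 °C to 280 °C at 10 °C/min takes 23 min, so each run lasts 30 min including both holds.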
In addition, the individual peaks of FAMEs were identified using an Agilent gas chromatograph with a mass spectrometric detector (models 6890N GC and 5973 MSD, Agilent Technologies, Santa Clara, CA, USA) under the same chromatographic conditions. Scan acquisition (m/z 45–550) for the MSD in EI mode was carried out using HP ChemStation B.04.03 (Agilent Technologies, Santa Clara, CA, USA) and the NIST 17 Mass Spectral Library (Scientific Instrument Services, Ringoes, NJ, USA).

2.5. Statistics Software and Calculations

Statistical calculations and analysis were performed using Excel 2010 (Microsoft Corporation, Santa Rosa, CA, USA) and PASW Statistics 18.0.3.25 (International Business Machines Corporation, Armonk, NY, USA). The normalized and standardized data were recalculated into a new data matrix. The discriminant analysis was carried out in the direct mode, and all variables passing the tolerance criterion (0.001) were entered simultaneously with equal prior probabilities. The discriminant output displays the maximum-variance pattern (and structure) matrix without rotation.
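To make the idea of a linear classifier with equal prior probabilities concrete, the following Python sketch fits a two-class linear discriminant on two variables. It is a generic textbook construction and not the PASW procedure used in this study: the tolerance criterion is not reproduced, the restriction to two variables is a simplification, and the function names and training data are hypothetical.

```python
def mean_vec(rows):
    """Column means of a list of (x1, x2) pairs."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_cov(class_a, class_b):
    """Pooled within-class 2x2 covariance matrix of two groups."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (class_a, class_b):
        m = mean_vec(rows)
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(class_a) + len(class_b) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]


def fit_lda(class_a, class_b):
    """Return (weights, bias) of a two-class linear discriminant, equal priors."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    (a, b), (c, d) = pooled_cov(class_a, class_b)
    det = a * d - b * c
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Equal priors place the decision boundary at the midpoint of the centroids.
    mid = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)
    bias = -(w[0] * mid[0] + w[1] * mid[1])
    return w, bias


def discriminant_score(x, w, bias):
    """Positive score -> class A, negative score -> class B."""
    return w[0] * x[0] + w[1] * x[1] + bias


# Hypothetical two-variable training data (not taken from Table 1).
GROUP_A = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.5), (2.5, 2.0)]
GROUP_B = [(6.0, 5.0), (7.0, 7.5), (8.0, 6.0), (7.5, 8.0)]

weights, bias = fit_lda(GROUP_A, GROUP_B)
for point in GROUP_A + GROUP_B:
    print(point, round(discriminant_score(point, weights, bias), 2))
```

The sign of the score assigns a specimen to one of the two labeled groups. With unequal priors the boundary would shift toward the less probable class.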

3. Results and Discussion

3.1. Fatty Acids Analysis by GC–FID

Fats and oils are important ingredients in many foods. Fat contributes to the texture, flavor, mouthfeel, and aroma of foods. The fatty acid composition was determined by the GC–FID method with a calibration curve after methyl esterification and extraction. All quantitative data are listed in Table 1.
Regression analysis is widely used to estimate the relationships among variables for prediction models in the field of machine learning [32]. The performance of regression analysis methods in practice depends on the form of the data generating process and on the probability distributions of the dependent variables around the prediction of the regression function.
The majority of the composition of coffee beans is contributed by the fatty acids C18:2 and C16:0, and the smaller parts (<1%) are accounted for by C20:1 and C22:0. Because the absolute measurement uncertainty is constant, fatty acids present in smaller amounts have greater relative uncertainty than those present in larger proportions. For instance, the relative standard deviation (RSD) of fatty acid C20:1 is 13.7%, greater than the 0.126% of fatty acid C18:2, since the limit of quantitation (LOQ) is 50 ppm (0.05 mg/g).
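The effect of a constant absolute uncertainty can be illustrated with a short Python sketch. It assumes that the absolute uncertainty equals the LOQ of 0.05 mg/g, which the text implies but does not state outright, and it uses the C20:1 and C18:2 values of sample 01 in Table 1. The RSD values quoted above presumably rest on the full dataset, so the numbers here differ from them.

```python
LOQ_MG_PER_G = 0.05  # limit of quantitation stated in the text (50 ppm)


def relative_uncertainty_percent(level_mg_per_g: float,
                                 absolute_uncertainty: float = LOQ_MG_PER_G) -> float:
    """Relative uncertainty (%) of a level measured with a fixed absolute uncertainty."""
    return absolute_uncertainty / level_mg_per_g * 100.0


# Sample 01 in Table 1 (mg/g).
c20_1 = relative_uncertainty_percent(0.508)    # minor component
c18_2 = relative_uncertainty_percent(62.466)   # major component

print(f"C20:1: {c20_1:.2f} %   C18:2: {c18_2:.3f} %")
```

For this sample the minor component carries a relative uncertainty of roughly 10%, about two orders of magnitude larger than that of the major component.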
As the featured variables, the distributions of the fatty acids were compared against normally distributed populations in Figure 3. However, such highly symmetrical variances are not sensitive to the varieties of coffee beans. The dataset was not used directly as input variables for the classification algorithm. The similar distributions among these variables imply that the variances of the fatty acids follow constrained patterns within the dataset. This pattern may reflect the relationships among continuous variables, as opposed to the discrete variables used in classification.

3.2. Normalization (Percentile) and Standardization (Z-Score)

Several data processing techniques were used to normalize the data, including conversion to percentages, standardization (Z-score), logarithms, and inverses of the measured data. Generally, normalization removes the physical units of a measured dataset to make it a dimensionless dataset.
The fatty acids C18:0 and C18:2 are used to describe the structural characteristics of the system in Figure 4. The correlation with the original measured data has a strong linearity, which is 5 times the absolute quantities of fatty acids. The groups of roasted Arabica and green Robusta lie at the ends of the line, and the other two groups are superimposed in the middle zone of the line. The high correlation of the two fatty acids implies dependence among the variables in the quantitative data. Thus, the composition of fatty acids could be considered an intensive property of individual specimens.
After normalization by percentiles, the percentile data show a scattering pattern without an obvious correlation in six times the dimensional quantities of the fatty acids. The results indicate that the structures of the original dataset are ordered and become disrupted and more varied by normalization. Percentile normalization can enhance the variability of quantitative data, but at the same time it amplifies the uncertainty (bias), which adds to the variances.
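The two transformations compared in this section can be sketched in a few lines of Python. The sketch assumes that "percentile" normalization means expressing each fatty acid as a percentage of the summed fatty acids of that specimen, and that the Z-score is computed per variable across specimens with the sample standard deviation. Only the three rows of Table 1 available here are used, so the Z-scores are illustrative and not those of the full 34-sample dataset.

```python
from statistics import mean, stdev

# Fatty acid contents (mg/g) from Table 1, in the order
# C16:0, C18:0, C18:1, C18:2, C18:3, C20:0, C20:1, C22:0.
SAMPLES = {
    "01": [45.342, 8.985, 10.610, 62.466, 2.109, 3.691, 0.508, 0.714],
    "02": [20.736, 4.118, 4.792, 27.942, 0.947, 1.797, 0.232, 0.408],
    "03": [37.909, 7.301, 9.247, 47.871, 1.603, 2.952, 0.427, 0.563],
}


def to_percent(profile):
    """Row-wise normalization: each fatty acid as % of the specimen's total."""
    total = sum(profile)
    return [100.0 * x / total for x in profile]


def z_scores(column):
    """Column-wise standardization: (x - mean) / sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


percent_01 = to_percent(SAMPLES["01"])
c18_2_column = [profile[3] for profile in SAMPLES.values()]
z_c18_2 = z_scores(c18_2_column)

print([round(p, 2) for p in percent_01])
print([round(z, 2) for z in z_c18_2])
```

Percent normalization acts within one specimen and removes its overall scale, whereas the Z-score acts within one variable and preserves the ordering of specimens on that variable.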

3.3. Discrimination Analysis

The raw data of pooled specimens were calculated using the linear classifiers (LCRG and LCAR) to obtain the scores dFRG and dFAR, respectively. Further, the discriminant scores were scattered into the groups (quadrants), as shown in Figure 5A. The target of classification was successfully achieved by the linear discriminant algorithm. The percentile data were also given scores by the linear classifiers, and the resulting scatter plot is shown in Figure 5B.
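The quadrant assignment can be sketched as follows. The sign convention (non-negative dFRG for roasted, non-negative dFAR for Arabica) is an assumption made for illustration; the actual orientation of the axes in Figure 5 is not stated in the text, and the function name and example scores are hypothetical.

```python
def assign_group(d_rg: float, d_ar: float) -> tuple:
    """Map the two discriminant scores onto one of four labeled groups.

    Assumed convention: d_rg >= 0 -> Roasted, d_ar >= 0 -> Arabica.
    """
    roast = "Roasted" if d_rg >= 0 else "Green"
    species = "Arabica" if d_ar >= 0 else "Robusta"
    return roast, species


# Hypothetical score pairs, one per quadrant.
for scores in [(1.2, 0.8), (1.2, -0.8), (-1.2, 0.8), (-1.2, -0.8)]:
    print(scores, assign_group(*scores))
```

Each specimen receives exactly one roasting label and one species label, which corresponds to the four independent groups described in the Introduction.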
It is worth noting that there are five cases of error in the classification, which are noted in the confusion matrix in Table 2. The classifier of coffee species (LCAR) has perfect correctness, but the classifier of roasting degree (LCRG) has only 85% correctness. The classification errors in roasting degree occur in two ways: green mistaken as roasted or roasted mistaken as green. The LCRG has poorer discriminability than the LCAR in the training model; this conflicts with what human cognition would predict.
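The reported correctness follows directly from the error count. Because LCAR is error-free, all five misclassifications belong to LCRG, and with 34 specimens the arithmetic can be checked with a brief Python sketch (the function name is hypothetical):

```python
def correctness(n_total: int, n_errors: int) -> float:
    """Fraction of specimens assigned to their true label."""
    return (n_total - n_errors) / n_total


lcrg = correctness(n_total=34, n_errors=5)  # roasting-degree classifier
lcar = correctness(n_total=34, n_errors=0)  # species classifier

print(f"LCRG: {lcrg:.1%}   LCAR: {lcar:.1%}")
```

Twenty-nine correct assignments out of 34 give about 85.3%, consistent with the 85% stated above.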
In sensory testing, it is easier to differentiate the roasting degrees than to distinguish coffee species. Therefore, some information associated with the roasting categories must be attenuated in the percentile normalization. The dimensional reduction of the dataset matrix, which is rescaled to a reference standard, may cause the information loss. For instance, the eight fatty acids expressed as percentages have only 7 degrees of freedom because the total composition must be 100%.
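The loss of one degree of freedom can be verified numerically. In the Python sketch below, the eighth percentage of sample 01 in Table 1 is recovered from the other seven, which shows that it carries no independent information once the data are closed to 100%. The variable names are illustrative.

```python
# Sample 01 in Table 1 (mg/g): C16:0, C18:0, C18:1, C18:2, C18:3, C20:0, C20:1, C22:0.
SAMPLE_01 = [45.342, 8.985, 10.610, 62.466, 2.109, 3.691, 0.508, 0.714]

total = sum(SAMPLE_01)
percent = [100.0 * x / total for x in SAMPLE_01]

# Drop the last component (C22:0) and reconstruct it from the closure constraint.
first_seven = percent[:7]
recovered_c22_0 = 100.0 - sum(first_seven)

print(round(percent[7], 4), round(recovered_c22_0, 4))
```

Any one of the eight percentages can be reconstructed in the same way, so only seven of them vary freely.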
Discriminant analysis deals with taxonomic classification (supervised learning), in which the cases are partitioned into labeled groups. Partial least squares discriminant analysis has demonstrated great success and versatility in modelling high-dimensional datasets. Nevertheless, the user needs to optimize a wealth of parameters before reaching reliable and validated outcomes [26]. By contrast, principal component analysis and cluster analysis are algorithms used to explore unknown patterns without prior labels (unsupervised learning).

3.4. Information Loss in Data Processing

For supervised learning, the training dataset was reviewed according to the distributions of labeled categories. We examined the differences in labeled categories for each classifier using Student’s t-test, as shown in Figure 6. Interestingly, the Z-scored data differed significantly between green and roasted coffees, which is the distinction made by LCRG. However, the percentile data suppressed the significant difference between the green and roasted coffees, as shown in Figure 6A. Only the percentile data of fatty acids C20:0 and C22:0 are significant (t value > 2) at the 95% confidence level, because fatty acid C22:0 is a minor component of the fatty acid composition, with an average level of less than 1.0%, as shown in Figure 3. The bias–variance tradeoff is a serious problem in this classification.
By contrast, the Z-scored data are significant for the discrimination of Arabica and Robusta coffees, and the percentile data show enhanced significance for fatty acids C18:1, C18:2, C18:3, C20:0, and C20:1 in Figure 6B. Thus, percentile normalization is better suited to distinguishing Arabica from Robusta coffees.
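The t values compared in Figure 6 can be computed as in the following Python sketch of the two-sample Student's t statistic with pooled variance. The equal-variance form is an assumption, since the text does not say which variant was used, and the two groups of values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled (equal) variance."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical levels of one fatty acid in two labeled categories.
roasted = [3.6, 3.1, 2.9, 3.4, 3.8]
green = [1.8, 2.2, 1.6, 2.0, 2.4]

t_value = student_t(roasted, green)
print(f"t = {t_value:.2f}, significant by the |t| > 2 rule: {abs(t_value) > 2}")
```

A variable whose |t| stays below 2 after normalization no longer separates the two categories at the 95% level used in the text.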
These results demonstrate that the discrimination of roasting degree is dominated by the extensive property of the raw data, and the discrimination of coffee species is dominated by the intensive property of the percentile data, because the percentile data are invariant to the relative scale. The majority of the lost information has an extensive property within the raw data. The percentile normalization reduces the dimensions of the data matrix and shrinks the contained information by erasing part of the extensive information.
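The scale invariance of the percentile data can be demonstrated with two rows of Table 1. In the Python sketch below, halving every content of sample 01 leaves its percentage profile unchanged, and samples 01 (roasted Arabica) and 02 (green Arabica) differ strongly in total fatty acid content but only slightly in their C18:2 share. The comparison covers these two rows only and is not a statement about the full dataset.

```python
def to_percent(profile):
    """Each fatty acid as % of the specimen's summed fatty acids."""
    total = sum(profile)
    return [100.0 * x / total for x in profile]


# Table 1 (mg/g): C16:0, C18:0, C18:1, C18:2, C18:3, C20:0, C20:1, C22:0.
SAMPLE_01 = [45.342, 8.985, 10.610, 62.466, 2.109, 3.691, 0.508, 0.714]  # roasted Arabica
SAMPLE_02 = [20.736, 4.118, 4.792, 27.942, 0.947, 1.797, 0.232, 0.408]   # green Arabica

# Extensive change: every content scaled by the same factor.
halved_01 = [0.5 * x for x in SAMPLE_01]

print(round(sum(SAMPLE_01), 3), round(sum(SAMPLE_02), 3))    # totals differ
print(round(to_percent(SAMPLE_01)[3], 2),
      round(to_percent(SAMPLE_02)[3], 2))                    # C18:2 shares are close
```

The total amount, which is the extensive part of the measurement, is exactly what the division by the row sum removes.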

3.5. Patching the Breach in the Classification System

The lost information has the extensive property and is erased in data processing. If related data with an extensive property were spiked into the smaller normalized data pool, the discrimination could be enhanced, allowing higher correctness. As shown in our proposal in Figure 2, the crude fat content was used to patch the informational breach, forming a patching process for the classification system with normalized data.
In Figure 7A, the crude fat contents without normalization are distributed into the four labelled groups, and the confusion errors are shown in the grey area. The crude fat content with the extensive property of specimen information was not associated with the percentile fatty acids in order to avoid artificial containment derived from the normalization.
The patched dataset can be well partitioned by the two classifiers; the results are shown in Figure 7B. The source of information loss is evidenced by the informational spiking with the extensive property of crude fat content (cFAT). These results demonstrate that the system performance of machine learning depends on the integrity and type of the input information. Data processing may enhance one system function while suppressing another.
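The patching step amounts to appending the unnormalized crude fat content to the percentile fatty acid profile before classification. The Python sketch below builds such a feature vector for sample 01 in Table 1. The function name is hypothetical, and leaving cFAT in mg/g without further scaling is an assumption, since the text states only that cFAT was not normalized together with the fatty acids.

```python
def patched_features(fatty_acids_mg_g, crude_fat_mg_g):
    """Percentile fatty acid profile with the raw crude fat content appended."""
    total = sum(fatty_acids_mg_g)
    percent = [100.0 * x / total for x in fatty_acids_mg_g]
    return percent + [crude_fat_mg_g]


# Sample 01 in Table 1: eight fatty acids (mg/g) and cFAT (mg/g).
SAMPLE_01_FA = [45.342, 8.985, 10.610, 62.466, 2.109, 3.691, 0.508, 0.714]
SAMPLE_01_CFAT = 159

features = patched_features(SAMPLE_01_FA, SAMPLE_01_CFAT)
print([round(v, 2) for v in features])
```

The first eight entries carry the intensive (compositional) information and the ninth restores an extensive quantity that the normalization removed.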

4. Conclusions

All kinds of data are used as a medium for transmitting information in modern life. Different professional explanations are often added in the processes of data transfer and expression. We have shown that if there is no mutual crossover between the two sets of data, the percentile process will be more effective for the classification of coffee beans. The source and the property of information loss in this classification were identified as the normalization processing and the extensive quantity. The loss of information is noted in the quantitative features of coffee beans that have gone through the roasting process. The performance of this coffee classification is enhanced and validated by our patching technique with traceable informational processing. Furthermore, our results may help improve correctness and mitigate the bias–variance tradeoff in classification systems with multiple classifiers. For industrial applications, the effects of different processing methods and materials could be associated with food quality and consumer preference through accurate discriminant analysis based on chemometric data.

Author Contributions

Conceptualization and methodology: P.C. and L.-Y.C.; software, validation, formal analysis, and investigation: Y.-C.H. and L.-Y.C.; writing—original draft preparation: Y.-C.H.; writing—review and editing: P.C. and L.-Y.C.; supervision and project administration: L.-Y.C.; funding acquisition: Y.-C.H. and L.-Y.C.

Funding

This research received no external funding.

Acknowledgments

The authors thank the King-Car Group Company, Taiwan, where Liang-Yü Chen was an employee, for their support with experimental materials and equipment.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Romano, R.; Santini, A.; Le Grottaglie, L.; Manzo, N.; Visconti, A.; Ritieni, A. Identification markers based on fatty acid composition to differentiate between roasted Arabica and Canephora (Robusta) coffee varieties in mixtures. J. Food Compos. Anal. 2014, 35, 1–9.
  2. Abad, M.; Abkar, A.; Mojaradi, B. Effect of the temporal gradient of vegetation indices on early-season wheat classification using the random forest classifier. Appl. Sci. 2018, 8, 1216.
  3. Niimi, J.; Tomic, O.; Naes, T.; Jeffery, D.W.; Bastian, S.E.P.; Boss, P.K. Application of sequential and orthogonalised-partial least squares (SO-PLS) regression to predict sensory properties of Cabernet Sauvignon wines from grape chemical composition. Food Chem. 2018, 256, 195–202.
  4. Pramudya, R.C.; Seo, H.S. Influences of product temperature on emotional responses to, and sensory attributes of, coffee and green tea beverages. Front. Psychol. 2017, 8, 2264.
  5. Lange, C.; Combris, P.; Issanchou, S.; Schlich, P. Impact of information and in-home sensory exposure on liking and willingness to pay: The beginning of Fairtrade labeled coffee in France. Food Res. Int. 2015, 76, 317–324.
  6. Lindinger, C.; Labbe, D.; Pollien, P.; Rytz, A.; Juillerat, M.A.; Yeretzian, C.; Blank, I. When machine tastes coffee: Instrumental approach to predict the sensory profile of espresso coffee. Anal. Chem. 2008, 80, 1574–1581.
  7. Zia ur Rehman, M.; Gilani, S.; Waris, A.; Niazi, I.; Slabaugh, G.; Farina, D.; Kamavuako, E. Stacked Sparse Autoencoders for EMG-Based Classification of Hand Motions: A Comparative Multi Day Analyses between Surface and Intramuscular EMG. Appl. Sci. 2018, 8, 1126.
  8. Masi, C.; Dinnella, C.; Barnaba, M.; Navarini, L.; Monteleone, E. Sensory properties of under-roasted coffee beverages. J. Food Sci. 2013, 78, S1290–S1300.
  9. Spaniolas, S.; May, S.T.; Bennett, M.J.; Tucker, G.A. Authentication of coffee by means of PCR-RFLP analysis and lab-on-a-chip capillary electrophoresis. J. Agric. Food Chem. 2006, 54, 7466–7470.
  10. Ribeiro, J.S.; Ferreira, M.M.; Salva, T.J. Chemometric models for the quantitative descriptive sensory analysis of Arabica coffee beverages using near infrared spectroscopy. Talanta 2011, 83, 1352–1358.
  11. Dias, R.; Benassi, M. Discrimination between Arabica and Robusta Coffees Using Hydrosoluble Compounds: Is the Efficiency of the Parameters Dependent on the Roast Degree? Beverages 2015, 1, 127–139.
  12. Dong, W.; Tan, L.; Zhao, J.; Hu, R.; Lu, M. Characterization of Fatty Acid, Amino Acid and Volatile Compound Compositions and Bioactive Components of Seven Coffee (Coffea robusta) Cultivars Grown in Hainan Province, China. Molecules 2015, 20, 16687–16708.
  13. Villarreal, D.; Laffargue, A.; Posada, H.; Bertrand, B.; Lashermes, P.; Dussert, S. Genotypic and environmental effects on coffee (Coffea arabica L.) bean fatty acid profile: Impact on variety and origin chemometric determination. J. Agric. Food Chem. 2009, 57, 11321–11327.
  14. Kalschne, D.L.; Viegas, M.C.; De Conti, A.J.; Corso, M.P.; Benassi, M.T. Steam pressure treatment of defective Coffea canephora beans improves the volatile profile and sensory acceptance of roasted coffee blends. Food Res. Int. 2018, 105, 393–402.
  15. Marx, Í.; Rodrigues, N.; Dias, L.G.; Veloso, A.C.A.; Pereira, J.A.; Drunkler, D.A.; Peres, A.M. Sensory classification of table olives using an electronic tongue: Analysis of aqueous pastes and brines. Talanta 2017, 162, 98–106.
  16. Rendon, M.Y.; de Jesus Garcia Salva, T.; Bragagnolo, N. Impact of chemical changes on the sensory characteristics of coffee beans during storage. Food Chem. 2014, 147, 279–286.
  17. Bicho, N.C.; Leitao, A.E.; Ramalho, J.C.; de Alvarenga, N.B.; Lidon, F.C. Impact of roasting time on the sensory profile of arabica and robusta coffee. Ecol. Food Nutr. 2013, 52, 163–177.
  18. Borras, E.; Ferre, J.; Boque, R.; Mestres, M.; Acena, L.; Calvo, A.; Busto, O. Prediction of olive oil sensory descriptors using instrumental data fusion and partial least squares (PLS) regression. Talanta 2016, 155, 116–123.
  19. Shin, E.C.; Hwang, C.E.; Lee, B.W.; Kim, H.T.; Ko, J.M.; Baek, I.Y.; Lee, Y.B.; Choi, J.S.; Cho, E.J.; Seo, W.T.; et al. Chemometric Approach to Fatty Acid Profiles in Soybean Cultivars by Principal Component Analysis (PCA). Prev. Nutr. Food Sci. 2012, 17, 184–191.
  20. El-Abassy, R.M.; Donfack, P.; Materny, A. Discrimination between Arabica and Robusta green coffee using visible micro Raman spectroscopy and chemometric analysis. Food Chem. 2011, 126, 1443–1448.
  21. Dong, W.; Hu, R.; Chu, Z.; Zhao, J.; Tan, L. Effect of different drying techniques on bioactive components, fatty acid composition, and volatile profile of robusta coffee beans. Food Chem. 2017, 234, 121–130.
  22. Wei, F.; Furihata, K.; Miyakawa, T.; Tanokura, M. A pilot study of NMR-based sensory prediction of roasted coffee bean extracts. Food Chem. 2014, 152, 363–369.
  23. Han, J.; Kim, M.K.; Lee, K.G. Furan Levels and Sensory Profiles of Commercial Coffee Products Under Various Handling Conditions. J. Food Sci. 2017, 82, 2759–2766.
  24. Frank, O.; Blumberg, S.; Kunert, C.; Zehentbauer, G.; Hofmann, T. Structure determination and sensory analysis of bitter-tasting 4-vinylcatechol oligomers and their identification in roasted coffee by means of LC-MS/MS. J. Agric. Food Chem. 2007, 55, 1945–1954.
  25. Martin, M.X.; Pablos, F.; Gonzalez, A.G.; Valdenebro, M.X.; Leon-Camacho, M. Fatty acid profiles as discriminant parameters for coffee varieties differentiation. Talanta 2001, 54, 291–297.
  26. Lee, L.C.; Liong, C.Y.; Jemain, A.A. Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data: A review of contemporary practice strategies and knowledge gaps. Analyst 2018, 143, 3526–3539.
  27. Corrales, D.; Ledezma, A.; Corrales, J. From Theory to Practice: A Data Quality Framework for Classification Tasks. Symmetry 2018, 10, 248.
  28. Pacheco, F.; Rangel, C.; Aguilar, J.; Cerrada, M.; Altamiranda, J. Methodological framework for data processing based on the Data Science paradigm. In Proceedings of the 2014 XL Latin American Computing Conference (CLEI), Montevideo, Uruguay, 15–19 September 2014; pp. 1–12.
  29. Thiex, N.J.; Anderson, S.; Gildemeister, B. Crude fat, diethyl ether extraction, in feed, cereal grain, and forage (Randall/Soxtec/submersion method): Collaborative study. J. AOAC Int. 2003, 86, 888–898.
  30. Eder, K. Gas chromatographic analysis of fatty acid methyl esters. J. Chromatogr. B Biomed. Appl. 1995, 671, 113–131.
  31. Carvalho, A.P.; Malcata, F.X. Preparation of fatty acid methyl esters for gas-chromatographic analysis of marine lipids: Insight studies. J. Agric. Food Chem. 2005, 53, 5049–5059.
  32. Bishop, C.M. Pattern Recognition and Machine Learning (Information Science and Statistics); Springer-Verlag: Berlin, Germany, 2006.
Figure 1. Information conversion frames are associated with the real individuals, the chemometric data, and the sensory recognitions in this study.
Figure 2. Conceptual diagram showing the informational flows, leaks, and blocking in this 2D classification.
Figure 3. The measured data of 34 coffee samples were pooled and profiled for the variances of cFAT and fatty acids, as unsupervised data. The boxes and lines present as the mean ±2s (standard deviations, 95%), the median, and the quartiles (Q1 and Q3) for specific variables after standardization. The numbers below (underlined) present the average composition for each fatty acid as a percentage of total free fatty acids (100%).
Figure 4. A restructured correlation of fatty acids (C18:0 and C18:2) is presented with the normalized data in percentiles and juxtaposed with the correlation of fatty acids with the pooled measured data.
Figure 5. Linear discriminant analysis of the fatty acids in the 34 coffee bean samples, plotted with (A) the raw data or (B) the percentile (%) data.
Figure 6. Comparison of the effects of the Z-scored or percentile (%) data on the t values of fatty acids for discrimination of (A) Green and Roasted or (B) Arabica and Robusta coffees.
Figure 7. Group distributions of the crude fat contents (A) in green and roasted coffee beans are compared using raw data. Further, linear discriminant analysis (B) is plotted with the percentile data of the fatty acids patched with the crude fat content (cFAT) of the 34 coffee bean samples.
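The group comparison of crude fat in Figure 7A can be approximated numerically. The sketch below computes a two-sample t statistic for cFAT between roasted and green beans, using the cFAT values listed in Table 1. The choice of Welch's unequal-variance form and the helper name `welch_t` are assumptions made for illustration; the article does not state which t statistic underlies its figures.

```python
from math import sqrt
from statistics import mean, variance

# cFAT values (mg/g) transcribed from Table 1, grouped by roasting degree.
cfat_green = [67, 88, 102, 94, 98, 88, 98, 127, 79, 79,
              55, 50, 49, 61, 60, 51]
cfat_roasted = [159, 155, 153, 148, 203, 203, 173, 175, 166, 156, 150, 156,
                135, 119, 100, 120, 115, 98]


def welch_t(a, b):
    """Welch's t statistic (assumed form) for two samples with unequal variances."""
    # statistics.variance returns the sample variance (n - 1 denominator).
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


t_cfat = welch_t(cfat_roasted, cfat_green)
print(f"mean green   = {mean(cfat_green):.1f} mg/g")
print(f"mean roasted = {mean(cfat_roasted):.1f} mg/g")
print(f"t (roasted vs green) = {t_cfat:.2f}")
```

A large positive t here is consistent with the caption's point that raw crude fat content separates roasted from green beans, which is the property the cFAT patch relies on.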
Table 1. Measured contents of crude fat (cFAT) and eight fatty acids (FAs) for the 34 coffee bean samples.
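All contents in mg/g.

| ID | GR | AR | cFAT | C16:0 | C18:0 | C18:1 | C18:2 | C18:3 | C20:0 | C20:1 | C22:0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 01 | Roasted | Arabica | 159 | 45.342 | 8.985 | 10.610 | 62.466 | 2.109 | 3.691 | 0.508 | 0.714 |
| 02 | Green | Arabica | 67 | 20.736 | 4.118 | 4.792 | 27.942 | 0.947 | 1.797 | 0.232 | 0.408 |
| 03 | Roasted | Arabica | 155 | 37.909 | 7.301 | 9.247 | 47.871 | 1.603 | 2.952 | 0.427 | 0.563 |
| 04 | Green | Arabica | 88 | 20.512 | 4.002 | 4.848 | 25.234 | 0.911 | 1.921 | 0.245 | 0.399 |
| 05 | Roasted | Arabica | 153 | 36.507 | 7.242 | 11.355 | 47.960 | 1.552 | 2.788 | 0.414 | 0.556 |
| 06 | Green | Arabica | 102 | 25.639 | 4.984 | 7.582 | 32.942 | 1.116 | 2.075 | 0.299 | 0.425 |
| 07 | Roasted | Arabica | 148 | 36.101 | 8.461 | 10.067 | 46.857 | 1.539 | 3.436 | 0.412 | 0.644 |
| 08 | Green | Arabica | 94 | 30.897 | 6.971 | 8.341 | 39.461 | 1.257 | 2.967 | 0.345 | 0.675 |
| 09 | Roasted | Arabica | 203 | 44.440 | 10.410 | 13.355 | 59.340 | 1.683 | 4.243 | 0.554 | 0.905 |
| 10 | Green | Arabica | 98 | 25.490 | 5.650 | 6.501 | 32.136 | 1.023 | 2.366 | 0.280 | 0.631 |
| 11 | Roasted | Arabica | 203 | 54.853 | 13.219 | 17.271 | 74.899 | 2.461 | 4.691 | 0.647 | 1.060 |
| 12 | Green | Arabica | 88 | 23.757 | 5.644 | 7.224 | 31.549 | 1.113 | 2.169 | 0.288 | 0.636 |
| 13 | Roasted | Arabica | 173 | 41.413 | 9.778 | 12.421 | 55.083 | 1.777 | 3.408 | 0.451 | 0.749 |
| 14 | Green | Arabica | 98 | 22.405 | 5.184 | 6.245 | 28.209 | 1.000 | 2.012 | 0.243 | 0.476 |
| 15 | Roasted | Arabica | 175 | 50.168 | 9.712 | 12.828 | 66.230 | 2.067 | 3.439 | 0.564 | 0.713 |
| 16 | Roasted | Arabica | 166 | 49.224 | 9.548 | 12.642 | 65.296 | 2.168 | 3.579 | 0.587 | 0.771 |
| 17 | Green | Arabica | 127 | 34.747 | 6.697 | 8.720 | 45.076 | 1.524 | 2.336 | 0.383 | 0.471 |
| 18 | Roasted | Arabica | 156 | 40.016 | 7.953 | 9.955 | 53.152 | 1.727 | 3.051 | 0.411 | 0.606 |
| 19 | Green | Arabica | 79 | 19.075 | 3.779 | 4.780 | 25.878 | 0.852 | 1.466 | 0.194 | 0.321 |
| 20 | Roasted | Arabica | 150 | 49.182 | 9.066 | 12.606 | 62.721 | 2.259 | 3.807 | 0.587 | 0.850 |
| 21 | Roasted | Arabica | 156 | 49.529 | 9.303 | 12.610 | 63.325 | 2.308 | 3.890 | 0.599 | 0.789 |
| 22 | Green | Arabica | 79 | 22.622 | 4.185 | 5.751 | 28.787 | 1.078 | 1.794 | 0.278 | 0.368 |
| 23 | Roasted | Robusta | 135 | 33.076 | 6.439 | 8.844 | 38.228 | 0.694 | 2.510 | 0.340 | 0.306 |
| 24 | Roasted | Robusta | 119 | 29.308 | 5.971 | 8.068 | 34.897 | 0.756 | 2.503 | 0.353 | 0.316 |
| 25 | Green | Robusta | 55 | 16.041 | 3.236 | 4.584 | 18.805 | 0.391 | 1.482 | 0.188 | 0.271 |
| 26 | Green | Robusta | 50 | 15.729 | 3.129 | 4.634 | 18.589 | 0.436 | 1.555 | 0.205 | 0.239 |
| 27 | Roasted | Robusta | 100 | 23.971 | 5.272 | 8.634 | 31.999 | 0.656 | 2.247 | 0.343 | 0.288 |
| 28 | Green | Robusta | 49 | 10.410 | 2.248 | 3.598 | 13.947 | 0.310 | 1.026 | 0.141 | 0.157 |
| 29 | Roasted | Robusta | 120 | 31.304 | 6.863 | 11.588 | 40.807 | 0.796 | 3.198 | 0.500 | 0.484 |
| 30 | Roasted | Robusta | 115 | 30.900 | 6.753 | 11.288 | 39.856 | 0.793 | 3.111 | 0.466 | 0.633 |
| 31 | Green | Robusta | 61 | 16.730 | 3.943 | 5.936 | 21.800 | 0.466 | 1.792 | 0.243 | 0.322 |
| 32 | Green | Robusta | 60 | 15.400 | 3.598 | 5.472 | 19.844 | 0.404 | 1.620 | 0.223 | 0.363 |
| 33 | Roasted | Robusta | 98 | 23.786 | 5.277 | 8.634 | 31.356 | 0.665 | 2.247 | 0.333 | 0.269 |
| 34 | Green | Robusta | 51 | 10.46 | 2.248 | 3.598 | 13.958 | 0.325 | 1.021 | 0.131 | 0.171 |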
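To illustrate how normalization can attenuate the roasting information in these data, the sketch below converts two Arabica samples from Table 1 (01, roasted; 02, green) into percentage profiles. It assumes that the article's "percentile" normalization means each fatty acid divided by the sample's total of the eight fatty acids; the function name `to_percent` and the layout of the patched vector are hypothetical.

```python
# Fatty acid contents (mg/g) of two Arabica samples, transcribed from Table 1.
FATTY_ACIDS = ["C16:0", "C18:0", "C18:1", "C18:2",
               "C18:3", "C20:0", "C20:1", "C22:0"]
sample_01_roasted = [45.342, 8.985, 10.610, 62.466, 2.109, 3.691, 0.508, 0.714]
sample_02_green = [20.736, 4.118, 4.792, 27.942, 0.947, 1.797, 0.232, 0.408]
cfat = {"01": 159, "02": 67}  # crude fat (mg/g) from Table 1


def to_percent(contents):
    """Assumed normalization: each fatty acid as a percentage of the sample total."""
    total = sum(contents)
    return [100.0 * value / total for value in contents]


pct_roasted = to_percent(sample_01_roasted)
pct_green = to_percent(sample_02_green)

# Extensive information: absolute totals differ strongly between the two samples.
total_ratio = sum(sample_01_roasted) / sum(sample_02_green)

# Intensive information: the percentage profiles are close to each other.
max_profile_gap = max(abs(r - g) for r, g in zip(pct_roasted, pct_green))

# Patching (hypothetical layout): append raw cFAT to the percentage profile
# so that the extensive quantity is carried alongside the normalized data.
patched_roasted = pct_roasted + [cfat["01"]]
patched_green = pct_green + [cfat["02"]]

print(f"total FA ratio (roasted/green): {total_ratio:.2f}")
print(f"largest profile gap: {max_profile_gap:.2f} percentage points")
for name, r, g in zip(FATTY_ACIDS, pct_roasted, pct_green):
    print(f"{name}: roasted {r:.2f}%  green {g:.2f}%")
```

For this pair, the raw totals differ by more than a factor of two while no fatty acid share differs by as much as one percentage point, which is the kind of loss the cFAT patch is meant to offset.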
Table 2. Classification accuracy is assessed using a confusion matrix based on the discriminant functions with the normalized (percentile) data and classification into the four groups (2 × 2). The correctness is used to describe the performance of the individual classifiers (LCRG or LCAR).
| Categories | Green | Roasted | LCAR |
|---|---|---|---|
| Arabica | 8/10 | 10/12 | 22/22 (100) |
| Robusta | 6/6 | 5/6 | 12/12 (100) |
| LCRG | 14/16 (87.5) | 15/18 (83.3) | Correct (%) |
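The LCRG percentages in Table 2 follow directly from the cell counts. The short sketch below recomputes them; the dictionary layout and function names are illustrative only. The pooled four-group figure is derived here from the table and is not reported in the article.

```python
# Correct and total counts per group, transcribed from Table 2.
correct = {("Arabica", "Green"): 8, ("Arabica", "Roasted"): 10,
           ("Robusta", "Green"): 6, ("Robusta", "Roasted"): 5}
total = {("Arabica", "Green"): 10, ("Arabica", "Roasted"): 12,
         ("Robusta", "Green"): 6, ("Robusta", "Roasted"): 6}


def percent(hits, n):
    """Correctness as a percentage, rounded to one decimal place."""
    return round(100.0 * hits / n, 1)


def roast_correctness(roast):
    """Pool both species to get the roasting-degree correctness (the LCRG row)."""
    hits = sum(v for (_, r), v in correct.items() if r == roast)
    n = sum(v for (_, r), v in total.items() if r == roast)
    return hits, n, percent(hits, n)


green = roast_correctness("Green")
roasted = roast_correctness("Roasted")

# Derived here, not stated in the article: pooled correctness over all four groups.
overall = percent(sum(correct.values()), sum(total.values()))

print("Green:", green)
print("Roasted:", roasted)
print("All four groups (%):", overall)
```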

MDPI and ACS Style

Hung, Y.-C.; Chen, P.; Chen, L.-Y. Advanced Classification of Coffee Beans with Fatty Acids Profiling to Block Information Loss. Symmetry 2018, 10, 529. https://doi.org/10.3390/sym10100529
