Enhancing Coffee Quality and Traceability: Chemometric Modeling for Post-Harvest Processing Classification Using Near-Infrared Spectroscopy

Santos-Rivera, Mariana; Viswanathan, Lakshmanan; Sheibani, Faris

doi:10.3390/spectroscj3020020

by Mariana Santos-Rivera ¹,*, Lakshmanan Viswanathan ² and Faris Sheibani ¹,²

¹ Smartspectra Limited, 109 Snakes Lane West, Woodford Green IG8 0DY, UK
² Qima Coffee, 28 Maryland Road, London E15 1JB, UK
* Author to whom correspondence should be addressed.

Spectrosc. J. 2025, 3(2), 20; https://doi.org/10.3390/spectroscj3020020

Submission received: 31 March 2025 / Revised: 29 May 2025 / Accepted: 14 June 2025 / Published: 19 June 2025

(This article belongs to the Special Issue Feature Papers in Spectroscopy Journal)

Abstract

Post-harvest processing (PHP) is a key determinant of coffee quality, flavor profile, and market classification, yet verifying PHP claims remains a significant challenge in the specialty coffee industry. This study introduces near-infrared spectroscopy (NIRS) coupled with chemometrics as a rapid, non-destructive approach to classify green coffee beans based on PHP. For the first time, seven distinct PHP categories—Alchemy, Anaerobic Processing (Deep Fermentation), Dry-Hulled, Honey, Natural, Washed, and Wet-Hulled—were discriminated using NIRS, encompassing 20 different processing protocols under varying environmental and fermentation conditions. The NIR spectra (350–2500 nm) of 524 green Arabica coffee samples were analyzed using PCA-LDA models (750–2450 nm), achieving classification accuracies up to 100% for underrepresented categories and strong performance (91–95%) for dominant PHP groups in an independent test set. These results demonstrate that NIRS can detect subtle chemical signatures associated with diverse PHP techniques, offering a scalable tool for quality assurance, fraud prevention, and traceability in global coffee supply chains. While limited sample sizes for some PHP categories may influence model generalization, this study lays the foundation for future work involving broader datasets and integration with digital traceability systems. The approach has direct implications for producers, traders, and certifying bodies seeking reliable, real-time PHP verification.

Keywords: classification; coffee authenticity; food safety; green coffee beans; LDA; NIRS; PHP; product differentiation; quality control; verification

1. Introduction

Coffee is one of the most important and widely traded agricultural commodities in the world, with over 125 million people relying on its cultivation, processing, and trade for their livelihoods [1,2]. As a globally consumed beverage, its quality and commercial value are influenced by a range of factors, including genetics, cultivation conditions, and post-harvest practices [3,4,5,6]. Among these, post-harvest processing (PHP) plays a central role in shaping the chemical composition and sensory attributes of green coffee beans, directly impacting how coffee is classified, priced, and marketed [3,4,5,6]. In the context of rising demand for high-quality and traceable specialty coffees, PHP has emerged as a key differentiator along the supply chain. Accurate verification of PHP methods is therefore critical not only for quality control, but also for ensuring product authenticity, enabling market access, and strengthening supply chain transparency [7,8]. Recent innovations in PHP—such as controlled fermentation, extended drying protocols, and the use of microbial inoculants—have introduced new dimensions of flavor, enhancing the differentiation and value of specialty coffees. These techniques allow producers to craft unique sensory profiles, increasing their competitiveness in premium markets [3,4,5,6,9]. However, this diversification also presents challenges in ensuring product consistency, verifying processing claims, and safeguarding consumer safety. Inadequate control of fermentation parameters or drying conditions can result in undesirable microbial activity, leading to off-flavors or the formation of mycotoxins, which compromise both quality and safety [7,8,10,11]. Therefore, the ability to authenticate and classify post-harvest processes becomes essential not only for market differentiation, but also for maintaining supply chain transparency and consumer trust.

Traditional PHP methods can be broadly categorized into three major processing techniques: Washed, Natural, and Honey [3,4,5,6,12,13]. Each method involves distinct post-harvest steps that influence the chemical profile of the beans, imparting distinct sensory characteristics to the final coffee. The Washed process, widely used in Latin America and parts of Africa, involves depulping the cherries, followed by fermentation and thorough washing to remove mucilage before drying. This method produces clean cup profiles with high acidity and clarity, often reflecting the terroir of the origin [11,12,14,15]. In contrast, the Natural process, common in Ethiopia, Brazil, and parts of the Middle East, allows the whole cherry to dry intact [3,4,5,6]. The extended contact between the bean and fruit promotes the development of complex, fruity flavors and a heavier body, though it requires strict environmental control to avoid undesirable fermentation [11,12]. The Honey process, popular in Central America, is a hybrid approach where the mucilage is partially retained during drying [3,4,5,6]. Depending on how much mucilage is left (e.g., yellow, red, or black honey), this method can yield cup profiles that balance the bright acidity of Washed coffees with the sweetness and body of Naturals [14,15]. These distinctions not only influence flavor, but also shape consumer perception and market value, underscoring the importance of accurately verifying processing methods.

Beyond traditional methods, emerging PHP techniques such as Alchemy, Anaerobic Processing (Deep Fermentation), Dry-Hulled, and Wet-Hulled have introduced greater diversity in coffee production [6,16,17,18]. Alchemy, a proprietary method developed by Qima Coffee in Yemen, uses controlled fermentation supported by precise environmental modulation—including regulation of gas composition, pressure, and temperature—to influence microbial activity and enzymatic reactions. This results in highly structured cup profiles with enhanced aromatic clarity [19]. Anaerobic Processing, also pioneered by Qima Coffee and practiced in Yemen, extends fermentation duration under sealed conditions, intensifying acidity and fruit-forward flavors. Meanwhile, Dry-Hulled and Wet-Hulled methods, typical in Indonesia, differ in moisture management: Dry-Hulled coffee retains its parchment during drying, whereas Wet-Hulled beans are hulled while still at high moisture levels (~30–35%), which accelerates drying but leads to distinct earthy and full-bodied cup characteristics [20,21]. As producers increasingly adopt these diverse techniques to enhance cup quality and market appeal, the ability to authenticate processing claims becomes critical for ensuring transparency and consistency across international supply chains. Reliable verification methods are essential not only for quality control and price differentiation, but also to protect consumer trust and uphold regulatory standards.

Ensuring traceability in coffee PHP is essential for multiple stakeholders, from farmers to exporters and regulatory agencies. Accurate classification of PHP methods supports quality control, enables premium pricing strategies, and mitigates fraud in coffee trading and food safety [7,22]. For example, incorrectly labeling a washed coffee, typically associated with cleaner flavor profiles and lower fermentation-driven complexity, as “natural” or “honey” processed can lead to unwarranted price inflation and mislead consumers and buyers seeking distinctive flavor characteristics. This type of fraudulent labeling not only distorts market pricing, but also erodes consumer trust and undermines the efforts of producers who follow authentic, often labor-intensive, processing protocols [7,8]. Conventional methods for verifying PHP techniques, such as sensory evaluation and wet chemical analysis, have limitations. Green coffee authentication has been explored through a wide array of analytical approaches, including chromatographic techniques, electronic sensing systems, vibrational spectroscopy, and metabolomics, which have provided valuable insights into origin, quality, and processing-related differences [7,8,9,23,24,25]. Sensory evaluation is inherently subjective, requiring trained cuppers and standardized protocols, while wet chemical techniques are labor-intensive, time-consuming, and often destructive [7,8,22,23]. Given these challenges, rapid, non-destructive analytical techniques are needed to authenticate PHP methods efficiently [7,8].

Near-Infrared Spectroscopy (NIRS) has gained recognition as a valuable analytical tool for food authentication and quality assessment due to its ability to provide rapid, non-invasive, and high-throughput analysis [26,27,28,29]. NIRS measures molecular overtones and combination vibrations in the near-infrared region (750–2500 nm), capturing chemical signatures associated with moisture, lipids, proteins, carbohydrates, and other compounds [26,27,28,29,30]. Coupled with chemometric techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), NIRS can classify complex agricultural products based on their spectral fingerprints [31,32,33]. Recent advancements in machine learning and chemometric modeling have enhanced the predictive capabilities of NIRS for food authentication [31,32]. Multivariate statistical techniques, including PCA for feature extraction and LDA for classification, allow for robust discrimination of sample groups based on spectral differences [31,32]. These features make it particularly suitable for differentiating products like green coffee beans, where conventional methods may fall short.

Previous studies have demonstrated that integrating NIRS and LDA achieves high classification accuracy for coffee authentication [22,34,35,36,37]; however, most studies have focused on broad classifications, primarily distinguishing between Washed and Natural processes [23]. For instance, research has demonstrated the feasibility of NIRS combined with electronic tongue analysis to discriminate between washed Arabica, natural Arabica, and Robusta coffee samples [23]. However, no study has systematically employed NIRS to differentiate the seven PHP techniques assessed in this research. As processing innovations become increasingly complex, there is a clear need to evaluate the effectiveness of NIRS-based approaches for discriminating among PHP methods with high accuracy and reliability. Despite its advantages, the broader adoption of NIRS in producing regions may be limited by barriers such as high initial equipment costs, restricted access to portable devices, and insufficient technical training [28]. Overcoming these challenges is essential to enabling decentralized, data-driven quality control and authentication, particularly in smallholder contexts, and ensuring that NIRS can fulfill its promise as a transformative tool for enhancing transparency and consistency in the global coffee supply chain.

This study addresses the research gap by evaluating the potential of NIRS, coupled with chemometrics, to classify and authenticate PHP methods in green coffee beans. By analyzing NIR spectra from 524 whole green coffee samples processed using seven distinct methods—Alchemy, Anaerobic Processing, Dry-Hulled, Honey, Natural, Washed, and Wet-Hulled—this research assesses the efficacy of PCA-LDA models for PHP classification. Two models were developed: one trained on the entire dataset and another incorporating an independent test set, excluding underrepresented PHP methods. The evaluation of the models was based on accuracy, sensitivity, and specificity, offering insights into the robustness and generalization of the approach. In contrast to previous studies, which often focused on origin, this work highlights the chemical distinctiveness introduced by processing methods and validates a practical framework for detecting them using rapid, non-destructive techniques. These findings have direct implications for real-world implementation, enabling exporters, quality control labs, and certification bodies to authenticate PHP claims efficiently, support traceability, reduce fraud, and ultimately improve price transparency and market access for smallholder producers.

2. Materials and Methods

2.1. Green Coffee Samples

A total of 524 whole green coffee samples were analyzed in this study, representing 11 origins across major coffee-producing regions (Table 1). The samples were sourced between 2022 and 2024 by Qima Coffee (London, United Kingdom) from high-quality lots that participated in prestigious competitions such as The Best of Yemen auction organized by Qima Coffee in London, and The Cup of Excellence competitions (Houston, Texas, USA) carried out in Brazil, Colombia, Ecuador, El Salvador, Guatemala, Indonesia, México, Nicaragua, Perú, and Thailand; or were commercial-grade Arabica coffees (Cup score: 82.6–91.8) confirmed for accurate post-harvest processing (PHP) labeling. This diversity ensured global representativeness in terms of geography, farm practices, and quality grading. The coffees encompassed a wide range of genetic varieties, such as Bourbon, Castillo, Catimor, Caturra, Geisha, Maragogype, Maracaturra, Obata, Pacamara, SL-28, SL-34, Sudan Rume, Typica, Yemenia, and others, representing both specialty and widely cultivated lines. PHP methods initially included over 20 producer- or country-specific designations (alchemy, anaerobic, anaerobic fermentation honey, anaerobic natural, carbonic maceration honey, carbonic maceration natural, deep fermentation, dry-hulled, honey, honey anaerobic, honey double fermentation, lactic natural, natural, pulped natural, red washed, semi-washed, slow dried natural, washed, washed anaerobic, wet-hulled), which were consolidated into seven analytically distinct PHP categories: Alchemy, Anaerobic Processing (Deep Fermentation), Dry-Hulled, Honey, Natural, Washed, and Wet-Hulled. This grouping allowed for standardization across terminology and improved spectral comparison by focusing on chemical distinctions driven by fermentation intensity, mucilage retention, and drying techniques.
The sample (100 g) moisture content ranged between 9.8% and 10.5%, aligning with commercial standards and minimizing variability in water-related absorption bands. Although this narrow range reduces spectral distortion, minor effects were addressed through preprocessing (e.g., SNV and second derivative correction). Table 1 also illustrates uneven sample representation across PHP categories. These differences were accounted for in model development by excluding underrepresented classes in the second PCA-LDA model. Samples were vacuum-packed in high-barrier packaging materials to preserve integrity during transport. Then, they were shipped to London, UK, where they underwent NIR spectral collection and further analysis.

2.2. NIR Spectra Acquisition

The NIR spectra of the 524 whole green coffee bean samples were collected using a portable ASD LabSpec® 4 spectrophotometer with Indico® Pro software v.6.5.6.1 (Malvern Panalytical, Analytical Spectral Devices Inc., Boulder, CO, USA). For each sample, five independent spectral signatures were recorded at room temperature (20–22 °C) in diffuse reflectance mode using the Muglight accessory, with no sample preparation required. Approximately 30 g of whole green coffee beans were placed in the sample cup, and multiple beans were scanned simultaneously, allowing for a representative bulk measurement of each sample. This method minimizes spectral variability caused by differences between individual beans and ensures consistency in spectral acquisition. Spectral acquisition was performed over the 350–2500 nm wavelength range, with a resolution of 1.4 nm for the 350–1000 nm region and 2 nm for the 1000–2500 nm region. Each spectrum was generated by averaging 50 scans with an integration time of 34 ms per scan. Instrumental reference standards were used for background calibration to ensure spectral accuracy and consistency.

2.3. Chemometrics

The raw reflectance spectra (350–2500 nm) were extracted and inspected for outliers. The spectra were then transformed to absorbance using Unscrambler® X v.10.4 software (CAMO Analytics, Oslo, Norway), and the mathematical pre-treatments of Baseline Correction, Standard Normal Variate (SNV), De-trending, and 2nd derivative (polynomial order: 2, Savitzky–Golay smoothing points: 24) were applied. This preprocessing was carried out before classification to minimize artifacts and enhance relevant chemical features. SNV and de-trending were used to correct for scatter effects and baseline variability that commonly result from variations in sample morphology, including grain size and surface characteristics. Additionally, a second derivative transformation was applied to improve spectral resolution by separating overlapping signals and accentuating minor absorbance features. These preprocessing steps are widely adopted in NIR chemometrics for improving classification accuracy and model robustness by emphasizing chemically informative regions while suppressing noise and irrelevant variance [26,28].
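The scatter-correction and derivative steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the Unscrambler implementation: the function names, the second-order de-trending polynomial, and the reading of "24 smoothing points" as a symmetric 25-point window (12 points on each side of the centre) are all assumptions made here for illustration.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def detrend(spectra, wavelengths, order=2):
    """Subtract a least-squares polynomial baseline from each spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    x = np.asarray(wavelengths, dtype=float)
    x = (x - x.mean()) / x.std()            # rescale for numerical stability
    design = np.vander(x, order + 1)        # columns x^order ... x^0
    coef, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    return spectra - (design @ coef).T

def savgol_second_derivative(spectra, half_window=12, polyorder=2, step=1.0):
    """Savitzky-Golay second derivative (polyorder must be at least 2).

    A local polynomial is fitted in each window; the second derivative at the
    window centre is twice the quadratic coefficient. Edge points without a
    full window are dropped, so each row loses 2 * half_window points.
    """
    spectra = np.asarray(spectra, dtype=float)
    offsets = np.arange(-half_window, half_window + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)  # 1, k, k^2
    weights = 2.0 * np.linalg.pinv(design)[2] / step**2
    return np.array([np.convolve(row, weights[::-1], mode="valid")
                     for row in spectra])
```

Each function expects one spectrum per row. The order in which the steps are chained, and the window assumption, should be checked against the software settings before comparing results with the published models.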

The five spectral signatures per sample were averaged; this approach minimizes variability caused by heterogeneity in bean size, surface texture, and packing orientation during scanning, while preserving consistent spectral patterns representative of the chemical composition of the sample. Averaging enhances the signal-to-noise ratio and has been widely adopted in NIRS workflows to ensure robust feature extraction for chemometric modeling. Spectral preprocessing was performed within the 750–2450 nm range. This range was selected after evaluating multiple intervals, as it consistently yielded the highest classification performance. This region captures overtone and combination bands of O–H, C–H, and N–H functional groups, which are associated with moisture, proteins, carbohydrates, and other matrix components affected by post-harvest processing, while excluding noisy regions with a low signal-to-noise ratio (e.g., the UV-Vis region below 750 nm and edge effects above 2450 nm). This range, therefore, provides a chemically relevant and analytically stable basis for PHP discrimination.
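As a small illustration of the averaging and range-selection steps, the sketch below averages five replicate scans of one sample and keeps only the 750–2450 nm window. The helper names, the 1 nm wavelength grid, and the synthetic data are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def average_replicates(scans):
    """Average replicate scans (rows) of one sample into a single spectrum."""
    return np.asarray(scans, dtype=float).mean(axis=0)

def crop_range(wavelengths, spectra, low=750.0, high=2450.0):
    """Keep only the wavelengths (and matching columns) within [low, high] nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    keep = (wavelengths >= low) & (wavelengths <= high)
    return wavelengths[keep], np.asarray(spectra, dtype=float)[..., keep]

# Hypothetical example: five synthetic replicate scans on an assumed 1 nm grid
wavelengths = np.arange(350, 2501)
rng = np.random.default_rng(0)
scans = 0.5 + 0.01 * rng.standard_normal((5, wavelengths.size))

mean_spectrum = average_replicates(scans)
wl_kept, spectrum_kept = crop_range(wavelengths, mean_spectrum)
```

Because `crop_range` indexes the last axis, it works on a single spectrum or on a matrix with one spectrum per row.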

To capture the most relevant spectral variation while minimizing dimensional complexity, key features were extracted through Principal Component Analysis (PCA), implemented via singular value decomposition (SVD), and validated using a full random cross-validation strategy. PCA minimizes redundancy and enhances class separability by concentrating processing-induced chemical variations in the leading components. However, despite the effectiveness of PCA, some overlap between PHP categories, particularly those with shared fermentation stages or mucilage retention levels (e.g., Honey and Natural), may persist due to similar spectral absorption features. These limitations are intrinsic to the spectral similarity of biologically related matrices and warrant the further use of supervised classification algorithms such as Linear Discriminant Analysis (LDA). PCA-LDA with the Mahalanobis distance method (JMP® 17.0, SAS Institute Inc., Cary, NC, USA) using PCs explaining >99.5% of the variance was selected as the classification framework for this study due to its proven effectiveness in handling high-dimensional spectral data and its interpretability in chemometric applications [36,38,39]. Two PCA-LDA models were developed to assess classification performance. The initial model was trained on the entire dataset to discriminate between seven PHP methods: Alchemy, Anaerobic Processing, Dry-Hulled, Honey, Natural, Washed, and Wet-Hulled. The second model, incorporating an independent test set, excluded PHP methods with limited sample representation to ensure proper stratification and model generalization, and to avoid training/testing bias caused by underrepresented classes. This approach mirrors real-world applications, where robust classification tools are most valuable for frequently encountered processing methods, and highlights the importance of continued data collection for underrepresented categories.
Training and test sets were established using stratified random sampling with a 70/30 split, as shown in Table 1.
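The modeling workflow described in this section (a stratified 70/30 split, PCA via SVD retaining the components that explain more than 99.5% of the variance, and LDA that assigns each sample to the class with the smallest Mahalanobis distance) can be sketched as follows. This is an illustrative NumPy version and not the JMP procedure used in the study: the function names, the equal-prior assumption, the pooled within-class covariance, and the rounding rule for the split are assumptions made here.

```python
import numpy as np

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), sampling train_fraction of every class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, test = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(train_fraction * idx.size))
        train.extend(idx[:n_train].tolist())
        test.extend(idx[n_train:].tolist())
    return np.array(sorted(train)), np.array(sorted(test))

def pca_fit(X, variance_threshold=0.995):
    """PCA via SVD; keep the fewest PCs whose cumulative variance exceeds the threshold."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cumulative, variance_threshold, side="right")) + 1
    return mean, vt[:min(k, s.size)]

def pca_transform(X, mean, components):
    """Project spectra onto the retained principal components."""
    return (X - mean) @ components.T

def lda_fit(scores, labels):
    """Class means plus the inverse pooled within-class covariance."""
    classes, inverse = np.unique(labels, return_inverse=True)
    means = np.array([scores[inverse == i].mean(axis=0)
                      for i in range(classes.size)])
    centred = scores - means[inverse]
    pooled = centred.T @ centred / (scores.shape[0] - classes.size)
    return classes, means, np.linalg.pinv(pooled)

def lda_predict(scores, classes, means, inv_cov):
    """Assign each sample to the class with the smallest squared Mahalanobis distance."""
    diff = scores[:, None, :] - means[None, :, :]
    d2 = np.einsum("nck,kl,ncl->nc", diff, inv_cov, diff)
    return classes[np.argmin(d2, axis=1)]
```

In use, the PCA and LDA parameters would be fitted on the training rows only and then applied unchanged to the test rows, so that the test set stays independent of model fitting.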

The classification performance of the PCA-LDA models was assessed in RStudio (R version 4.4.1, “Race for Your Life”, The R Foundation for Statistical Computing) with the MASS library, using accuracy, sensitivity, and specificity as evaluation metrics [31,32,40]. Accuracy measured the proportion of correctly classified samples relative to the total number of samples. Sensitivity measured the proportion of correctly classified positive instances of each PHP method, calculated as TP/(TP + FN), where TP represents true positives and FN false negatives. Sensitivity is particularly critical in PHP verification as it ensures the ability of the model to correctly identify samples that truly belong to a given processing category, thereby avoiding false negatives that could compromise traceability. Specificity measured the ability of the model to correctly identify negative instances, computed as TN/(TN + FP), where TN denotes true negatives and FP false positives [41]. Specificity ensures the model does not wrongly assign a sample to a category it does not belong to, an essential feature to prevent market fraud or mislabeling. These metrics provide complementary insights into the robustness of the model and are especially important in the context of supply chain authentication, where both false inclusions and exclusions can have commercial and reputational consequences. Metrics were computed both per category and overall. Per-category calculations evaluated the classification performance of each PHP method individually, while overall accuracy, sensitivity, and specificity were calculated across all categories by summing the TP, TN, FP, and FN values. For the second model, which included a training and test set, performance was evaluated separately on both subsets to assess generalization.
These metrics comprehensively assessed the ability of the PCA-LDA models to discriminate PHP methods based on NIR spectral data, ensuring reliable classification for coffee authentication and quality control [40]. Figures were created using Microsoft Excel (Microsoft® Office Professional Plus™ 2021, Microsoft Corporation®, Redmond, WA, USA) and JMP® 17.0 (SAS Institute Inc., Cary, NC, USA).
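The metric definitions above can be made concrete with a short Python sketch that derives TP, FN, FP, and TN for each class from a multi-class confusion matrix in a one-vs-rest fashion. The function names and the example counts are hypothetical, and pooling the counts across classes is only one reading of "summing the TP, TN, FP, and FN values"; the code assumes every class has at least one sample.

```python
def per_class_metrics(confusion, labels):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    results = {}
    for k, name in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # true class k, predicted elsewhere
        fp = sum(row[k] for row in confusion) - tp   # other classes predicted as k
        tn = total - tp - fn - fp
        results[name] = {
            "TP": tp, "FN": fn, "FP": fp, "TN": tn,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return results

def overall_metrics(confusion, labels):
    """Accuracy plus sensitivity/specificity from counts pooled over all classes."""
    per_class = per_class_metrics(confusion, labels)
    tp = sum(m["TP"] for m in per_class.values())
    fn = sum(m["FN"] for m in per_class.values())
    fp = sum(m["FP"] for m in per_class.values())
    tn = sum(m["TN"] for m in per_class.values())
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(labels)))
    return {
        "accuracy": correct / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts for three classes (rows: true class, columns: predicted)
example_labels = ["Honey", "Natural", "Washed"]
example_confusion = [[9, 1, 0],
                     [2, 8, 0],
                     [0, 0, 10]]
example_overall = overall_metrics(example_confusion, example_labels)
```

Under this pooled reading, overall sensitivity coincides with overall accuracy for single-label classification; averaging the per-class values (macro-averaging) is another common convention and generally gives different numbers.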

3. Results

3.1. Spectral Characteristics of Green Coffee Beans

In the mean raw NIR spectra of whole green coffee beans from different PHP methods shown in Figure 1a, distinct absorbance peaks were identified at 920, 1000, 1200, 1470, 1700, 1760, 1900, 2130, and 2400 nm [36,37]. These peaks arise from overtone and combination vibrations of specific chemical bonds present in the coffee matrix. The absorbance features at 920, 1000, and 1200 nm are primarily associated with O–H stretching vibrations, while those at 1470, 1700, and 1760 nm are attributed to C–H stretching vibrations. Peaks observed in the 1900 to 2400 nm range result from a combination of O–H, C–H, and N–H vibrations, indicating the presence of multiple chemical constituents in green coffee. Notably, differences in absorbance patterns among PHP methods become more evident between 1400 and 2400 nm, a spectral region where key coffee components, including caffeine (C₈H₁₀N₄O₂), trigonelline (C₇H₇NO₂), chlorogenic acid (C₁₆H₁₈O₉), proteins, amino acids (RCH(NH₂)COOH), lipids (CH₃(CH₂)_nCOOH), carbohydrates (C_x(H₂O)_y), sucrose (C₁₂H₂₂O₁₁), and water (H₂O), exhibit characteristic spectral responses [30,37]. The fermentation intensity, mucilage retention, and drying kinetics inherent to each PHP method can modulate these molecular groups, resulting in distinct spectral patterns. For example, methods such as Alchemy and Anaerobic Processing, which involve extended and controlled microbial activity, may alter polysaccharide degradation and amino acid formation differently than faster-drying methods like Natural or Dry-Hulled. While a full compound-level interpretation was beyond the scope of this study, previous literature has confirmed correlations between post-harvest fermentation intensity and changes in protein and phenolic compound absorbance in the NIR region [4,12,16]. Further enhancement of these spectral features was achieved through second derivative transformation combined with Savitzky–Golay smoothing, as shown in Figure 1b.
This mathematical pre-treatment increased spectral resolution and emphasized key absorption features. However, in this instance, no clear separation between the PHP methods was visually evident in the pre-processed spectra, reinforcing the need for supervised pattern recognition techniques to uncover underlying spectral differences.

The exploratory analysis using PCA revealed chemical similarities among the seven post-harvest processing (PHP) methods for whole green coffee beans (Figure 1c). The first two Principal Components (PCs) explained 50% and 20% of the total spectral variance, respectively, capturing the main spectral variations related to PHP. The two-dimensional PCA scores plot shows that while different PHP methods exhibit overlapping spectral characteristics, distinct clustering trends are observed within the quadrants. When labeling the quadrants starting from the upper right as Quadrant I, then moving counterclockwise, Alchemy, Washed, and Wet-Hulled samples predominantly cluster in Quadrants I and II, whereas Anaerobic Processing, Dry-Hulled, Honey, and Natural coffees are more concentrated in Quadrants III and IV. This distribution suggests that processing-induced chemical modifications influence spectral variance. However, the degree of spectral overlap between some methods, particularly Natural and Honey, indicates potential misclassification risks due to their similar fermentation and drying parameters. These spectral similarities underscore the importance of robust preprocessing and dimensionality reduction in enhancing class separation and maintaining classification accuracy [12,29].

The PCA loadings plot shown in Figure 1d provides insights into the dominant wavelengths responsible for these clustering tendencies. The primary peaks in PC-1 were observed at 926, 955, 984, 1130, 1158, 1214, 1240, 1310, 1367, 1410, 1455, 1660, 1728, 1868, 1934, 2020, 2093, 2224, and 2310 nm, accounting for 50% of the variance. These peaks correspond to O–H stretching vibrations (e.g., 926, 955, 984 nm), C–H stretching vibrations (e.g., 1130, 1158, 1240 nm), and N–H bending vibrations (e.g., 1367, 1410 nm), which are associated with water, carbohydrates, proteins, and lipids—key components affected by PHP methods [26,28]. The PC-2 loadings, which explained 20% of the variance, exhibited peaks at 929, 946, 1129, 1164, 1185, 1207, 1240, 1304, 1376, 1407, 1448, 1541, 1588, 1655, 1720, 1780, 1832, 1905, 2039, 2115, 2181, and 2285 nm. These peaks correspond to a combination of O–H, C–H, and N–H vibrational modes, reflecting secondary differences in molecular interactions influenced by processing conditions [26,28]. These wavelengths are not only critical for differentiating PHP techniques, but also correspond to key quality indicators in coffee, such as moisture content, sugar degradation products, lipid oxidation, and protein structure—factors that influence aroma, flavor, and stability [4,11,12]. Their sensitivity to biochemical transformations occurring during fermentation and drying makes them particularly suitable for spectral authentication models aimed at verifying processing methods and ensuring product integrity across supply chains [37]. This reinforces their utility in quality assurance, traceability, and fraud prevention initiatives within the specialty coffee industry. The Hotelling’s T² influence analysis was used to identify potential multivariate outliers based on their leverage in PCA space, which could artificially distort clustering trends or bias the classification model. 
The analysis confirmed the absence of such outliers, indicating that all samples contributed consistently to the spectral variance structure. As no samples were removed, the classification performance was unaffected, preserving the integrity and generalizability of the model.
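For readers who wish to reproduce this kind of outlier screening, the Hotelling's T² statistic of each sample can be computed from its PCA scores as the sum of squared scores, each divided by the variance of its component. The sketch below is an assumption-laden illustration: the function name and the number of retained components are hypothetical, and the critical limit against which T² values are compared (usually derived from an F-distribution) is deliberately left out.

```python
import numpy as np

def hotelling_t2(X, n_components):
    """Hotelling's T^2 of each row of X in the space of the leading PCs."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]         # PCA scores
    variances = s[:n_components] ** 2 / (X.shape[0] - 1)     # score variance per PC
    return np.sum(scores**2 / variances, axis=1)
```

Samples whose T² is far above the rest have high leverage in PCA space and are the candidates that an influence analysis would flag for inspection.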

3.2. Coffee Post-Harvest Processing Discrimination

Two PCA-LDA models were developed to assess the classification performance of NIRS in distinguishing PHP methods in whole green coffee beans. This two-model approach was designed to balance comprehensive classification with realistic generalization. Comparing a model trained on the full dataset against one excluding underrepresented categories allowed classification reliability to be evaluated under both ideal and practical field conditions, supporting real-world applicability in PHP verification. The initial model, trained on the entire dataset, successfully classified seven PHP methods (Alchemy, Anaerobic Processing, Dry-Hulled, Honey, Natural, Washed, and Wet-Hulled), achieving an overall accuracy of 96.5%, with 98.3% sensitivity and 99.7% specificity (Table 2). These high values indicate a strong ability to correctly identify PHP methods while minimizing false positives. The LDA scores plot (Figure 2a) illustrates distinct clustering patterns, where each PHP method is well-separated, confirming the robustness of the model in differentiating post-harvest techniques based on spectral signatures. Analyzing the individual classification performance per PHP method, Alchemy and Anaerobic Processing exhibited the highest classification confidence, achieving over 99% sensitivity and specificity, reflecting their highly distinct spectral profiles. Wet-Hulled and Dry-Hulled also demonstrated strong classification performance, with sensitivity values exceeding 97%, highlighting their unique processing characteristics that result in clearly distinguishable spectral patterns. Washed coffee samples were correctly classified with 96.8% sensitivity and 98.5% specificity, showing a consistent spectral fingerprint. Honey and Natural processing methods displayed the highest misclassification rates, though they still maintained above 94% sensitivity and specificity.
This classification overlap is expected, as both techniques retain significant amounts of mucilage during drying, resulting in similar chemical compositions and, consequently, spectral similarities [4,12]. These overlapping profiles reduce the discriminative power of linear classifiers in these cases. Nonetheless, the high accuracy, sensitivity, and specificity across the full model, especially in more chemically distinct processes like Alchemy and Washed, reinforce the robustness of the NIRS-based approach for practical PHP verification in commercial settings. Anaerobic Processing, Dry-Hulled, and Wet-Hulled achieved 100% accuracy, sensitivity, and specificity in the PCA-LDA model trained on the entire dataset (Table 2). It is important to note that these categories were represented by a limited number of samples. As such, their high classification performance should be interpreted with caution, as the lower variability in these small sample sets may have inflated model metrics. To address this, a second model was constructed using only the four most represented PHP categories—Alchemy, Honey, Natural, and Washed—and included an independent test set to evaluate model generalization.

The second PCA-LDA model excluded PHP methods with limited sample representation. In the training phase, the model attained an accuracy of 94.7%, with a sensitivity of 97.1% and a specificity of 98.9%, indicating high discriminatory power. When tested on the independent dataset, the model achieved an accuracy of 91.4%, indicating robustness across the most frequently used PHP techniques (Table 3). The LDA scores plot for this model (Figure 2b) reveals a similar clustering pattern, with some overlap between Honey and Natural processing methods, consistent with their shared mucilage retention characteristics. The confusion matrices for both models (Table 2 and Table 3) provide further insight into classification precision. Alchemy, Anaerobic Processing, Wet-Hulled, and Dry-Hulled exhibited nearly perfect classification results, with misclassification rates below 1%. Washed coffees showed a high degree of separation, with only occasional misclassification as Honey. The highest misclassification rates in the test set were observed between Honey and Natural (5.2%) and between Washed and Honey (4.7%), likely due to spectral similarities resulting from varying degrees of mucilage retention. These processing methods share overlapping chemical compositions, leading to slight classification ambiguities. Despite these minor misclassifications, the model maintained a high overall accuracy of 91.4%, supporting the reliability of NIRS combined with PCA-LDA as a robust, non-destructive tool for PHP verification. Such high-confidence classifications are particularly valuable for exporters, certifiers, and quality control laboratories, where reliable and rapid verification of PHP claims supports market differentiation and helps prevent fraud in premium coffee segments.
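How per-class accuracy, sensitivity, and specificity follow from a confusion matrix can be sketched with the test-set counts of Table 3. The sketch assumes that rows are true classes and columns are predicted classes, and it uses standard one-vs-rest definitions; the helper `one_vs_rest_metrics` is a hypothetical name, not the authors' software. Under this reading, the per-class sensitivities match the Se column of Table 3, and the share of correctly classified test samples (127 of 155) matches the reported overall sensitivity of 81.9%. Other values computed this way need not reproduce every table entry, since the averaging conventions used in the study are not specified here.

```python
# Test-set counts from Table 3 (assumed layout: rows = true class, columns = predicted class).
labels = ["Alchemy", "Honey", "Natural", "Washed"]
confusion = [
    [4, 0, 3, 0],
    [0, 7, 0, 5],
    [2, 9, 62, 4],
    [0, 1, 4, 54],
]

def one_vs_rest_metrics(cm):
    """Per-class accuracy, sensitivity, and specificity (in %) from a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # class k samples assigned elsewhere
        fp = sum(row[k] for row in cm) - tp   # other samples assigned to class k
        tn = total - tp - fn - fp
        results.append({
            "accuracy": 100 * (tp + tn) / total,
            "sensitivity": 100 * tp / (tp + fn),
            "specificity": 100 * tn / (tn + fp),
        })
    return results

metrics = one_vs_rest_metrics(confusion)
n_total = sum(sum(row) for row in confusion)
correct_fraction = 100 * sum(confusion[k][k] for k in range(len(confusion))) / n_total

for name, m in zip(labels, metrics):
    print(f"{name:8s} Ac={m['accuracy']:.1f} Se={m['sensitivity']:.1f} Sp={m['specificity']:.1f}")
print(f"correctly classified: {correct_fraction:.1f}%")
```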

4. Discussion

The ability to accurately verify and classify post-harvest processing (PHP) methods in coffee is essential for ensuring supply chain transparency, quality control, food safety, and product differentiation, particularly as specialty coffee markets continue to expand [3,4,7,20,42]. The findings of this study demonstrate that NIRS, combined with chemometrics, provides a rapid, non-destructive, and highly accurate method for differentiating PHP techniques in green coffee beans. The classification of seven distinct PHP categories (Alchemy, Anaerobic Processing, Dry-Hulled, Honey, Natural, Washed, and Wet-Hulled) achieved high accuracy, sensitivity, and specificity, highlighting the effectiveness of this analytical approach [31,32,41]. These results advance coffee authentication, with broad implications for producers, exporters, traders, and regulatory agencies involved in the global coffee supply chain. Furthermore, by enabling the reliable verification of processing methods, the approach addresses major industry challenges related to fraudulent labeling, inconsistent quality, and a lack of traceability. NIRS-based classification models offer a scalable solution for certification bodies and regulators seeking to enforce standards, support origin claims, and enhance transparency across diverse production and market contexts.

One of the most important potential contributions of this study is its relevance for smallholder coffee farmers, particularly those producing high-quality specialty coffees. Smallholder farmers often face significant challenges in maintaining quality consistency, differentiating their coffees in the market, and ensuring fair pricing [20,43,44]. The ability to authenticate and classify PHP methods using NIRS can empower producers by providing scientifically validated evidence of processing techniques, enabling them to secure premium pricing, access specialty markets, and protect against fraudulent labeling [7,44]. Furthermore, supply chain actors, including exporters, roasters, and certifying bodies, can leverage this technology to validate processing claims efficiently, reducing dependence on subjective sensory evaluation and time-consuming chemical analyses [4,7,12,37]. While scalability across diverse farming operations may be constrained by access to instrumentation, infrastructure, and training, the increasing availability of portable NIRS devices and cloud-based platforms offers viable pathways for broader adoption. By supporting PHP verification, NIRS strengthens transparency and traceability in trade negotiations, facilitates compliance with buyer specifications, and reinforces consumer trust in product origin and quality. This scalable and cost-effective authentication approach aligns with sustainability initiatives and advances equitable participation in premium markets for both farmers and consumers.

This study represents the first systematic attempt to classify a wide array of PHP methods in green coffee using NIRS combined with chemometrics, offering a substantial advancement in coffee authentication research. Previous studies applying NIRS to green coffee have largely focused on broader classifications, such as differentiating Washed vs. Natural processes, or emphasized factors like geographic origin and genetic variety [22,34,36,37]. In contrast, this work consolidates over 20 PHP designations into seven analytically meaningful groups. These groups were based on similarities in fermentation mechanisms, mucilage retention, and drying dynamics, allowing the model to capture consistent chemical distinctions while minimizing intra-class variability. This categorization strategy improved classification accuracy by enhancing spectral coherence and reducing ambiguity in label definitions [23,45,46]. The ability of NIRS to discriminate among these categories, even under diverse environmental conditions and processing protocols, indicates that each PHP method imparts a distinct and quantifiable chemical signature to green coffee [5,6,10,11,47,48]. These findings further support the view that post-harvest techniques influence not only sensory attributes, but also the underlying molecular composition of coffee, particularly in compounds such as chlorogenic acids, carbohydrates, lipids, and proteins [4,6,11,47,49,50]. This standardized classification framework thus supports scalable authentication practices across varied production contexts.

The ability of NIRS to classify PHP methods also underscores its advantages over traditional authentication techniques. Sensory evaluation, while widely used, remains subjective and inconsistent, influenced by factors such as the experience of the cupper, environmental conditions, and variability in roasting and brewing protocols. These inconsistencies introduce bias and limit reproducibility, particularly when verifying PHP claims across different batches and origins [7,10]. Wet chemical methods, such as high-performance liquid chromatography (HPLC) and mass spectrometry, provide detailed chemical profiles but are labor-intensive, expensive, and destructive, and they do not by themselves classify samples into PHP methods [6,17,24]. In contrast, NIRS offers a rapid, non-invasive alternative capable of analyzing large sample sets within minutes, while maintaining high reproducibility and requiring minimal sample preparation [26,27,28,31,32]. These advantages position NIRS as a valuable tool for quality control laboratories, exporters, and specialty coffee buyers seeking efficient and reliable PHP authentication. However, potential challenges to broader adoption remain, such as variability in instrument calibration across platforms, lack of standardized spectral databases, and the need for technical training. Addressing these barriers will be essential to facilitate consistent implementation in coffee certification systems worldwide.

Despite its strengths, several challenges and limitations must be considered when applying NIRS for PHP verification in real-world scenarios. One of the primary challenges is the need for extensive calibration datasets to ensure robust and generalizable classification models across a wide range of post-harvest processing conditions, coffee origins, and production environments [29,31,32,40]. Given the dynamic nature of coffee production, factors such as climate variability, microbial diversity, and farm-level processing deviations can introduce variability in spectral responses [6,9,46,51]. Expanding the dataset to include underrepresented PHP methods, seasonal batches, and a more diverse range of coffee origins and genetic varieties will be critical for enhancing model adaptability and resilience to unseen data [40,52]. Additionally, while the present study achieved high classification accuracy, the misclassification rates between Honey and Natural processing methods illustrate the difficulty of distinguishing categories with overlapping chemical signatures. These ambiguities point to the need for further refinement of spectral preprocessing and feature extraction techniques to enhance discrimination, particularly when compositional similarities arise from shared mucilage retention or comparable chemical profiles.

Another important consideration is the accessibility of NIRS technology for smallholder farmers and cooperatives. While portable NIRS devices are increasingly available, initial investment costs and technical expertise requirements may pose barriers to widespread adoption [32,33]. Addressing these barriers may involve introducing affordable handheld spectrometers, offering capacity-building programs for local stakeholders, and creating shared diagnostic centers within producer cooperatives. Additionally, user-friendly and cost-effective NIRS solutions—integrated with cloud-based spectral databases and mobile applications—could facilitate broader implementation across different segments of the coffee supply chain [32,33,52]. Integrating NIRS classification with blockchain or digital traceability platforms could further enhance transparency, enabling real-time verification of processing claims and reinforcing trust between producers and buyers [1,7].

Beyond PHP authentication, NIRS has further potential applications in coffee science. Future studies could explore the correlation between spectral features and specific sensory attributes, allowing for a predictive model of flavor development based on processing conditions [3,5,10,53]. Incorporating machine learning techniques, such as random forests, support vector machines, or deep learning architectures, may improve model accuracy by capturing intricate non-linear patterns linking spectral data to sensory attributes. Additionally, combining NIRS with gas chromatography-mass spectrometry (GC-MS) or liquid chromatography-mass spectrometry (LC-MS) could provide deeper insights into the metabolic transformations occurring during fermentation and drying, particularly in relation to key aroma and flavor compounds [10,17,24,49,53]. Such interdisciplinary approaches would offer a more comprehensive understanding of post-harvest biochemistry, bridging the gap between processing practices, chemical composition, and final cup quality.

This study establishes NIRS as a powerful analytical tool for PHP verification, demonstrating its ability to discriminate among diverse processing methods despite variations in environmental conditions, fermentation protocols, and incubation times. The findings provide a scientific foundation for standardizing processing claims, supporting quality control measures, and enabling data-driven decision-making in coffee authentication [3,7,42]. By enhancing traceability, helping to prevent fraud, supporting food safety, and enabling market differentiation, NIRS-based PHP classification represents a significant advancement not only for the specialty coffee industry, but also for broader commercial markets. This approach can support regulatory compliance, streamline certification processes, and improve consumer confidence across global supply chains.

5. Conclusions

This study demonstrated the efficacy of NIRS combined with chemometric modeling for the classification of post-harvest processing (PHP) methods in green coffee beans. By systematically grouping over 20 distinct processing protocols into seven major PHP categories, this research established a novel approach for differentiating coffee processing techniques using non-destructive spectral analysis. Despite the inherent variability in processing conditions, including fermentation protocols, incubation times, and environmental factors, the classification models achieved high accuracy, sensitivity, and specificity, confirming the robustness of NIRS in detecting subtle chemical variations across PHP methods. Beyond the specialty coffee sector, these findings have broader implications for the global coffee industry, offering a scalable tool to authenticate PHP methods, support trade negotiations, and ensure regulatory compliance. While the technology shows strong potential, real-world adoption may be constrained by accessibility barriers for smallholder farmers, including high initial costs and limited technical capacity. Addressing these challenges through user-friendly, low-cost NIRS solutions and digital infrastructure will be crucial for the equitable implementation of these solutions. Future research should explicitly align with industry needs by focusing on expanding calibration datasets to include underrepresented PHP techniques and origins, optimizing spectral preprocessing and feature extraction methods, and integrating NIRS classification outputs with digital traceability systems such as blockchain. Validating this approach across multiple origins and processing environments will be critical for industry-wide adoption and the advancement of precision coffee authentication.

Author Contributions

Conceptualization, M.S.-R. and F.S.; methodology, M.S.-R.; software, M.S.-R.; validation, M.S.-R.; formal analysis, M.S.-R.; investigation, M.S.-R., L.V. and F.S.; resources, F.S.; data curation, M.S.-R. and L.V.; writing—original draft preparation, M.S.-R.; writing—review and editing, M.S.-R., L.V. and F.S.; visualization, M.S.-R.; supervision, M.S.-R. and F.S.; project administration, M.S.-R.; funding acquisition, F.S. All authors have read and agreed to the published version of the manuscript.

Funding

This project was supported by Qima Coffee’s R&D funds.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available upon request from Mariana Santos-Rivera (mariana@smartspectra.ai).

Acknowledgments

The authors thank Adeel Qureshi, Manal Al-Sakkaf, Ahmed Mahyoub, José Giraldo, and the procurement team from Qima Coffee for sourcing the green coffee beans. Special thanks to all the coffee farmers who produced the high-quality lots in this study and participated in prestigious auctions, including Best of Yemen and Cup of Excellence competitions between 2022 and 2024.

Conflicts of Interest

The authors declare no conflicts of interest. However, for transparency, we disclose that Mariana Santos-Rivera is an employee of Smartspectra Limited; Lakshmanan Viswanathan is an employee of Qima Coffee; Faris Sheibani is an employee of Qima Coffee and Smartspectra Limited.

References

1. Krishnan, S. Sustainable Coffee Production. In Oxford Research Encyclopedia of Environmental Science; Oxford University Press: Oxford, UK, 2017; ISBN 978-0-19-938941-4.
2. Farah, A. (Ed.) Coffee: Production, Quality and Chemistry; Royal Society of Chemistry: Cambridge, UK, 2019; ISBN 978-1-78262-004-4.
3. Girma, B.; Sualeh, A. A Review of Coffee Processing Methods and Their Influence on Aroma. Int. J. Food Eng. Technol. 2022, 6, 7.
4. Munyendo, L.M.; Njoroge, D.M.; Owaga, E.E.; Mugendi, B. Coffee Phytochemicals and Post-Harvest Handling—A Complex and Delicate Balance. J. Food Compos. Anal. 2021, 102, 103995.
5. Banti, M.; Abraham, E. Coffee Processing Methods, Coffee Quality and Related Environmental Issues. J. Food Nutr. Sci. 2021, 9, 144.
6. De Melo Pereira, G.V.; De Carvalho Neto, D.P.; Magalhães Júnior, A.I.; Vásquez, Z.S.; Medeiros, A.B.P.; Vandenberghe, L.P.S.; Soccol, C.R. Exploring the Impacts of Postharvest Processing on the Aroma Formation of Coffee Beans—A Review. Food Chem. 2019, 272, 441–452.
7. Perez, M.; Domínguez-López, I.; López-Yerena, A.; Vallverdú Queralt, A. Current Strategies to Guarantee the Authenticity of Coffee. Crit. Rev. Food Sci. Nutr. 2023, 63, 539–554.
8. Ferreira, T.; Galluzzi, L.; De Paulis, T.; Farah, A. Three Centuries on the Science of Coffee Authenticity Control. Food Res. Int. 2021, 149, 110690.
9. De Bruyn, F.; Zhang, S.J.; Pothakos, V.; Torres, J.; Lambot, C.; Moroni, A.V.; Callanan, M.; Sybesma, W.; Weckx, S.; De Vuyst, L. Exploring the Impacts of Postharvest Processing on the Microbiota and Metabolite Profiles during Green Coffee Bean Production. Appl. Environ. Microbiol. 2017, 83, e02398-16.
10. Cao, X.; Wu, H.; Viejo, C.G.; Dunshea, F.R.; Suleria, H.A.R. Effects of Postharvest Processing on Aroma Formation in Roasted Coffee—A Review. Int. J. Food Sci. Technol. 2023, 58, 1007–1027.
11. Sorane Good Kitzberger, C.; Pot, D.; Marraccini, P.; Filipe Protasio Pereira, L.; Brígida Dos Santos Scholz, M. Flavor Precursors and Sensory Attributes of Coffee Submitted to Different Post-Harvest Processing. AIMS Agric. Food 2020, 5, 700–714.
12. Rodriguez, Y.F.B.; Guzman, N.G.; Hernandez, J.G. Effect of the Postharvest Processing Method on the Biochemical Composition and Sensory Analysis of Arabica Coffee. Eng. Agríc. 2020, 40, 177–183.
13. Bekti Sunarharum, W.; Setyo Yuwono, S.; Sofia Murtini, E.; Fibrianto, K.; Waziiroh, E.; Shinta Wulandari, E.; Yum Wahibah, L. Introduction Of Post-Harvest Processing Tehcnology To Increase Production Capacity And Quality Of Coffee Processed By Uph Sekar Rindu, Dampit, Malang Regency. J. Innov. Appl. Technol. 2018, 4, 766–770.
14. Sunarharum, W.B.; Yuwono, S.S.; Nadhiroh, H. Effect of Different Post-Harvest Processing on the Sensory Profile of Java Arabica Coffee. Adv. Food Sci. Sustain. Agric. Agroind. Eng. 2018, 1, 9–14.
15. Widodo, P.B.; Yulianto, M.E.; Ariyanto, H.D.; Paramita, V. Efficacy of Natural and Full Washed Post-Harvest Processing Variations on Arabica Coffee Characteristics. Mater. Today Proc. 2023, 87, 79–85.
16. Cortés-Macías, E.T.; López, C.F.; Gentile, P.; Girón-Hernández, J.; López, A.F. Impact of Post-Harvest Treatments on Physicochemical and Sensory Characteristics of Coffee Beans in Huila, Colombia. Postharvest Biol. Technol. 2022, 187, 111852.
17. Da Silva, M.C.S.; Da Luz, J.M.R.; Veloso, T.G.R.; Gomes, W.D.S.; Oliveira, E.C.D.S.; Anastácio, L.M.; Cunha Neto, A.; Moreli, A.P.; Guarçoni, R.C.; Kasuya, M.C.M.; et al. Processing Techniques and Microbial Fermentation on Microbial Profile and Chemical and Sensory Quality of the Coffee Beverage. Eur. Food Res. Technol. 2022, 248, 1499–1512.
18. Effendi, M.; Faqy, M.M.; Santoso, I.; Astuti, R.; Mahmudy, W.F. Identification of Arabica Coffee Post-Harvest Processing Using a Convolutional Neural Network. BIO Web Conf. 2024, 90, 03003.
19. Qima, C. ALCHEMY—Qima|The Coffee Revolution. Available online: https://www.qimacoffee.com/alchemy (accessed on 11 February 2025).
20. Mawardi, I.-; Nurdin, N.; Zulkarnaini, Z. Appropriate Technology Program of Postharvested Coffee: Production, Marketing, and Coffee Processing Machine Business Unit. J. Pengabdi. Kpd. Masy. Indones. J. Community Engagem. 2019, 5, 267.
21. Analianasari, A.; Berliana, D.; Shintawati, S. Defects of Coffee Beans with Different Postharvest Processes and Roasting Temperatures on Volatile Compounds of Coffee Beans from Coffee Small-Scale Industries of West Lampung Indonesia. Trends Sci. 2024, 21, 7695.
22. Giraudo, A.; Grassi, S.; Savorani, F.; Gavoci, G.; Casiraghi, E.; Geobaldo, F. Determination of the Geographical Origin of Green Coffee Beans Using NIR Spectroscopy and Multivariate Data Analysis. Food Control 2019, 99, 137–145.
23. Buratti, S.; Sinelli, N.; Bertone, E.; Venturello, A.; Casiraghi, E.; Geobaldo, F. Discrimination between Washed Arabica, Natural Arabica and Robusta Coffees by Using near Infrared Spectroscopy, Electronic Nose and Electronic Tongue Analysis. J. Sci. Food Agric. 2015, 95, 2192–2200.
24. Yulianti, Y.; Adawiyah, D.R.; Herawati, D.; Indrasti, D.; Andarwulan, N. Detection of Markers in Green Beans and Roasted Beans of Kalosi-Enrekang Arabica Coffee with Different Postharvest Processing Using LC-MS/MS. Int. J. Food Sci. 2023, 2023, 6696808.
25. Kim, J.-S.; Choi, J.; Park, S.-E.; Kwak, S.; Cho, H.; Son, H.-S. Factors Influencing Metabolite Profiles in Global Arabica Green Coffee Beans: Impact of Continent, Altitude, Post-Harvest Processing, and Variety. Food Res. Int. 2025, 208, 116187.
26. Pasquini, C. Near Infrared Spectroscopy: A Mature Analytical Technique with New Perspectives—A Review. Anal. Chim. Acta 2018, 1026, 8–36.
27. Manley, M. Near-Infrared Spectroscopy and Hyperspectral Imaging: Non-Destructive Analysis of Biological Materials. Chem. Soc. Rev. 2014, 43, 8200–8214.
28. Williams, P.; Antoniszyn, J.; Manley, M. Near Infrared Technology: Getting the Best out of Light, 1st ed.; African Sun Media: Stellenbosch, South Africa, 2019.
29. Cozzolino, D. The Sample, the Spectra and the Maths—The Critical Pillars in the Development of Robust and Sound Applications of Vibrational Spectroscopy. Molecules 2020, 25, 3674.
30. Barbin, D.F.; Felicio, A.L.d.S.M.; Sun, D.-W.; Nixdorf, S.L.; Hirooka, E.Y. Application of Infrared Spectral Techniques on Quality and Compositional Attributes of Coffee: An Overview. Food Res. Int. 2014, 61, 23–32.
31. Zhang, W.; Kasun, L.C.; Wang, Q.J.; Zheng, Y.; Lin, Z. A Review of Machine Learning for Near-Infrared Spectroscopy. Sensors 2022, 22, 9764.
32. Zeng, J.; Guo, Y.; Han, Y.; Li, Z.; Yang, Z.; Chai, Q.; Wang, W.; Zhang, Y.; Fu, C. A Review of the Discriminant Analysis Methods for Food Quality Based on Near-Infrared Spectroscopy and Pattern Recognition. Molecules 2021, 26, 749.
33. Kumaravelu, C.; Gopal, A. A Review on the Applications of Near-Infrared Spectrometer and Chemometrics for the Agro-Food Processing Industries. In Proceedings of the 2015 IEEE Technological Innovation in ICT for Agriculture and Rural Development (TIAR), Chennai, India, 10–12 July 2015; pp. 8–12.
34. Nguyen Minh, Q.; Lai, Q.D.; Nguy Minh, H.; Tran Kieu, M.T.; Lam Gia, N.; Le, U.; Hang, M.P.; Nguyen, H.D.; Chau, T.D.A.; Doan, N.T.T. Authenticity Green Coffee Bean Species and Geographical Origin Using Near-infrared Spectroscopy Combined with Chemometrics. Int. J. Food Sci. Technol. 2022, 57, 4507–4517.
35. Okubo, N.; Kurata, Y. Nondestructive Classification Analysis of Green Coffee Beans by Using Near-Infrared Spectroscopy. Foods 2019, 8, 82.
36. Santos-Rivera, M.; Montagnon, C.; Sheibani, F. Identifying the Origin of Yemeni Green Coffee Beans Using near Infrared Spectroscopy: A Promising Tool for Traceability and Sustainability. Sci. Rep. 2024, 14, 13342.
37. Munyendo, L.; Njoroge, D.; Hitzmann, B. The Potential of Spectroscopic Techniques in Coffee Analysis—A Review. Processes 2021, 10, 71.
38. Villegas, A.M.; Perez, C.; Arana, V.A.; Sandoval, T.; Posada, H.E.; Garrido, A.; Ginel, J.G.; Perez, D.; Olmo, J.G. Identificación de origen y calibración para tres compuestos químicos en café, por espectroscopia de infrarojo cercano. Cenicafe 2014, 65, 7–16.
39. Santos-Rivera, M. Novel Strategies in near Infrared Spectroscopy (NIRS) and Multivariate Analysis (MVA) for Detecting and Profiling Pathogens and Diseases of Agricultural Importance. Ph.D. thesis, Mississippi State University, Mississippi State, MS, USA, 2022.
40. Qu, L.; Pei, Y. A Comprehensive Review on Discriminant Analysis for Addressing Challenges of Class-Level Limitations, Small Sample Size, and Robustness. Processes 2024, 12, 1382.
41. Parikh, R.; Mathai, A.; Parikh, S.; Sekhar, G.C.; Thomas, R. Understanding and Using Sensitivity, Specificity and Predictive Values. Indian J. Ophthalmol. 2008, 56, 45–50.
42. Rotta, N.M.; Curry, S.; Han, J.; Reconco, R.; Spang, E.; Ristenpart, W.; Donis-González, I.R. A Comprehensive Analysis of Operations and Mass Flows in Postharvest Processing of Washed Coffee. Resour. Conserv. Recycl. 2021, 170, 105554.
43. Kidist, T.; Zerihun, G.; Biniam, E. Assessment of Pre and Post-Harvest Management Practices on Coffee (Coffea arabica L.) Quality Determining Factors in Gedeo Zone, Southern Ethiopia. Afr. J. Agric. Res. 2019, 14, 1216–1228.
44. Harahap, R.H.; Absah, Y.; Aulia, F. Improving Arabica Coffee Quality: Examining Post-Harvest Practices among Farmers in Bener Meriah District, Aceh, Indonesia. Preprints 2024, 2024060938.
45. Worku, M.; de Meulenaer, B.; Duchateau, L.; Boeckx, P. Effect of Altitude on Biochemical Composition and Quality of Green Arabica Coffee Beans Can Be Affected by Shade and Postharvest Processing Method. Food Res. Int. 2018, 105, 278–285.
46. Worku, M.; Astatkie, T.; Boeckx, P. Shade and Postharvest Processing Effects on Arabica Coffee Quality and Biochemical Composition in Lowland and Midland Coffee-Growing Areas of Southwestern Ethiopia. J. Food Compos. Anal. 2023, 115, 105027.
47. Tieghi, H.; Pereira, L.D.A.; Viana, G.S.; Katchborian-Neto, A.; Santana, D.B.; Mincato, R.L.; Dias, D.F.; Chagas-Paula, D.A.; Soares, M.G.; De Araújo, W.G.; et al. Effects of Geographical Origin and Post-Harvesting Processing on the Bioactive Compounds and Sensory Quality of Brazilian Specialty Coffee Beans. Food Res. Int. 2024, 186, 114346.
48. Lamessa Tesgera, K.; Nandeshwar, B.C.; Jalata, Z.; Chala, T.C. Physical Quality of Coffee Bean (Coffea arabica L.) as Affected by Harvesting and Drying Methods. J. Hortic. Sci. 2021, 16, 292–300.
49. Aung Moon, S.; Wongsakul, S.; Kitazawa, H.; Saengrayap, R. Influence of Post-Harvest Processing and Drying Techniques on Physicochemical Properties of Thai Arabica Coffee. AgriEngineering 2024, 6, 2198–2213.
50. Velásquez, S.; Banchón, C. Influence of Pre-and Post-Harvest Factors on the Organoleptic and Physicochemical Quality of Coffee: A Short Review. J. Food Sci. Technol. 2023, 60, 2526–2538.
51. Haile, M.; Hee Kang, W. The Harvest and Post-Harvest Management Practices’ Impact on Coffee Quality. In Coffee—Production and Research; Toledo Castanheira, D., Ed.; IntechOpen: London, UK, 2020; ISBN 978-1-83880-884-6.
52. Motta, I.V.C.; Vuillerme, N.; Pham, H.-H.; De Figueiredo, F.A.P. Machine Learning Techniques for Coffee Classification: A Comprehensive Review of Scientific Research. Artif. Intell. Rev. 2024, 58, 15.
53. Halagarda, M.; Obrok, P. Influence of Post-Harvest Processing on Functional Properties of Coffee (Coffea arabica L.). Molecules 2023, 28, 7386.

Figure 1. NIR spectral characteristics in whole green coffee beans (750–2450 nm) acquired from samples processed using seven different post-harvest processing (PHP) methods (n = 524). (a) Raw averaged NIR spectral signatures with Baseline Correction show the characteristic spectral pattern of green coffee across different PHP methods. (b) Transformed spectra reveal enhanced vibrational features but no clear separation among PHP methods. (c) PCA scores plot displaying sample distribution based on the first two principal components (PC-1 = 50%, PC-2 = 20%), highlighting clustering trends by the PHP method. (d) PCA loadings show dominant wavelengths that influence the PCA scores plot.

Figure 2. LDA scores plots for the training sets of both classification models. (a) LDA scores plot for the model trained on the entire dataset; distinct clusters indicate strong separation between processing methods. (b) LDA scores plot for the model incorporating an independent test set; while clustering patterns remain consistent, slight overlaps are observed among some PHP methods, particularly those with similar mucilage retention levels.

Table 1. Distribution of green Arabica coffee samples by origin and post-harvest processing method.

		Coffee Post-Harvest Processing (PHP) ¹
Country	% Moisture Average	A	AP	DH	H	N	W	WH	Total
Brazil	9.8	0	0	0	0	20	6	0	26
Colombia	10.4	0	0	0	7	9	34	0	50
Ecuador	10.4	0	0	0	2	6	36	0	44
El Salvador	10.2	0	0	0	5	17	6	0	28
Guatemala	10.2	0	0	0	1	7	20	0	28
Indonesia	10.1	0	0	3	6	22	5	3	39
Mexico	10.0	0	0	0	2	11	11	0	24
Nicaragua	10.1	0	0	0	1	6	9	0	16
Peru	9.9	0	0	0	0	11	60	0	71
Thailand	10.5	0	0	0	10	24	8	0	42
Yemen	10.4	22	4	0	6	124	0	0	156
Total Samples		22	4	3	40	257	195	3	524
Training Set Spectra		75	20	15	140	900	680	15	1845
Test Set Spectra		35	NA	NA	60	385	295	NA	775
Total Spectra		110	20	15	200	1285	975	15	2620

¹ Coffee Post-Harvest Processing (PHP): A = Alchemy; AP = Anaerobic Processing; DH = Dry-Hulled; H = Honey; N = Natural; W = Washed; WH = Wet-Hulled.
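The class imbalance in Table 1 can be made explicit with a short calculation. The sketch below copies the per-category sample and spectra totals from the table and derives each category's share of the dataset and the number of spectra per sample. The cut-off `MIN_SAMPLES` is a hypothetical value chosen for illustration, not a threshold stated in the study.

```python
# Sample and spectra totals per PHP category, copied from Table 1.
samples = {
    "Alchemy": 22, "Anaerobic Processing": 4, "Dry-Hulled": 3, "Honey": 40,
    "Natural": 257, "Washed": 195, "Wet-Hulled": 3,
}
spectra = {
    "Alchemy": 110, "Anaerobic Processing": 20, "Dry-Hulled": 15, "Honey": 200,
    "Natural": 1285, "Washed": 975, "Wet-Hulled": 15,
}

MIN_SAMPLES = 10  # hypothetical cut-off, not taken from the study

total_samples = sum(samples.values())
shares = {name: 100 * n / total_samples for name, n in samples.items()}
spectra_per_sample = {name: spectra[name] / samples[name] for name in samples}
underrepresented = [name for name, n in samples.items() if n < MIN_SAMPLES]

for name in samples:
    print(f"{name:22s} n={samples[name]:3d}  share={shares[name]:5.1f}%  "
          f"spectra/sample={spectra_per_sample[name]:.0f}")
print("below cut-off:", underrepresented)
```

With this cut-off, the categories flagged are the three that have no test-set spectra in Table 1 (Anaerobic Processing, Dry-Hulled, and Wet-Hulled).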

Table 2. Classification and performance parameters PCA-LDA 1 trained on the entire dataset.

	PCA-LDA 1 (750–2500 nm) ²
Coffee Post-Harvest Processing (PHP) ¹	A	AP	DH	H	N	W	WH	Total	Ac	Se	Sp
Alchemy	21	0	0	0	1	0	0	22	99.1	95.5	99.2
Anaerobic Processing	0	4	0	0	0	0	0	4	100	100	100
Dry-Hulled	0	0	3	0	0	0	0	3	100	100	100
Honey	0	0	0	39	0	1	0	40	99.8	97.5	100
Natural	4	0	0	6	244	3	0	257	98.1	97.6	98.6
Washed	0	0	0	3	1	191	0	195	99.2	99.5	99.1
Wet-Hulled	0	0	0	0	0	0	3	3	100	100	100
Overall Accuracy (%)	96.5
Overall Sensitivity (%)	98.3
Overall Specificity (%)	99.7

¹ Coffee Post-Harvest Processing (PHP): A = Alchemy; AP = Anaerobic Processing; DH = Dry-Hulled; H = Honey; N = Natural; W = Washed; WH = Wet-Hulled. ² PCA-LDA performance metrics: Ac = Accuracy; Se = Sensitivity; Sp = Specificity.

Table 3. Classification and performance parameters PCA-LDA 2 incorporating an independent test set.

PCA-LDA 2 (750–2500 nm) ¹
Training Set
Coffee Post-Harvest Processing (PHP) ²	A	H	N	W	Total	Ac	Se	Sp
Alchemy	15	0	0	0	15	100	100	100
Honey	0	28	0	0	28	98.3	100	98.2
Natural	0	6	173	1	180	97.8	96.1	99.4
Washed	0	0	1	135	136	99.4	99.3	99.6
Overall Accuracy (%)	97.8
Overall Sensitivity (%)	97.8
Overall Specificity (%)	99.3
Test Set
Coffee Post-Harvest Processing (PHP)	A	H	N	W	Total	Ac	Se	Sp
Alchemy	4	0	3	0	7	96.8	57.1	98.6
Honey	0	7	0	5	12	90.3	58.3	93.0
Natural	2	9	62	4	77	86.1	80.5	91.4
Washed	0	1	4	54	59	93.0	91.5	94.0
Overall Accuracy (%)	91.5
Overall Sensitivity (%)	81.9
Overall Specificity (%)	94.7

¹ PCA-LDA performance metrics: Ac = Accuracy; Se = Sensitivity; Sp = Specificity. ² Coffee Post-Harvest Processing (PHP): A = Alchemy; H = Honey; N = Natural; W = Washed.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
