1. Introduction
With green coffee prices rising substantially on the world market, simple and practicable strategies for product authentication are crucial for assuring food quality as well as food safety. This is particularly important for “single-variety” products from Coffea arabica (C. arabica), which is still the most popular coffee species but is often blended with Coffea canephora (C. canephora) or other lower-priced coffee species. It is important to underline, however, that quality is determined not only by the species but also by origin and processing. Consequently, Canephora-based coffees are not per se of lower value. A relatively new phenomenon is the growing interest in a third coffee species in the specialty coffee market: Coffea liberica (C. liberica) [1]. For decades, Liberica coffees were not in demand, and the trees were historically used primarily as a trap crop for pest management. Climate change has changed this view: due to its good resistance towards emerging climatic conditions, paired with its unique taste and limited availability, market prices for C. liberica have overtaken those of the other two coffee species. Now, in turn, C. liberica is a novel target for adulteration and mislabeling through blending with less expensive coffee species such as C. arabica and C. canephora. Overall, this underlines the clear need for routine-suitable methodologies for product authentication and quality analysis.
Differentiation of whole beans is typically described as straightforward, as the coffee beans of the different species have clearly defined morphological attributes. However, low-level adulteration is challenging to detect. Authentication is further complicated by the fact that morphology also differs within a species; for example, the C. arabica varieties Catuai and Maragogype differ vastly in size. This underlines the need for authentication strategies that go deeper than morphology.
To date, various methods have been described for detecting coffee adulteration. These include nuclear magnetic resonance (NMR) [2,3], high-performance liquid chromatography (HPLC) [4,5], near infrared (NIR) [5,6,7], and Raman spectroscopic methods [8,9], as well as solid phase micro-extraction-gas chromatography-mass spectrometry (SPME-GC-MS) [10,11,12]. Although NMR and SPME-GC-MS generate in-depth data and have the advantage of compound identification, they suffer from several limitations. NMR techniques require elaborate sample preparation with high solvent consumption, while SPME-GC-MS methods are often time-consuming due to long extraction and desorption times. Similar to NMR, HPLC methods require complex extraction procedures with high solvent consumption before and during analysis, while lacking the power of substance identification. Furthermore, the operation of these three analytical methods is confined to a highly specialized laboratory environment and depends on expensive gases, such as helium, whose supply is increasingly limited. Fast spectroscopic methods like NIR and Raman are good screening tools with low infrastructural needs, especially with the emerging trend towards specialized hand-held devices [13]. However, the chemical information generated is limited, and the identification of marker substances for validation is not possible. This leads to a rising need for a fast analytical method that can be operated with little infrastructure and generates a sample fingerprint with identified marker compounds for validation.
Coupling gas chromatographic separation with fast and highly sensitive ion mobility spectrometry (GC-IMS) generates comprehensive 2D fingerprints of the volatile organic compounds (VOCs) of complex samples, based on retention time and drift time. GC-IMS is an emerging analytical method for the authentication of foods [14,15], beverages [16,17], or essential oils [18,19]. One major advantage of GC-IMS is the simplicity and robustness of the platform, which allows for use at the point-of-need (PoN), a crucial aspect for fast authentication of suspect samples. These systems are operated at ambient pressure with readily available and cheap nitrogen as carrier gas and feature low power consumption, which drastically reduces the demand on laboratory infrastructure. As GC-IMS is typically based on soft ionization sources (e.g., ³H, UV, or corona discharge (CD)), sensitivity for polar to medium polar species is excellent. This typically reduces the need for sample preparation, which in turn is an important factor for fast sample analysis at the PoN. This is also reflected in the substantially better scores in the context of green analytical chemistry (GAC) in comparison with, e.g., GC-MS [20], as no helium, enrichment steps, or solvents are required. The systems are typically benchtop-based. However, the latest developments in this field have continuously miniaturized IMS systems towards hand-held devices (e.g., [21]).
While there are a number of GC-IMS-related publications in the field of coffee aroma [22], geographic origin [23,24], and processing [25,26], the literature on species differentiation is surprisingly scarce. Konieczka et al. reported the application of GC-IMS for coffee species authentication [27]. However, this approach was based on the sum spectra of the IMS data, disregarding the separation power of the GC. This second separation dimension is the “power-up” for IMS systems and substantially increases both selectivity and (usable) sensitivity, opening up the path for omics-based approaches with chemometric data analysis.
Chemometric methods enhance the extraction of valuable information from the generated volatilomic fingerprints. Diverse fields of application in combination with different data analysis strategies have already been reported. Among the chemometric methods described are principal component analysis (PCA) [28,29], partial least squares-discriminant analysis (PLS-DA) [30,31], and different types of artificial neural networks [32,33]. These data analysis strategies are mainly used for the determination of geographical and botanical origin, as well as the detection of adulterants in the agricultural and food sector. Recently, the use of non-targeted volatilomics in combination with multivariate curve resolution-alternating least squares (MCR-ALS) was described for the authentication of saffron [34]. This advanced chemometric tool enables the decomposition of highly complex spectrometric and chromatographic data matrices. The generated pure component profiles and corresponding concentration profiles allow overlapping peaks and background contributions to be resolved [35]. In this context, MCR is used to generate extracted features that are utilized for sample discrimination with PCA and PLS-DA for authentication. Another advantage of MCR is the power to resolve the pure component spectra, which facilitates the identification of the underlying metabolites.
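For orientation, the bilinear model on which MCR-ALS is generally built can be written as follows. This is the standard textbook formulation rather than notation taken from this study; the symbols D, C, S, and E are introduced here for illustration only.

```latex
\mathbf{D} = \mathbf{C}\,\mathbf{S}^{\mathsf{T}} + \mathbf{E},
\qquad
\min_{\mathbf{C},\,\mathbf{S}}
\left\lVert \mathbf{D} - \mathbf{C}\,\mathbf{S}^{\mathsf{T}} \right\rVert_F^{2}
\quad \text{subject to constraints such as } \mathbf{C} \ge 0,\ \mathbf{S} \ge 0 .
```

Here D would be the measured data matrix (for GC-IMS, retention times by drift times), C the concentration (elution) profiles, S^T the pure component spectra, and E the residuals not explained by the model. The alternating least squares step solves for C and S in turn under the chosen constraints until the fit no longer improves.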
However, the identification of substances with GC-IMS data is challenging, as commercial databases are not yet available and identification is commonly carried out via the analysis of reference substances. As this approach is costly and time-consuming, prototype couplings of IMS to GC-QMS systems for the simultaneous acquisition of mass spectra and IMS spectra have been reported for different applications [14,36,37]. Profiling the original data space in combination with simultaneous mass spectrometry leads to a more accurate detection of influential features and thus to a more accurate identification with the corresponding QMS data. Consequently, the application of GC-IMS in combination with advanced chemometric tools could offer a powerful, point-of-need-suitable approach for the differentiation of Coffea species.
The aim of this study was the development of a fast, cost-efficient, and potentially point-of-need-suitable screening method for coffee species with a minimal need for sample preparation. The focus was set on the identification of relevant signals in the IMS data by deconvolution with MCR-ALS. MS data acquired simultaneously were used for a tentative identification of the assigned signals with commercially available databases. To the best of our knowledge, similar results for the differentiation of coffee species have not yet been published.
2. Materials and Methods
2.1. Reagents and Samples
Green and roasted coffee samples were provided by Coffee Consulate (Mannheim, Germany). The sample set consisted of 30 green coffee samples and 30 roasted coffee samples, each comprising 8 C. arabica samples, 16 C. canephora samples, and 6 C. liberica samples. Apart from four C. canephora samples, namely samples no. 21–24, all samples were produced as “specialty coffees”, i.e., no defective beans were used. The samples were stored in sealed bags protected from light at room temperature until analysis. The sample sets, with species, variety, geographical origin, and post-harvest processing, are described in Table 1.
For the analysis, 5 g of coffee beans per sample were shock-frozen with liquid nitrogen and ground for 45 s at level 8.5 using a kitchen-grade knife mill (Thermomix TM6, Vorwerk Deutschland Stiftung & Co. KG, Wuppertal, Germany). Subsequently, 1.4 g per sample of the green coffee grounds and 0.4 g per sample of the roasted coffee grounds were transferred into 20 mL headspace vials, which were closed tightly with screw caps with butyl/PTFE septa.
2.2. Instrumentation
All measurements were performed on a prototype THS-GC-IMS-QMS dual detection system, consisting of a Shimadzu HS 20 headspace sampler (Shimadzu Corporation, Kyoto, Japan), a Shimadzu Nexis GC-2030 (Shimadzu Deutschland GmbH, Duisburg, Germany) coupled to a Shimadzu QP-2020 NX MSD (electron impact (EI) mode) (Shimadzu Deutschland GmbH, Duisburg, Germany), and a FOCUS-ion mobility spectrometer module (Gesellschaft für Analytische Sensorsysteme mbH, Dortmund, Germany). The optimal instrument parameters are summarized in Table 2. Details on the hardware setup can be found in [14].
Trapped headspace (THS) measurements were carried out with incubation at 80 °C for 15 min at shaking level 1. A headspace volume of 1 mL was transferred twice onto a Tenax TA tube (Shimadzu Corporation, Kyoto, Japan) and trapped at −10 °C. The Tenax TA tube was equilibrated at 25 °C and desorbed onto the GC column at 280 °C with a split ratio of 1:20. Chromatographic separation was performed on a VF-23 ms capillary column (30 m × 0.25 mm, 0.25 µm film thickness; operating temperatures: 40–260 °C/260 °C; SN: NL10772427; Agilent Technologies Inc., Santa Clara, CA, USA). The carrier gas was helium with a constant pressure of 180 kPa and a splitter advanced pressure controller (APC) pressure of 38 kPa. The GC oven program was as follows: 40 °C, held for 1 min, 40 °C to 200 °C at 10 °C/min, held for 3 min, resulting in a run time of 20 min. At the end of the analytical column, the column gas flow was split by a SilFlow GC 4-port splitter plate (Trajan Scientific and Medical, Ringwood, Australia) into two retention gaps of 0.15 mm inner diameter, one of 0.7 m length to the IMS and one of 1.6 m length to the QMS. Transfer lines were operated at 220 °C to both QMS (Shimadzu Corporation, Kyoto, Japan) and IMS (Hillesheim GmbH, Waghäusel, Germany). The ion source temperature of the QMS was set to 220 °C, the electron ionization energy was 70 eV, and the scan range was m/z 35 to m/z 500 with a duty cycle of 300 ms.
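The stated run time follows directly from the oven program. The short sketch below reproduces that arithmetic; the function name and its parameters are illustrative and not part of any instrument software.

```python
def oven_run_time_min(start_c, end_c, ramp_c_per_min, initial_hold_min, final_hold_min):
    """Total run time of a single-ramp GC oven program, in minutes."""
    ramp_min = (end_c - start_c) / ramp_c_per_min  # time spent heating
    return initial_hold_min + ramp_min + final_hold_min


# Values from the text: 40 °C (1 min hold), 10 °C/min to 200 °C, 3 min hold.
run_time = oven_run_time_min(
    start_c=40, end_c=200, ramp_c_per_min=10, initial_hold_min=1, final_hold_min=3
)
print(f"run time: {run_time} min")
```

The ramp from 40 °C to 200 °C takes 16 min, which together with the two holds gives the 20 min reported above.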
The OEM-Focus-IMS® cell was equipped with a ³H radioactive ionization source (100 MBq, β-emission). It was operated at 100 °C in positive-ion mode at a constant voltage of 2.5 kV. The drift tube had a diameter of 15.2 mm and a length of 98 mm. The injection voltage was set to 2500 V, and the blocking voltage was set to 70 V. The drift gas was nitrogen of 99.9999% purity, controlled with a mass flow controller (Vögtlin Instruments GmbH, Aesch, Switzerland) to 150 mL/min. The injection pulse width was set to 100 µs and the sampling frequency was 228 kHz. To reduce data size, each spectrum was averaged over six scans with a repetition rate of 21 ms.
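A few derived quantities help put these settings into perspective. The sketch below is a back-of-the-envelope calculation under two assumptions that the text does not state explicitly: that each scan is sampled over the full 21 ms repetition interval, and that the 2.5 kV is applied across the full 98 mm drift length. All variable names are illustrative.

```python
# Settings taken from the text
SAMPLING_KHZ = 228       # digitizer sampling frequency
REPETITION_MS = 21       # time between two injection pulses
SCANS_AVERAGED = 6       # scans averaged into one stored spectrum
DRIFT_VOLTAGE_V = 2500   # assumed to be the voltage across the drift region
DRIFT_LENGTH_MM = 98

# kHz x ms = number of samples (assumes the whole interval is recorded)
points_per_scan = SAMPLING_KHZ * REPETITION_MS

# Time covered by one averaged spectrum on the retention time axis
averaged_interval_ms = SCANS_AVERAGED * REPETITION_MS

# Nominal electric field strength in V/cm (assumes a uniform field)
field_v_per_cm = DRIFT_VOLTAGE_V / (DRIFT_LENGTH_MM / 10)

print(points_per_scan, averaged_interval_ms, round(field_v_per_cm, 1))
```

Under these assumptions, one averaged IMS spectrum is stored roughly every 126 ms, which is shorter than the 300 ms interval given for the QMS.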
2.3. Data Processing and Evaluation
For data preprocessing, exploratory analysis, and visualization of the IMS spectra, gc-ims-tools version 0.1.7 and Python version 3.11.4 were used [38]. Preprocessing is an important step in multivariate data analysis to remove analysis-related effects, such as signal shifts. To reduce the size of the data for more efficient processing, the first preprocessing step was a level 3 wavelet compression, which lowers the number of variables without losing important information. The data were aligned along the drift time axis to correct for pressure-dependent shifts, and the drift time was normalized to the reactant ion peak (RIP). Further, the drift times were aligned by dynamic time warping (DTW). The spectra were cropped to the relevant areas corresponding to 1.05–2.1 on the RIP-relative drift time axis (7 ms to 15 ms drift time) and 50–900 s on the retention time axis. Afterwards, the data were baseline-corrected by asymmetric least squares (AsLS) (weighting of 0.001, smoothing parameter of 10⁷). Finally, data were Pareto-scaled and mean-centered. The preprocessed data were analyzed in an exploratory data analysis by PCA and subsequently checked for outliers via a Hotelling’s T² versus Q residuals plot. In a second step, supervised analysis by PLS-DA was performed.
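Three of the preprocessing steps described above (RIP-relative drift time, cropping, and Pareto scaling with mean centering) can be sketched in plain Python as follows. This is an illustration on toy numbers, not the gc-ims-tools implementation; the function names are hypothetical, and the RIP is assumed to be the most intense point of the spectrum.

```python
import math


def rip_relative_axis(drift_times_ms, spectrum):
    """Divide all drift times by the drift time of the RIP (assumed: the maximum)."""
    rip_index = max(range(len(spectrum)), key=lambda i: spectrum[i])
    rip_time = drift_times_ms[rip_index]
    return [t / rip_time for t in drift_times_ms]


def crop(axis, values, low, high):
    """Keep only the points whose axis value lies within [low, high]."""
    kept = [(a, v) for a, v in zip(axis, values) if low <= a <= high]
    return [a for a, _ in kept], [v for _, v in kept]


def pareto_scale(rows):
    """Mean-center each column and divide it by the square root of its standard deviation."""
    n = len(rows)
    scaled_columns = []
    for column in zip(*rows):
        mean = sum(column) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
        divisor = math.sqrt(std) if std > 0 else 1.0  # leave constant columns at zero
        scaled_columns.append([(x - mean) / divisor for x in column])
    return [list(row) for row in zip(*scaled_columns)]


# Toy spectrum: the RIP sits at 7.0 ms
drift_times = [6.0, 7.0, 8.0, 10.0, 14.0, 16.0]
spectrum = [0.1, 5.0, 0.3, 1.2, 0.8, 0.1]
rel_axis = rip_relative_axis(drift_times, spectrum)
cropped_axis, cropped_values = crop(rel_axis, spectrum, 1.05, 2.1)

# Toy data matrix: rows are samples, columns are variables
scaled = pareto_scale([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
```

Compared with auto-scaling, dividing by the square root of the standard deviation lets intense signals keep more of their weight while still reducing their dominance over small ones.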
For MCR-ALS, MATLAB version R2019b (MathWorks, Natick, MA, USA) in combination with the MCR-ALS 2.0 toolbox [39] was used. The parameters used are described in [34]. The GC-IMS data were RIP-corrected and then augmented column-wise, with the retention times as rows and the drift times as columns. The augmented data were subjected to multivariate resolution by MCR-ALS. The number of components was determined by singular value decomposition (SVD) and the resulting scree plot to be 50 for roasted coffee and 55 for green coffee samples. Additionally, simple-to-use interactive self-modeling mixture analysis (SIMPLISMA) was used to calculate the initial estimate of IMS profiles as a starting point for ALS optimization. The constraints were spectral normalization, non-negativity in both RT and DT mode, and unimodality in RT mode. The PCA-based lack of fit (LOF) metric was used for MCR-ALS model evaluation. The threshold value used as the stop criterion was set to 0.1 and the maximum number of iterations to 1000. Convergence was achieved after 36 iterations with 2.40% LOF. The resolved elution profiles were subsequently used for discriminant analysis by PCA and PLS-DA, as well as compound identification in combination with the MS data.
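Two ideas from this paragraph, column-wise augmentation and the lack-of-fit metric, can be illustrated with a minimal sketch. It is not the MCR-ALS 2.0 toolbox code; the function names are hypothetical, and the LOF is written in its commonly used form, 100 times the square root of the residual sum of squares divided by the sum of squares of the reference data. Whether the reference is the raw or the PCA-reproduced matrix is left to the caller.

```python
import math


def augment_column_wise(sample_matrices):
    """Stack per-sample matrices (rows: retention times, columns: drift times) vertically."""
    n_cols = len(sample_matrices[0][0])
    augmented = []
    for matrix in sample_matrices:
        for row in matrix:
            if len(row) != n_cols:
                raise ValueError("all samples must share the same drift time axis")
            augmented.append(list(row))
    return augmented


def lack_of_fit_percent(data, reconstructed):
    """LOF (%) = 100 * sqrt(sum of squared residuals / sum of squared data values)."""
    residual_ss = 0.0
    data_ss = 0.0
    for data_row, model_row in zip(data, reconstructed):
        for d, m in zip(data_row, model_row):
            residual_ss += (d - m) ** 2
            data_ss += d ** 2
    return 100.0 * math.sqrt(residual_ss / data_ss)


# Toy example: two "samples" sharing a two-point drift time axis
sample_a = [[3.0, 0.0], [0.0, 4.0]]
sample_b = [[0.0, 0.0]]
augmented = augment_column_wise([sample_a, sample_b])

# A hypothetical model reconstruction that misses one intensity unit
reconstructed = [[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]]
lof = lack_of_fit_percent(augmented, reconstructed)
```

Stacking the samples along the retention time direction keeps the drift time axis common to all of them, so that one set of resolved IMS profiles is shared while each sample retains its own elution profiles.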
The GC-MS data were analyzed as a batch with mzmine (mzio GmbH, Bremen, Germany). The steps of the batch queue were as follows: mass detection with a noise level set to 5.0 × 10¹; chromatogram builder; smoothing with the Loess smoothing algorithm; local minimum feature resolver; GC-EI spectral deconvolution with the RT grouping and shape correlation algorithm; join aligner; spectral/molecular networking.
The assigned compounds were preliminarily annotated with the NIST/EPA/NIH Mass Spectral Library 23 from NIST (Gaithersburg, MD, USA) of the U.S. Department of Commerce. The TIC chromatograms were integrated, and the retention times at the peak maxima were used for retention time comparison. For exploratory data analysis, the aligned feature list was exported and assessed with PCA and fold-change analysis in volcano plots with MetaboAnalyst 6.0 (NSERC, Ottawa, ON, Canada). To assess the need for data normalization prior to exploratory data analysis, the feature density and normalized intensity were evaluated. Based on the results, green coffee data were normalized by sum, log10-transformed, and auto-scaled, while roast coffee data were only auto-scaled prior to PCA and fold-change analysis. For the generation of volcano plots, the data were treated as unpaired and the p-value threshold was set to 0.05.
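The normalization chain applied to the green coffee data can be sketched as follows. This is an illustration of the three operations on toy numbers, not the MetaboAnalyst implementation: the function names are hypothetical, and strictly positive intensities are assumed so that the plain log10 is defined.

```python
import math
import statistics


def normalize_by_sum(row):
    """Divide each feature intensity of one sample by the sample's total intensity."""
    total = sum(row)
    return [x / total for x in row]


def log10_transform(row):
    """Plain log10 transform (assumes strictly positive values)."""
    return [math.log10(x) for x in row]


def auto_scale(rows):
    """Mean-center each feature (column) and divide it by its standard deviation."""
    scaled_columns = []
    for column in zip(*rows):
        mean = statistics.mean(column)
        std = statistics.stdev(column)
        scaled_columns.append([(x - mean) / std if std > 0 else 0.0 for x in column])
    return [list(row) for row in zip(*scaled_columns)]


# Toy feature table: rows are samples, columns are features
raw = [[10.0, 90.0], [50.0, 50.0], [90.0, 10.0]]
processed = auto_scale([log10_transform(normalize_by_sum(row)) for row in raw])
```

After auto-scaling, every feature has zero mean and unit standard deviation, so abundant and trace compounds contribute equally to the subsequent PCA.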
4. Conclusions
This study evaluated the potential of THS-GC-IMS paired with modern machine learning for the differentiation of coffee species and the annotation of characteristic substances with the simultaneously measured QMS data. Classification without prior feature extraction was compared to classification of MCR-ALS-resolved data, with PLS-DA as the classification model. Feature extraction with MCR-ALS significantly enhanced classification accuracy and sensitivity for green coffee samples, which commonly have a less complex volatile fingerprint than roast coffee, and one that is highly similar across species. Furthermore, MCR-ALS preserves the chemical and physical information of the original data. This allows for precise determination of the retention time and drift time of the characteristic signals in the IMS spectra.
In order to annotate the IMS signals, the QMS data were first analyzed separately, and subsequently the retention times of the characteristic compounds were compared to the retention times of the characteristic IMS signals. Comparing the characteristic signals for each coffee species in IMS and QMS data, it was possible to tentatively annotate one characteristic compound for green coffee, whereas for roast coffee samples, four common characteristic compounds were tentatively annotated in the QMS and IMS data. This study also made clear that there are still challenges to overcome, such as a more precise alignment of the data from the two very different detection systems, particularly for such highly complex samples. In a routine environment, the MS section of the system described here might not even be required, provided that the relevant features are identified or verified and databases are established. Commercially available GC-IMS applications could then be sufficient for such tasks. In conclusion, this study demonstrates the power of THS-GC-IMS in combination with MCR-ALS and PLS-DA for a fast authentication of coffee species, creating the potential for a fast point-of-need technique for the detection of fraudulent coffee blends.