Characterization of Cabernet Sauvignon Wines by Untargeted HS-SPME GC-QTOF-MS

Untargeted metabolomics approaches are emerging as powerful tools for the quality evaluation and authenticity of food and beverages and have been applied to wine science. However, most fail to report the method validation, quality assurance and/or quality control applied, as well as the assessment through the metabolomics-methodology pipeline. Knowledge of Mexican viticulture, enology and wine science remains scarce, thus untargeted metabolomics approaches arise as a suitable tool. The aim of this study is to validate an untargeted HS-SPME-GC-qTOF/MS method, with attention to data processing to characterize Cabernet Sauvignon wines from two vineyards and two vintages. Validation parameters for targeted methods are applied in conjunction with the development of a recursive analysis of data. The combination of some parameters for targeted studies (repeatability and reproducibility < 20% RSD; linearity > 0.99; retention-time reproducibility < 0.5% RSD; match-identification factor < 2.0% RSD) with recursive analysis of data (101 entities detected) warrants that both chromatographic and spectrometry-processing data were under control and provided high-quality results, which in turn differentiate wine samples according to site and vintage. It also shows potential biomarkers that can be identified. This is a step forward in the pursuit of Mexican wine characterization that could be used as an authentication tool.


Introduction
As demand increases, knowledge about food and beverage quality and authenticity also grows. Untargeted metabolomics approaches are emerging as powerful tools [1][2][3]. Metabolomics comprises the analysis of all metabolites (low-molecular-weight molecules) present in a cell, organism or system, preferably accomplished in a single analysis [4]. Experimentally, metabolomics analysis represents a great challenge because of its premise, particularly for untargeted methods, whose purpose is to measure as many metabolites as possible without requiring chemical identity before data acquisition [5]. Targeted method guidelines are constantly updated; however, metabolomics method validation is complicated and revised guidelines of minimum reporting standards for untargeted studies are needed [6,7]. Consequently, the metabolomics community is encouraging the implementation and communication of quality assurance and quality control in untargeted metabolomics studies [8][9][10][11][12].
In recent years, metabolomics approaches have been applied in wine science for quality determination in order to evaluate the influence of different enological practices, microbial fermentation behavior and terroir. However, most have not reported the method validation, quality assurance and/or quality control applied, as well as assessment through the metabolomics-methodology pipeline [13][14][15][16][17][18][19][20][21], for more examples see [2]. The untargeted metabolomics-methods pipeline consists of four main steps, as proposed by Brown et al. [22]: (1) experimental design and metadata capture; (2) data preprocessing; (3) cleaned

Method Validation and Data Acquisition
The objective of untargeted analysis is to determine as many metabolites as possible; therefore, highly repeatable and reproducible data are required. However, it has been reported that it is not clear how exhaustive and reliable current raw data processing is [12]. Therefore, although complicated, method validation and quality analysis are clearly needed. To validate the method, guidelines for targeted methods were applied (repeatability, reproducibility, linearity, LOD and LOQ). Although in targeted methods these parameters must be determined for each target metabolite, in untargeted methods it is suggested to select metabolites that are present in the samples, have similar chemical properties and molecular mass, and are distributed along the runtime of the acquisition method [10]. Based on this, the chemical standards α-Pinene, β-Pinene, p-Cymene and 2-Undecanone were selected (Supplementary Materials Table S1).
The repeatability and reproducibility of the extracted component area of each level and metabolite were <20.0% RSD (Supplementary Materials Table S1), complying with recommended criteria [33]. The retention-time (RT) reproducibility of all standards was <0.5% RSD, with a maximum variation (SD) of 0.09 min (5.4 s); this minimal variation enhances alignment across samples and identification by default. Hence, a match-factor penalty is applied in method identification if the RT variation is greater than 12 s. Match-factor reproducibility was <2.0% RSD, which can be related to the stability of the mass-fragmentation spectrum, which in turn depends on mass-spectrum comparison and mass accuracy. Mass accuracy in the five most abundant fragments of each standard was <5 ppm (Figure 1e,f), which allowed match factors greater than 80 on all metabolite identifications.
Figure 1. [...] Table 1) by manual recursive analysis. (c) Enlarged ethyl nonanoate peak. (d) Ethyl nonanoate ion peaks; each color represents an ion, exact mass is represented in the same color. (e) Ethyl nonanoate mass-spectrum fragmentation-pattern comparison, orange: acquired spectrum, (f) black: library spectrum.
Selectivity was assessed through the method's capability to discriminate between the isomers α-Pinene and β-Pinene (136.125 g/mol) by retention time and mass-fragmentation spectrum. Selectivity is an important quality to enable component extraction in the data-processing phase [10]. LOD concentrations were below 0.2 ng/L for each standard and LOQs were 2.5 ng/L (Supplementary Materials Table S1), indicating a high sensitivity of the method for detecting low-abundance components. Even though these parameters cannot be used to quantify other metabolites in untargeted methods [34], this method still provides an overview of the metabolites' chromatographic behavior. Interestingly, p-Cymene and 2-Undecanone could be used as an internal control for the SPME fiber's life span. As a sign of fiber deterioration, p-Cymene splits into two chromatographic peaks and 2-Undecanone abundances greatly decrease (data not shown).
An advantage of using a wine-spiked pool as matrix for method validation was that 74 compounds were identified, allowing the calculation of their extracted-area reproducibility (≤ 15.0% RSD; Supplementary Materials Table S2). Figure 1a shows a typical total ion chromatogram (TIC) of wine components; however, components present in the most abundant chromatographic peaks could not be identified because of ion saturation and/or ion peak aberrancy. Thus, a method with split desorption must be performed to identify these metabolites. Since our interest resides in the low-abundance metabolites present in wines (Figure 1b and Table 1), and the method was able to separate and extract them (Figure 1c,d), we decided to work with a splitless method to analyze samples. Consequently, method validation demonstrated that the extracted components' area, RT and match factor were reproducible and unaffected by the concentration required for successful data processing/mining, data identification and data interpretation/analysis.

Quality Control
Recursive analysis successfully identified 74 compounds in a pooled wine (PW) and 76 in spiked pooled wine (PWS). Interestingly, isobutyl acetate (116.1583 g/mol) was identified in the PW (RT 10.65 min) but not in spiked samples. It seems that the method could not extract the isobutyl acetate component peak from the spiked α-pinene component peak. Moreover, it appears to include a p-cymene carryover of 0.06 ng/L, which is less than 20% of LOQ, the acceptance criterion recommended by the FDA [33] for targeted analysis. Overall quality-control analysis was performed using MPP (MassHunter Workstation, Agilent Technologies, Santa Clara, CA, USA) by importing CEF files of PW, PWS and wine samples. The quality-control PCA (Principal Component Analysis) included a total of 109 entities, and all QC samples clustered tightly and apart from the wine samples, as shown in Figure 2; therefore, the data set was considered to be of high quality [5] and we proceeded to data interpretation.

Wine Characterization
Recursive analysis extracted and identified 77, 75, 78 and 73 metabolites in wines from La Changa 2017 and 2018 and Los Dolores 2017 and 2018, respectively (Table 1). PCA included 101 metabolites (Table 1), where the first three components explained 86.71% of total variance (data not shown); furthermore, using the first two components (67.02% of total variance) allowed the clustering of wines by vineyard and vintage (Figure 3a) with a metabolite distribution shown in Figure 3b (PCA loadings). To elucidate the meaning of PC1 and PC2, a closer look at metabolites near the wine-clustering areas was required; PC1 appears to be related to variables depending upon vintage, while PC2 allows the separation of wines by typology, and therefore is associated with wine quality and/or sensorial profile.
Regarding PC2 (Figure 3), some of its positive loadings such as 4-ethylguaiacol (compound #88, Table 1) and 4-ethylphenol (#94) contribute undesirable aromas and have been reported in wines affected by Brettanomyces [35]. However, 2-undecanol (#64), 4-methylbenzaldehyde (#55), furfuryl ethyl ether (#18), methyl salicylate (#72) and trans-linalool oxide (#38) have been associated with spicy notes or found in spices, with roasted nuts, cooked beef and blackberry aromas; isoamyl acetate (#10) has banana and balsamic notes and α-terpineol (#62) has anise and citrus notes. At the same time, some of its negative loadings, metabolites such as monoethyl succinate (#101), ethyl nonanoate (#44), acetoin (#20), diphenyl ether (#87) and isobutyl hexanoate (#26), have desirable sensorial properties and are reported as sweet and fruity [36].
Further analysis of the PCA-loading distribution (Figure 3b) showed that metabolites at PC1- and PC2-negative loadings have fruity and citrus descriptors; those at PC1-negative and PC2-positive loadings are described as fresh, sweet, floral and fruity; while those at PC1-positive and PC2-negative loadings have predominantly floral and sweet notes. PC1- and PC2-positive loadings are less desirable, with descriptors such as alcoholic, balsamic and phenolic [36]. Based on these descriptors, it could be inferred that the 2018 vintages from both vineyards have fruitier, more citrus, sweeter and fresher notes than the 2017 vintage, and Los Dolores 2017 presents floral notes. According to these results, La Changa 2017 could be the most-balanced wine as it is positioned almost at the center of the PCA (Figure 3a); however, sensorial analysis is required to confirm these assumptions. In addition, some putatively identified (level 2) and unknown (level 4) components have potential use as biomarkers for vineyard and vintage classification; consequently, their elucidation is required [7].
Additional data analysis (Figure 4) [...] Table 1). Interestingly, 54 compounds are shared by vineyards and vintages, which, ideally, could indicate a metabolomic fingerprint of Santo Tomás Valley; however, extensive sampling and further analysis are needed to conclude this. Nevertheless, 24 of those compounds (Table 2) [...]
Interestingly, 2,4-di-tert-butylphenol (#99) was 5-fold higher in the 2017 vintage than in 2018. To our knowledge, it has not been reported in Cabernet Sauvignon wines; however, it was detected with the same abundance in red and white wines from Portugal [37]. Furthermore, Marselan wines (Cabernet Sauvignon × Grenache varieties) inoculated with S. cerevisiae presented higher concentrations of 2,4-di-tert-butylphenol than spontaneously fermented wines [38]. Inoculated persimmon wines showed similar behavior [39]. This compound has antifungal and antioxidant characteristics [40], but no aromatic properties have been reported yet. Moreover, the compound was first detected at the end of alcoholic fermentation in the 2017 vintage and increased after malolactic fermentation (data not shown). Although produced by non-Saccharomyces yeasts [41] and lactic-acid bacteria [40], 2,4-di-tert-butylphenol could be a potential marker in vintage differentiation, as microbial terroir cannot be ruled out.
The Venn diagram (Figure 4) showed that 54 compounds were present in all wines and enabled the selection of unique compounds for each one. Los Dolores 2017 wine pre- 133.0128: 150.0449 m/z, formula C8H7NO2) were found only in wines from La Changa vineyard. This set of compounds could be used as potential markers to identify wines from La Changa vineyard, although as stated before, a larger sample size must be analyzed for confirmation.
Reports have estimated that 62% of metabolites present in wine remain unidentified, and targeted metabolomics cannot resolve this drawback [2]; thus, generating a free library with reliable data on unknown metabolites (accurate mass spectrum, RI, RT and potential formula) could enable their rapid identification. This feature should be added as part of the minimum reporting standard procedure [7] to increase the probability of identification and move identification levels upward. Furthermore, it will provide a robust and comprehensive workflow report that could improve reproducibility of results and the exchange of experimental data among research groups [5].

Conclusions
Metabolomics studies urgently require establishing guidelines for validation of untargeted methods, particularly for complex matrices such as beverages. Here, we used parameters for targeted experiments combined with recursive analysis of data for quality assurance to show that both chromatographic and spectrometry-processing data were under control and complied with certain guidelines. During validation, an accurate mass library, VinoST2.mslibrary.xml, was created, and included the retention index, retention time and CAS number of metabolites that were putatively identified, and the exact mass and molecular formula of those classified as unknowns.
Recursive analysis of metabolite data and PCA successfully differentiated Cabernet Sauvignon wines from two vineyards and two vintages and gave an approximation of their aromatic notes. In addition, potential markers of vineyard and vintage were pointed out, and a profile of 54 compounds was described in all Cabernet Sauvignon wines from Santo Tomás Valley. This effort constitutes an advance in the pursuit of Mexican wine characterization that could be used as an authentication tool.

Samples
Cabernet Sauvignon wines of vintages 2017 and 2018 from two different vineyards were collected from 55,000 L stainless steel tanks, bottled (750 mL, sealed with natural cork) and stored horizontally at room temperature until sample processing. Vineyard-management practices of La Changa and Los Dolores vineyards, from Bodegas de Santo Tomás (Ensenada, B.C., Mexico, 31 •

Data Acquisition
Samples were analyzed using a 7890B GC System (Agilent Technologies, Santa Clara, CA, USA) coupled to a 7200 mass spectrometer with quadrupole-time-of-flight (MS-qTOF) (Agilent Technologies, Santa Clara, CA, USA), with an autosampler PAL3 System (CTC Analytics AG, Zwingen, Switzerland) and a head-space solid-phase micro-extraction (HS-SPME) module with 50/30 µm DVB/CAR/PDMS Stable Flex Supelco fiber (Agilent Technologies, Santa Clara, CA, USA) [42]. Three grams of NaCl were added to 10 mL of sample in a 20 mL amber vial sealed with an aluminum cap and an 18 mm blue PTFE/silicone septum (Agilent Technologies, Santa Clara, CA, USA), as described [43]. Modified parameters for extraction [44], separation [45] and detection are summarized in Table 3. Mass calibration was performed at the beginning and after running three samples, to ensure mass accuracy.

Method Validation
To validate the data-acquisition method in terms of repeatability, reproducibility, linearity and limits of detection (LOD) and quantification (LOQ), the chromatographic standards α-Pinene and p-Cymene from Honeywell Fluka™ (Morristown, NJ, USA) and β-Pinene (99%) and 2-Undecanone (99%) from Sigma-Aldrich (St. Louis, MO, USA) were used. Validation parameters were determined in a pooled-wine (PW) matrix to prevent matrix interferences. The concentration range of α-Pinene and β-Pinene was 1.56 (L1) to 25.00 (L5) ng/L with a 1:1 factor, while that of p-Cymene and 2-Undecanone was 0.31 (L1) to 5.00 (L5) ng/L with the same factor. Repeatability was determined using a five-level curve in triplicate on day one. Reproducibility was calculated with a three-level curve (L1, L3 and L5), also in triplicate, on the second day of work. Mean, standard deviation (SD) and relative standard deviation (%RSD) were calculated for each level to determine repeatability and reproducibility. Linearity was determined by the correlation coefficient (r) of five-level standard curves, with PW as a blank sample (matrix sample without standards). LOD and LOQ were determined from ten injections of L1 on three different days (Supplementary Materials, Table S1) and calculated using Agilent MassHunter WorkStation Quantitative Analysis version 10.0 (Agilent Technologies, Santa Clara, CA, USA).
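The statistics described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow used in the study (which relied on MassHunter Quantitative Analysis): the function names are hypothetical, the "1:1 factor" is assumed to mean a two-fold serial dilution (which reproduces the stated 1.56 to 25.00 ng/L range), and the peak areas are placeholder values rather than measured data. The acceptance limits (<20% RSD, r > 0.99) are the ones quoted in the text.

```python
from math import sqrt
from statistics import mean, stdev


def serial_levels(top, n_levels=5, factor=2.0):
    """Concentrations L1 (lowest) to Ln (highest) of a serial dilution.

    Assumption: the "1:1 factor" is a two-fold dilution between levels.
    """
    return [top / factor ** (n_levels - 1 - i) for i in range(n_levels)]


def percent_rsd(values):
    """Relative standard deviation (%), using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


levels = serial_levels(25.0)  # 1.5625, 3.125, 6.25, 12.5, 25.0 ng/L

# Placeholder triplicate peak areas per level (illustrative, NOT measured data).
areas = [
    [1020.0, 980.0, 1005.0],
    [2010.0, 1950.0, 2060.0],
    [4100.0, 3950.0, 4020.0],
    [8050.0, 7900.0, 8120.0],
    [16100.0, 15800.0, 16250.0],
]

rsd_per_level = [percent_rsd(replicates) for replicates in areas]
r = pearson_r(levels, [mean(replicates) for replicates in areas])

repeatable = all(value < 20.0 for value in rsd_per_level)  # <20% RSD criterion
linear = r > 0.99                                          # linearity criterion
```

The same `percent_rsd` helper would apply to retention times and match factors, for which the text reports limits of 0.5% and 2.0% RSD.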

Quality Control
Quality control was assessed by monitoring pooled samples of Cabernet Sauvignon wines from Santo Tomás. Every batch sequence of injections included a PW, a PWS with standards at L4 concentration (Supplementary Materials Table S1) and samples from both vintages and vineyards. Injections were randomized, analyzed in triplicate (Supplementary Materials Table S3) and processed by recursive analysis as described in the Data Processing section.

Data Processing
Data processing/mining of raw data is an exhaustive and crucial step for untargeted analysis; this process must generate a holistic and reliable representation of the metabolites present in each sample [5]. Data processing was performed in two steps in order to generate a recursive analysis (as pretreatment to ease data interpretation/analysis of complex matrices) using Agilent MassHunter WorkStation Unknowns Analysis software version 10.0. All data acquired were converted to the SureMass format (only data acquired in profile mode can be converted). The first step of the recursive analysis was to extract and identify most of the components in the QC pool to create an internal library (see Internal Library); the second step was the recursive analysis itself (described later). Component extraction was performed using SureMass deconvolution with a retention-time (RT) window factor of 300, a signal-to-noise ratio (SNR) of 5, an extraction window of ±10 ppm, a threshold of 25% in component shape, a minimum of four ion peaks for extraction and a maximum of 10 ion peaks to store. Area and height filters were not applied because the aim of this study was to also include minor compounds, which resulted in an exhaustive manual/visual analysis of ion-peak shapes. Extracted components were identified with the Accurate Mass Flavors Database [46] and NIST 17, as described below.

Retention Index
Retention indices (RI) were calculated using a 50 ng/L C8-C40 alkanes calibration standard (Sigma-Aldrich, St. Louis, MO, USA). Liquid injection (1 µL) in manual mode was used to improve the signal acquired. Acquired data were processed as indicated above and identified with the NIST 17 library, then exported in library format. Agilent MassHunter WorkStation Library Editor 10.0 was used to activate only the "Compound name", "CAS#", "Retention Index" and "Retention Time" columns, in that order, and the result was saved in CSV (comma-separated values) format to create the RT calibration file, which contains the alkane RIs used in recursive analysis to calculate the RI of unknown components.
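The text does not state which formula the software applies to the alkane calibration. For temperature-programmed GC, a common choice is the linear retention index of van den Dool and Kratz, RI = 100 × [n + (t_x − t_n) / (t_(n+1) − t_n)], where t_n and t_(n+1) are the retention times of the alkanes with n and n + 1 carbons that bracket the component. The sketch below assumes that formula; the function name and the alkane retention times in the usage note are hypothetical.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Linear retention index from bracketing n-alkanes.

    rt         -- retention time of the component (same unit as the alkanes)
    alkane_rts -- dict {carbon number: retention time}; retention times are
                  assumed to increase with carbon number
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the alkane calibration range")

    # Index of the last alkane eluting at or before the component.
    i = bisect_right(times, rt) - 1
    if i == len(times) - 1:  # component co-elutes with the last alkane
        i -= 1

    n, n_next = carbons[i], carbons[i + 1]
    t_n, t_next = times[i], times[i + 1]
    return 100.0 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))
```

For example, with hypothetical alkane retention times `{8: 5.0, 9: 7.0, 10: 9.0}` (minutes), a component eluting at 6.0 min lies halfway between C8 and C9 and receives an index of 850.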

Internal Library
For data reduction and to decrease false positives and false negatives, the QC-pooled sample was analyzed to generate an internal library for recursive analysis. Extracted components were identified with the Accurate Mass Flavors Database [46] and NIST 17. The identification method (level 2, as proposed by Sumner et al. [33]) included a spectral search with a minimum match factor of 70 and an exact-mass comparison, starting at 30 m/z, with accuracy < 20 ∆ppm. Once automatic identification was carried out, manual/visual analysis was completed (components identified as fiber and column materials were eliminated). Putatively identified compounds were assigned when ion peaks, mass-fragmentation spectrum and RI matched (∆RI < 30) the libraries' components. If one of these parameters did not comply, the component was exported as a library file and identified as Unknown + RT (min). With this method, an exact-mass library, VinoST2.mslibrary.xml (available at https://www.ciad.mx/VinosMxDB, accessed: 6 November 2019), was created; it includes RT and RI of a total of 93 compounds, with 25 of them identified as Unknowns (m/z shown in Supplementary Materials, Table S4).
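The numeric acceptance criteria in this section (match factor of at least 70, mass error below 20 ppm, ∆RI below 30) can be expressed as a simple check. The sketch below is illustrative only: the function names and the exact label format are hypothetical, and it does not reproduce the manual/visual inspection of ion peaks that the study also required.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def is_putative_match(match_factor, measured_mz, theoretical_mz,
                      ri_measured, ri_library,
                      min_match=70.0, max_ppm=20.0, max_delta_ri=30.0):
    """True when the spectral, exact-mass and RI criteria are all met."""
    return (
        match_factor >= min_match
        and abs(ppm_error(measured_mz, theoretical_mz)) < max_ppm
        and abs(ri_measured - ri_library) < max_delta_ri
    )


def label_component(library_name, rt_min, matched):
    """Library name for a putative match, otherwise 'Unknown' plus RT (min).

    The 'Unknown <RT>' string format is an assumption for illustration.
    """
    return library_name if matched else f"Unknown {rt_min:.2f}"
```

A component with a match factor of 85, a 10 ppm mass error and a ∆RI of 10 would pass all three checks, whereas failing any single criterion would route it to the "Unknown" label.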

Recursive Analysis
The VinoST2.mslibrary.xml library was added to the recursive-analysis method using RT as a match factor with a trapezoidal penalty range of 18 s and a penalty-free range of 12 s, in order to align components across samples. The libraries Flavors-14-mslibrary.xml and NIST 17 were added and used without RT as a match factor to identify components not present in the QC sample, using the same parameters applied to internal library creation and then adding them to it. The RT calibration file was also included to calculate RIs. In order to identify a given compound with the internal library, the component of interest had to be present in the three replicates and match the compound fragmentation spectra, RI and RT (Figure 1). Once all samples were analyzed and their compounds identified, the AllBestHits script was run to export the data in CEF format (Compound Exchange Format).
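How the software computes the trapezoidal RT penalty is not described in the text, so the following is only one plausible reading, offered as an assumption: the match weight is left untouched while the RT deviation stays within the 12 s penalty-free range, and then falls linearly to zero over the following 18 s. The function names are hypothetical. The second helper expresses the stated requirement that a component be present in all three replicates.

```python
def rt_weight(delta_rt_s, free_s=12.0, ramp_s=18.0):
    """Multiplicative weight for a library match given the RT deviation (s).

    Assumed shape (not confirmed by the source): 1.0 inside the penalty-free
    range, linear decrease to 0.0 across the penalty range, 0.0 beyond it.
    """
    d = abs(delta_rt_s)
    if d <= free_s:
        return 1.0
    if d >= free_s + ramp_s:
        return 0.0
    return 1.0 - (d - free_s) / ramp_s


def in_all_replicates(hits_per_replicate, compound):
    """True if the compound was identified in every replicate injection."""
    return all(compound in hits for hits in hits_per_replicate)
```

Under this reading, the maximum RT variation reported for the standards (5.4 s) sits well inside the penalty-free range, so no penalty would be applied to them.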

Data Interpretation/Analysis
Data interpretation/analysis was performed using Agilent MassHunter WorkStation Mass Profiler Professional (MPP) version 15.0. Identified components' data were imported and grouped by vineyard (La Changa and Los Dolores) and vintage (2017 and 2018), considering a 2 × 2 factorial design, then baseline-transformed to the median of all samples and used to perform a principal-component analysis (PCA) on all entities and samples using the variance-covariance matrix method. A Venn diagram was constructed with the entity lists of wines from both vineyards and vintages. From this, all entities present in both vineyards were selected to perform a moderated t-test (p-value cutoff of 0.05, Benjamini-Hochberg multiple-testing correction, FC > 1.1) comparing vintages. Moreover, a 2-way ANOVA was performed, pairing conditions between vintages and vineyards.
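The filtering step that follows the t-test can be illustrated with a small Python sketch. It implements the standard Benjamini-Hochberg adjustment and a fold-change filter; it does not implement the moderated t-test itself, which is assumed to have produced the p-values beforehand. The function names are hypothetical, the fold change is assumed to be the ratio of the larger to the smaller group mean, and the comparison operators at the cutoffs are assumptions, since MPP's exact conventions are not given in the text.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_entities(names, pvalues, mean_a, mean_b, alpha=0.05, min_fc=1.1):
    """Entities passing the corrected p-value cutoff and fold-change filter.

    mean_a, mean_b -- positive group means (e.g. abundance per vintage)
    """
    adjusted = benjamini_hochberg(pvalues)
    selected = []
    for name, q, a, b in zip(names, adjusted, mean_a, mean_b):
        fold_change = max(a, b) / min(a, b)  # assumed symmetric definition
        if q < alpha and fold_change > min_fc:
            selected.append(name)
    return selected
```

Applying the correction before the fold-change filter keeps the number of tests (and therefore the adjustment) tied to all entities compared, not only to those with large differences.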
Supplementary Materials: The following supporting information can be downloaded online: Supplementary Material-Validated results, identified compounds in pool QC tables, batch sequence and Unknowns entities m/z; Table S1 contains repeatability, reproducibility, linearity (r), limit of detection (LOD) and limit of quantification (LOQ) determinations; Table S2 shows RT, RI, CAS# and %RSD of 74 compounds identified in pool QC; Table S3 shows the batch sequence used to acquire vineyard and vintage data; Table S4 contains m/z of 25 unknown entities detected.