Untargeted Headspace-Gas Chromatography-Ion Mobility Spectrometry in Combination with Chemometrics for Detecting the Age of Chinese Liquor (Baijiu)

This paper proposes the combination of headspace-gas chromatography-ion mobility spectrometry (HS-GC-IMS) and chemometrics as a method to detect the age of Chinese liquor (Baijiu). Headspace conditions were optimized through single-factor optimization experiments. The optimal sample preparation involved diluting Baijiu with saturated brine to 15% alcohol by volume. The sample was equilibrated at 70 °C for 30 min, and then analyzed with 200 μL of headspace gas. A total of 39 Baijiu samples from different vintages (1998–2019) were collected directly from pottery jars and analyzed using HS-GC-IMS. Partial least squares regression (PLSR) analysis was used to establish two discriminant models based on the 212 signal peaks and the 93 identified compounds. Although both models were valid, the model based on the 93 identified compounds discriminated the ages of the samples more accurately according to the goodness of fit value (R²) and the root mean square error of prediction (RMSEP), which were 0.9986 and 0.244, respectively. Nineteen compounds with variable importance for prediction (VIP) scores > 1, including 11 esters, 4 alcohols, and 4 aldehydes, played vital roles in the model established by the 93 identified compounds. Overall, we determined that HS-GC-IMS combined with PLSR could serve as a rapid and accurate method for detecting the age of Baijiu.


Introduction
Ageing is an integral part of the production of most distilled spirits, such as whiskey and brandy, and improves their quality [1,2]. In general, freshly distilled spirits smell and taste rough, unpleasant, and unbalanced [2,3]. During the ageing period, some compounds in the spirits undergo chemical reactions, which affect the final flavor and taste profiles of the spirits [1,2,4]. Ageing plays a critical role in the production process of high-quality spirits, but is extremely time-consuming and often requires several years or more to complete [5]. Consequently, spirits' economic value is highly associated with their age [6]. Owing to the commercialization and relatively high costs of aged spirits, counterfeiting these products is common worldwide [7]. Therefore, it is necessary to establish a rapid and accurate method to detect the age of spirits to protect consumers from being deceived concerning the age and quality of the spirits.

GC-IMS Conditions
For the HS-GC-IMS analysis, a Shimadzu GC-2010 gas chromatograph (Shimadzu, Kyoto, Japan) equipped with a Perkin Elmer TurboMatrix16 autosampler (Perkin Elmer, MA, USA) was coupled to an IMS module from GAS (Dortmund, Germany).
The analytes were separated in a DB-FFAP column (60 m × 0.25 mm × 0.25 µm film thickness; J & W Scientific; Folsom, CA, USA) using nitrogen gas (>99.999%) at a constant flow rate of 0.8 mL/min. The temperature of the column was maintained at 40 °C for 3 min, then was increased to 150 °C at 4 °C/min, and held at 150 °C for 5 min (total of 35.5 min).
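The stated total run time can be checked from the oven program itself. The short sketch below does this arithmetic; the function name and argument names are illustrative and do not come from any instrument software.

```python
def oven_runtime_min(t_start_c, hold_start_min, t_end_c, ramp_c_per_min, hold_end_min):
    """Total run time (min) of a single-ramp GC oven program."""
    ramp_min = (t_end_c - t_start_c) / ramp_c_per_min
    return hold_start_min + ramp_min + hold_end_min

# 40 °C for 3 min, 4 °C/min to 150 °C, then 5 min at 150 °C.
total = oven_runtime_min(40, 3, 150, 4, 5)
print(total)  # 35.5
```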
After the analytes were separated in the column, they were driven into the IMS module. First, the volatile organic compounds were ionized by the tritium source in positive mode. Then, the ions entered a 9.8 cm long drift tube operating at 500 V/cm and 45 °C. Nitrogen drift gas (>99.999%) was introduced at 150 mL/min. An average of 12 scans was performed at a repetition rate of 30 ms and a grid pulse width of 150 µs to build each spectrum. HS-GC-IMS data were obtained by Standalone (GAS, Dortmund, Germany), and the raw data were analyzed using VOCal (GAS, Dortmund, Germany) software to reveal information regarding the composition of the samples.

Identification of Compounds
Compounds were identified by comparing their retention indices (RIs) and drift times with those of pure standards analyzed under the same conditions. To obtain more accurate results, all standards were injected in batches. Information on the standards is shown in Table S2. The RIs were calculated using a mixture of C4-C9 ketones.
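As an illustration of how an RI can be derived from a ketone ladder, the sketch below interpolates linearly between the two reference ketones that bracket a peak. It assumes that each n-ketone is assigned an RI of 100 times its carbon number, and the retention times are hypothetical placeholders rather than values from this study. The exact calculation performed by the instrument software may differ.

```python
from bisect import bisect_right

def retention_index(rt, ladder):
    """Linear retention index of a peak at retention time rt (s).

    ladder: list of (carbon_number, retention_time_s) for the reference
    ketones, sorted by retention time. Assumes RI = 100 * carbon number
    for each reference and linear interpolation between neighbours.
    """
    times = [t for _, t in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the reference ladder")
    # Index of the upper bracketing reference, clamped to the last one.
    i = min(bisect_right(times, rt), len(ladder) - 1)
    (n_lo, t_lo), (n_hi, t_hi) = ladder[i - 1], ladder[i]
    return 100 * (n_lo + (n_hi - n_lo) * (rt - t_lo) / (t_hi - t_lo))

# Hypothetical retention times (s) for the C4 to C9 ketones.
ketones = [(4, 420.0), (5, 560.0), (6, 720.0), (7, 900.0), (8, 1090.0), (9, 1280.0)]
print(retention_index(640.0, ketones))  # 550.0
```

A peak eluting halfway between the C5 and C6 references therefore receives an RI of 550, which can then be compared with the RI of a pure standard.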

Statistical Analysis
To validate the models for Baijiu age identification, all samples were randomly divided into two categories: a test set and a prediction set. Following the approach of Gerhardt [30], a total of 35 samples were used as the test set and 4 samples were used as the prediction set. PLSR was used to establish a regression model between Baijiu ages (Y variable) and the volatiles (X variables), using the test set with SIMCA software (version 14.1; Umetrics, Sartorius Stedim Biotech AS, Umeå, Sweden). The prediction ability of the model was validated using the prediction set. In this analysis, the data were subjected to Pareto scaling, wherein each variable was divided by the square root of its standard deviation.
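The random split and the Pareto scaling described above can be sketched in a few lines. This is a minimal illustration, not the SIMCA implementation: mean-centring before scaling and the use of the sample standard deviation are assumptions, and the sample IDs are placeholders.

```python
import random
from statistics import mean, stdev

def pareto_scale(column):
    """Mean-centre a variable and divide by the square root of its SD."""
    m, s = mean(column), stdev(column)
    if s == 0:
        return [0.0 for _ in column]
    return [(x - m) / s ** 0.5 for x in column]

def split_samples(sample_ids, n_prediction, seed=0):
    """Randomly hold out n_prediction samples; the rest build the model."""
    rng = random.Random(seed)
    shuffled = sample_ids[:]
    rng.shuffle(shuffled)
    return shuffled[n_prediction:], shuffled[:n_prediction]

ids = list(range(1, 40))  # 39 placeholder sample IDs
test_set, prediction_set = split_samples(ids, n_prediction=4)
print(len(test_set), len(prediction_set))  # 35 4

scaled = pareto_scale([1.0, 2.0, 3.0, 4.0, 5.0])
print([round(v, 3) for v in scaled])
```

Compared with unit-variance scaling, dividing by the square root of the standard deviation reduces the dominance of intense peaks while keeping part of the original intensity structure.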
To reduce the risk of overfitting, the number of latent variables in the model was decided by internal seven-fold cross-validation [31]. The samples were divided into seven groups to verify the accuracy of the model. The quality of the PLSR model was evaluated according to its R²Y and Q² values, where R²Y represents the percentage of variation in the Y variable explained by the model and Q² represents the predictive ability [17]. For a useful model, both values lie between zero and one, and values closer to one indicate better goodness of fit and prediction ability. For the parameter Q², values greater than 0.4 are acceptable. In addition, a permutation test was conducted to validate the robustness of the model [5].
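The way Q² arises from seven-fold cross-validation can be illustrated as follows. This sketch uses Q² = 1 − PRESS/SS_total, where PRESS is the sum of squared errors for the held-out samples. A univariate least-squares line stands in for the PLSR model to keep the example short, the data are synthetic, and assigning every seventh sample to the same group is an assumption rather than a documented SIMCA setting.

```python
def fit_line(x, y):
    """Ordinary least-squares line; a simple stand-in for the PLSR model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx

def q2_kfold(x, y, k=7):
    """Q2 = 1 - PRESS / SS_total from k-fold cross-validation."""
    n = len(y)
    press = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th sample forms one group
        train = [i for i in range(n) if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (slope * x[i] + intercept)) ** 2 for i in held_out)
    my = sum(y) / n
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_total

# Synthetic data for 35 samples: one peak intensity that rises with age.
ages = [1 + (5 * i) % 22 for i in range(35)]
intensity = [0.5 * a + 0.1 * ((i % 3) - 1) for i, a in enumerate(ages)]
q2 = q2_kfold(intensity, ages)
print(round(q2, 3))
```

Because each sample is predicted by a model that never saw it, Q² falls below R²Y when a model fits noise, which is why it is used to choose the number of latent variables.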

Optimization of Headspace Parameters
Headspace conditions influence the response of HS-GC-IMS; therefore, researchers typically conduct optimization experiments before using HS-GC-IMS to analyze food samples, as has been done with olive oil, ham, and honey [17-19,26,32]. However, the optimization of HS-GC-IMS headspace conditions for distilled spirits has not been reported thus far. In this study, the headspace method parameters, including diluent, alcohol content, incubation temperature, and injection volume, were optimized to obtain more information, resolution, and signal intensity for the samples. To assess the effects of different parameters, Arroyo-Manzanares et al. relied on visual observation of topographic maps [17], and del Mar Contreras et al. used signal intensity [18]. In this study, three indicators were used: the number, the height, and the volume of the peaks. The number of peaks represents the quantity of the detected compounds. The height and volume of the peaks reflect the concentration of the detected compounds. Below a certain concentration, both the height and the volume of a signal correspond to the concentration of the compound. However, when the concentration is greater than that value, the height of the signal remains unchanged and only the volume of the peak increases. The number of protons provided by the tritium source was fixed; therefore, to obtain a better and more stable response from the low-concentration compounds during detection, the height and volume of the peaks were used as the second and third indicators, respectively.

Effects of Salt Addition
There are two opposite effects caused by the addition of NaCl, called salting in and salting out [18]. The addition of NaCl did not cause significant increases in the number and total height of the signals, but it did significantly increase the total volume of the peaks compared to that of the sample diluted with ultrapure water (Figures 1A-C and S1). Owing to the salting out effect, the evaporation of volatile compounds from the solution to the headspace was promoted by the addition of salt [29].

Effects of Alcohol Content
Ethanol content has two main effects on the detection of compounds. First, ethanol influences the release of other compounds [33]. Second, ethanol molecules compete with the analyte molecules for a fixed number of protons [20]. Thus, it was necessary to determine the appropriate alcohol content. Alcohol content had a significant impact on the HS-GC-IMS method (Figures 1D-F and S2). The number of peaks was largest at 15% alcohol by volume (ABV) and was similar to the result at 10% ABV. However, the total height of the signals at 15% ABV was significantly higher than that at 10% ABV. Thus, each sample was diluted to 15% ABV for analysis, which differs from studies that used 10% ABV brandy with GC-IMS [5] and 5% ABV Baijiu with HS-SPME-GC-MS [34]. To ensure a consistent alcohol content, the alcohol content of each sample was measured accurately, and each sample was then diluted to 15% ABV strictly in proportion.
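The proportional dilution to 15% ABV follows from C1·V1 = C2·V2. The sketch below is a first approximation only: it ignores the volume contraction of ethanol-water mixtures, and the 52% ABV starting value is a hypothetical example rather than a measured sample from this study.

```python
def diluent_volume_ml(sample_ml, sample_abv, target_abv=15.0):
    """Volume of diluent (mL) that brings sample_ml from sample_abv to target_abv.

    Uses C1*V1 = C2*V2 and ignores the volume contraction of
    ethanol-water mixtures.
    """
    if not 0 < target_abv <= sample_abv:
        raise ValueError("target ABV must be positive and not exceed the sample ABV")
    return sample_ml * (sample_abv / target_abv - 1.0)

# Hypothetical sample: 1.0 mL of Baijiu measured at 52% ABV.
print(round(diluent_volume_ml(1.0, 52.0), 3))  # 2.467
```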

Effects of Incubation Temperature
As demonstrated in previous studies, the incubation temperature directly affects the equilibrium concentration of volatile organic compounds in the headspace [35-37]. In this study, the incubation temperature was varied from 40 to 70 °C (Figures 1G-I and S3). The incubation temperature was not raised above 70 °C to prevent water vapor from interfering with the operation of the instrument. The release of volatile organic compounds with particularly high boiling points was promoted by increasing the incubation temperature, which increased the intensity of the peaks [17]. For this reason, 70 °C was selected as the optimal condition.

Effects of Injection Volume
The injection volume directly influences the concentration of volatile organic compounds entering the detector, making it an important parameter to optimize. The injection volume ranged from 40 to 300 µL. After considering all indicators, we determined that an injection volume of 200 µL was the most suitable (Figures 1J-L and S4).
In summary, the final conditions were confirmed using single-factor optimization experiments. The original Baijiu sample was diluted to 15% ABV with saturated brine, and each 20 mL vial was filled with 2 mL of the diluted sample. After incubation at 70 °C for 30 min, the autosampler injected 200 µL of headspace gas into the chromatographic column for sample analysis.

Identification of Compounds in Baijiu Samples
Baijiu samples of different vintages were analyzed under the optimized conditions mentioned above. The results of the HS-GC-IMS analysis of samples A1-4 are shown as topographic plots where the x-axis represents the normalized drift time, and the y-axis represents the retention time (Figure 2A). The red vertical line denotes the normalized reaction ion peak. Each point in the plot represents one or multiple signals, and the different colors describe the intensity of the signals. Deeper red indicates a stronger intensity. Seventy-five percent of all signals appeared in the range of 1.0-2.0 ms with regard to normalized drift time and in the range of 500-1200 s with regard to retention time. A total of 212 signals were detected in the Baijiu samples. A qualitative analysis led to the identification of the relationships between these signals and the ageing compounds in Baijiu. First, the RI of each compound was calculated using n-ketones. Then, the compounds were identified by comparing their RIs and drift times with those of the standard reference compounds, which are recorded in the NIST database and the IMS database. Thereafter, an IMS database of compounds in Baijiu was established (Table 1). A total of 93 compounds were identified in the samples, including 33 aldehydes and ketones, 39 esters, 18 alcohols, and 1 acid. Dimers and trimers were observed for the high-concentration compounds. Notably, IMS provides a second separation of compounds, making it possible to separate isomers [20]. Some isomers were separated, e.g., code 22: 3-methyl-1-butanol and code 33: pentan-2-ol, both with the formula C5H12O.
Distinguishing samples using HS-GC-IMS has several advantages [20,38]. First, HS-GC-IMS is highly stable and sensitive and can accurately detect changes in compound concentrations. Second, HS-GC-IMS can achieve 2D separation of the signal peak, similar to two-dimensional gas chromatography and mass spectrometry (GC×GC-MS). Signal peaks that are originally clustered together can be distinguished to identify additional compounds by enlarging the gap between signals. This is useful to differentiate samples of various ageing durations. Because of the 2D separation ability of HS-GC-IMS, peaks 59 and 66 could be separated from each other and were identified as 1-butanol and 2-heptanone, respectively (Figure 2B).
A gallery plot was constructed of the voltage intensities of the 93 identified compounds (Figure 3). There are no obvious differences in the intensities of the signal peaks in frame B among samples of different age groups (Figure 3B). In contrast, the intensities of the signal peaks in frame A generally decrease with each year of ageing (Figure 3A). A trend of increasing signal intensity with age is apparent in frame C (Figures 3C and S5). The remaining peaks change irregularly with ageing time.
Figure 3. Gallery plot of the 93 identified compounds. (The intensities of the signal peaks in frame A generally decrease for each year; the intensities of the signal peaks in frame B have no obvious differences; the intensities of the signal peaks in frame C generally increase for each year).
While some of the relationships between volatiles and ageing time were observable on a gallery plot, others required multivariate statistical analysis to differentiate and identify.

Establishment and Validation of Models for Baijiu Age Identification
The test set was used to establish a PLSR model for the identification of Baijiu age. Two data arrays were obtained using HS-GC-IMS: the 212 signal peaks and the 93 identified compounds. In some studies using GC-MS, UPLC-Orbitrap-MS/MS, GC-pulsed flame photometric detection, and GC-flame ionization detection, samples were discriminated [8,39,40]. With HS-GC-IMS, all of the signal peaks are usually used to establish the model because of the small number of identified compounds [18,24,26,41]. In this study, both signals and identified compounds were used to build models, and one data array was selected for in-depth analysis. The PLSR models based on 212 signal peaks and 93 identified compounds were named Model A and Model B, respectively. Four latent variables were selected to build each model because the fifth latent variable was insignificant in seven-fold cross-validation. The value of Q² was 0.962 in Model A and 0.968 in Model B. The value of R²Y was 0.990 in Model A and 0.988 in Model B. Both Q² and R²Y were close to one, and there was little difference between the two models. This indicates that the optimized HS-GC-IMS conditions for untargeted analysis of samples can be used to distinguish samples from different years. Both models have reliable fit and predictive ability (Figure 4a,b), suggesting that HS-GC-IMS is well suited to sample differentiation.
A permutation test was performed to validate the robustness of the PLSR models (Figure 4c,d). This method randomly rearranges the Y values (ages) among the samples, refits the model to each rearranged data set, and compares the resulting R² and Q² values with those of the original model. It is particularly suitable for models with a small number of samples. The result was obtained through 200 permutations using SIMCA software. In Figure 4c,d, all the Q² values (blue) and R² values (green) to the left are lower than the original points to the right. Moreover, the regression line of the Q² points intersects the y-axis below zero. Therefore, neither model shows signs of overfitting, which indicates that both models are valid.
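The logic of such a permutation test can be sketched as follows. A univariate least-squares fit stands in for the PLSR model, the data are synthetic, and only R² is tracked, so this illustrates the principle rather than reproducing the SIMCA procedure.

```python
import random

def r2_of_line_fit(x, y):
    """R2 of an ordinary least-squares line (stand-in for the PLSR R2Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def permutation_r2(x, y, n_permutations=200, seed=0):
    """R2 of the real model and of models refitted to shuffled Y values."""
    rng = random.Random(seed)
    permuted = []
    for _ in range(n_permutations):
        y_shuffled = y[:]
        rng.shuffle(y_shuffled)  # break the link between X and Y
        permuted.append(r2_of_line_fit(x, y_shuffled))
    return r2_of_line_fit(x, y), permuted

# Synthetic data for 35 samples: one peak intensity that rises with age.
ages = [1 + (5 * i) % 22 for i in range(35)]
intensity = [0.5 * a + 0.1 * ((i % 3) - 1) for i, a in enumerate(ages)]
original, permuted = permutation_r2(intensity, ages)
print(round(original, 3), round(max(permuted), 3))
```

If the original statistic is clearly higher than every value obtained from shuffled ages, the fit is unlikely to be a chance result.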
The PLSR models for Baijiu age identification were established from a sufficient number of Baijiu samples from different years, which had different concentrations of aroma compounds. A connection was built between the age of the Baijiu and the concentration of aroma compounds, which made it feasible to use the model to identify the age of Baijiu.
To more accurately understand the predictive ability of the established PLSR model, the prediction set (four Baijiu samples assumed to have unknown ages) was used to verify the model (Figure 4e,f). With a reliable model, all the points should fall close to the 45° line through the origin, and the prediction set should be close to the regression line. The goodness of fit value (R²) of the regression line indicates the fitness level. The closer the R² to one, the better the fit of the model. The value of R² was 0.9923 in Model A and 0.9986 in Model B. The root mean square error of prediction (RMSEP) of Model A was larger than that of Model B, being 0.671 and 0.244, respectively. In addition, as shown in Supplementary Table S3, the deviation of Model B was smaller than that of Model A, implying that Model B has a stronger predictive ability than that of Model A. HS-GC-IMS was sensitive to aldehydes, ketones, esters, and alcohols; thus, many of these substances were detected. In addition, previous studies have shown that alcohols, esters, aldehydes, and ketones undergo significant changes during the Baijiu ageing process [8,12,42]. Therefore, analyzing the changes in these compound concentrations can distinguish and identify the age of the samples.
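The two statistics used here can be computed directly from actual and predicted ages. In the sketch below, R² is taken as the squared Pearson correlation, which equals the R² of a straight line fitted through the predicted-versus-actual points; this interpretation is an assumption. The four age pairs are hypothetical and are not the values in Table S3.

```python
from math import sqrt

def rmsep(actual, predicted):
    """Root mean square error of prediction."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    saa = sum((a - ma) ** 2 for a in actual)
    spp = sum((p - mp) ** 2 for p in predicted)
    sap = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    return sap ** 2 / (saa * spp)

# Hypothetical prediction set of four samples (ages in years).
actual_age = [2.0, 8.0, 14.0, 20.0]
predicted_age = [2.2, 7.7, 14.3, 19.8]
print(round(rmsep(actual_age, predicted_age), 3))  # 0.255
print(round(r_squared(actual_age, predicted_age), 4))
```

RMSEP is expressed in the units of the response, so a value of 0.244 corresponds to a typical prediction error of roughly a quarter of a year.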
In summary, PLSR Model B, based on 93 identified compounds, had better fitting and predictive abilities and more accurately distinguished Baijiu samples from different vintages and identified their ages. It is worth noting that the method can also be applied to other alcoholic beverages based on analyzing sufficient numbers of samples to distinguish and identify the age of unknown samples.
According to Model B, there were 19 compounds with variable importance for prediction (VIP) scores greater than one. These 19 compounds (Figure 5), including ethyl hexanoate A, propyl hexanoate A, ethyl pentanoate A, ethyl heptanoate A, ethyl acetate A, 2-methyl-1-propanol, methylpropanal B, butan-2-ol A, octanoic acid ethyl ester B, isoamyl acetate A, ethyl butyrate A, nonanal, ethyl hexanoate, ethyl lactate, 2-methyl butanoic acid ethyl ester A, 3-methyl-1-butanol, octanal, furfural A, and 1-hexanol A, were most important for identifying the ages of samples. Esters accounted for 58% (11 of 19) of these compounds, which illustrates that these important flavor compounds play a crucial role in establishing a regression model for Baijiu age [8,12,41]. The remaining compounds with VIP scores greater than one were alcohols and aldehydes, each accounting for 21% (4 of 19) of the 19 compounds. Overall, HS-GC-IMS performed well at identifying the sample age, and the small number of high-VIP compounds suggests that fewer compounds could be used in future tests to make the method more rapid. Therefore, it is reasonable to apply HS-GC-IMS to determine the age of Baijiu.
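For reference, the VIP score of variable j is commonly defined as shown below. The notation is introduced here for illustration and is not taken from the original text: p is the number of X variables, A is the number of latent variables, w_{ja} is the weight of variable j in latent variable a, and SSY_a is the sum of squares of Y explained by latent variable a. The exact computation in SIMCA may differ in detail.

```latex
\mathrm{VIP}_j = \sqrt{\, p \cdot
  \frac{\sum_{a=1}^{A} \mathrm{SSY}_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^{2}}
       {\sum_{a=1}^{A} \mathrm{SSY}_a} }
```

Under this definition the mean of the squared VIP scores over all variables equals one, which is why a score greater than one is taken to mark a variable of above-average importance.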
Ethyl hexanoate A, propyl hexanoate A, ethyl pentanoate A, and ethyl heptanoate A (Figure 5a-d) were positively correlated with ageing time, while ethyl acetate A, 2-methyl-1-propanol, and methylpropanal B (Figure 5e,f) were negatively correlated with ageing time. The R² values for these compounds were greater than 0.65. The remaining compounds (Figure 5g-s) play an important role in the discrimination of the ageing year, but have no linear correlation with Baijiu age, exhibiting R² values of less than 0.6.
The change in compounds is also affected by the ageing method, manufacturer, and storage conditions, which may reduce the accuracy of the prediction. In future studies, a larger number of samples will be collected to improve the accuracy of the prediction. In this study, the voltage intensity of each compound was used to identify the age of the Baijiu. However, it is important to determine the absolute concentration of compounds so that the age of samples from different batches can be identified. Thus, the determination of absolute concentration is part of our future work.

Conclusions
This study demonstrated the potential of HS-GC-IMS to discriminate Baijiu of different ages. HS-GC-IMS combined with PLSR performed excellently in distinguishing Baijiu samples of different ages. PLSR Model A, based on 212 signal peaks, and PLSR Model B, based on 93 identified compounds, were both valid; however, Model B more accurately identified the ages of unknown Baijiu samples, exhibiting an R² value of 0.9986 and an RMSEP of 0.244. HS-GC-IMS combined with PLSR also has better accuracy and precision for age detection than other methods. There were 19 compounds with VIP scores greater than one in Model B, including 11 esters, 4 alcohols, and 4 aldehydes. Among them, seven compounds are potential ageing markers in Baijiu samples, which are positively or negatively correlated with ageing time. Consequently, HS-GC-IMS combined with PLSR can be used to rapidly and accurately identify the age of Baijiu.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/foods10112888/s1, Figure S1: Topographic plot of Baijiu samples diluted with saturated brine (A), ultrapure water (B) detected by GC-IMS, Figure S2: Topographic plot of Baijiu samples diluted in different alcohol content detected by GC-IMS, Figure S3: Topographic plot of Baijiu samples detected by GC-IMS in different incubation temperature, Figure S4: Topographic plot of Baijiu samples detected by GC-IMS with different injection volume, Figure S5: Topographic plot of Baijiu samples with different years detected by GC-IMS, Table S1: Baijiu samples, Table S2: The information of standards, Table S3: The age discrimination of the prediction set samples.