Essential Oil Quality and Purity Evaluation via FT-IR Spectroscopy and Pattern Recognition Techniques

Essential oils are highly volatile, aromatic concentrated extracts from plants with wide applications. In this study, fast, easy-to-use attenuated total reflection Fourier-transform infrared spectroscopy (ATR-FTIR) was combined with chemometric techniques to verify essential oils' taxonomy and purity. Principal component analysis (PCA) clustered 30 essential oil samples into three different groups based on plant botanical family and concentration. The first group contained highly concentrated oils from the Asteraceae family, the second group contained highly concentrated oils from the Lamiaceae family, while the last group contained three highly concentrated essential oils from different botanical families and commercial-grade essential oils. Thus, commercial-grade oil samples did not cluster with the corresponding concentrated oil samples despite their similar spectral patterns or botanical family. A loading plot identified infrared (IR) bands that correspond to carbonyl, vinyl, methyl and methylene group vibrations as the most important spectral bands that can be used as marker bands for discrimination between different botanical plant family groups. Hierarchical cluster analysis (HCA) confirmed the results obtained by PCA. ATR-FTIR spectroscopy combined with chemometric algorithms provides a direct and non-destructive method for chemotaxonomic classification of medicinal and aromatic essential oils and an assessment of their purity.


Introduction
The global essential oils market has experienced steady and strong growth in almost every major end-use industry, such as food and beverage, personal care, cosmetics, and aromatherapy. Health benefits associated with essential oil use are expected to further drive their demand in pharmaceutical and medical applications. Unlike most of the conventional medicines and drugs, essential oils have no major side effects. Such qualities of essential oils are expected to be a major factor for market growth. Additionally, the rising prevalence of age- and stress-related health problems, such as cardiovascular diseases, Alzheimer's disease, diabetes and anxiety, is creating more demand for beneficial essential oils in aromatherapy applications. Unfortunately, the high therapeutic properties and market values of essential oils [1][2][3] make them ideal candidates for potential counterfeiting or adulteration with low-quality, cheap alternatives [4,5].
The variability of essential oils from the same botanical species is determined by the chemical composition influenced by geographical and climatic factors, and especially by the method of extraction and purification. Thus, the confirmation of the identity of the essential oil is a complex issue. The rational way is to separate chemical constituents and then quantify them by different chromatographic methods (gas chromatography, HPLC) coupled with mass spectrometric detection. Analytical methods, including gas chromatography (GC) [6,7], high-performance liquid chromatography (HPLC) [8,9], nuclear magnetic resonance (NMR) spectroscopy [10] and electroanalytical techniques [11], are widely used for the analysis and evaluation of the authenticity of consumable products, with excellent accuracy and precision. However, some of these methods suffer from long analysis times, high cost and lack of portable instrumentation that limit their wide application for routine control analysis. They often involve extensive optimization of the chromatographic separation conditions for each essential oil type. In addition, reading a GC/mass spectrometry (MS) chromatogram requires skill and experience. For the quantification of constituents, standards are required as well as expected ranges for each constituent in order to determine whether their concentration is between these ranges. To address these challenges, non-destructive, easy-to-use, analytical spectroscopic methods such as infrared spectroscopy as a green tool combined with chemometric analysis have been used for authentication and quality assurance of food products [12,13]. Non-targeted fingerprinting by Fourier-transform infrared (FTIR) spectroscopy has gained popularity as an alternative to classical GC-based methods because it allows fast, green, non-destructive and cost-effective assessment of quality of essential oils [14]. 
The major advantage of IR spectroscopy is minimal or no sample preparation, with simple and very fast analysis. Most IR spectrometers are easy-to-use, portable and relatively inexpensive, making online sample analysis possible.
Until recently, mid-infrared spectroscopy was mainly used as a qualitative method, either to identify unknown pure substances by providing structural characterization based on functional-group vibrations and a fingerprint spectrum (identification) or to verify quality markers in plant extracts or distillates [15]. Infrared spectra recorded from plant extracts are usually very complex, as plant extracts are multicomponent mixtures [16]. Since each functional group in a single molecule contributes to the spectral pattern, band assignment may be difficult because overlapping peaks and vibrational mixing make the final spectrum complex.
The application of chemometric tools is a very active research field for the classification and quality evaluation of food products [17]. Many studies have reported on the use of chemometric tools to classify plant foods. For example, principal component analysis (PCA) was used to outline the similarities and differences among 16 algal species and to classify 40 wine samples based on their high-performance thin-layer chromatography (HPTLC) fingerprints [18,19]. It is well known that each plant species has a special, complex mixture of bioactive natural products, in which each component contributes to its overall bioactivity. Related botanical families often contain similar types of biologically active secondary metabolites, and an understanding of the systematic position of a medicinal plant species allows some assumptions to be made about compounds that are present. Plants from the Asteraceae and Lamiaceae families have been historically used to cure various diseases. This is because they produce a wide range of secondary metabolites with potent antibacterial, antioxidant, anti-inflammatory, antimicrobial, antiviral, and anticancer activities. The Lamiaceae family plant extracts have a higher content of phenolics (and flavonoids) than Asteraceae extracts, and the Lamiaceae family contains more known medicinal plants than the Asteraceae family [20,21]. The main aim of the current study was to investigate the effects of plant taxonomy (botanical origin) and purity on the spectral fingerprint of essential oils in the mid-infrared region. For that purpose, attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy was combined with pattern recognition techniques such as PCA and cluster analysis.

Samples
Thirty essential oil samples were analyzed. Sixteen samples of concentrated essential oils (Table 1), obtained by supercritical fluid extraction with natural carbon dioxide (sc-CO₂), were kindly donated by FLAVEX® Naturextrakte GmbH, and the remaining 14 samples were commercial-grade essential oils purchased from the local market (lemongrass, two geranium oil samples, lavender, orange, two peppermint oil samples, forgive oil, breathe blend, frankincense, juniper, two rosemary samples, and bergamot essential oil).

ATR-FTIR-Spectroscopy
The FTIR spectra were acquired using a Cary 630 FTIR spectrometer (Agilent Technologies Pty Ltd., Mulgrave, Australia) interfaced with an attenuated total reflectance (ATR) sampling accessory with a single-bounce diamond crystal. Spectra were measured in absorbance mode from 4000 cm⁻¹ to 600 cm⁻¹ by accumulating 64 scans at a spectral resolution of 4 cm⁻¹. A reference (background spectrum of air) was scanned under the same instrumental conditions before each sample measurement. For each measurement, a small drop of the essential oil sample was placed on the surface of the diamond ATR crystal and the sample spectrum was collected. Spectra were processed with Resolution Pro FTIR spectroscopy software (version 5.2.0, Agilent Technologies Pty Ltd., Mulgrave, Australia).
For the PCA analysis, the original 1858 spectral intensities were reduced into 254 averaged spectral values, each from five consecutive wavenumbers (dB = 5).
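The averaging of consecutive spectral points can be illustrated with a short sketch. This is a minimal example in Python with NumPy, not the processing code used in the study; the helper name `bin_spectrum`, the synthetic input, and the choice to drop an incomplete trailing bin are assumptions made for illustration.

```python
import numpy as np

def bin_spectrum(intensities, width=5):
    """Average each run of `width` consecutive points (hypothetical helper)."""
    intensities = np.asarray(intensities, dtype=float)
    n_bins = intensities.size // width       # assumption: an incomplete trailing bin is dropped
    trimmed = intensities[: n_bins * width]
    return trimmed.reshape(n_bins, width).mean(axis=1)

# Synthetic values standing in for one measured absorbance spectrum
spectrum = np.arange(20, dtype=float)
binned = bin_spectrum(spectrum, width=5)
print(binned)
```

Averaging neighbouring points lowers the number of input variables passed to PCA and smooths point-to-point noise, at the cost of spectral resolution.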

Principal Component Analysis
PCA was performed with the Principal Component Analysis for Spectroscopy App for OriginPro® 2019, version 9.6.0.172 (OriginLab Corporation, Northampton, MA, USA). Hierarchical cluster analysis (HCA) was performed using the PLS Toolbox software package (Eigenvector Research, Inc., Manson, WA, USA) for MATLAB (Version 7.12.0 R2011a), applying Ward's method with Euclidean distance as the measure of distance between samples.
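For readers without access to the PLS Toolbox, Ward clustering on Euclidean distances can be sketched with SciPy instead. The example below is an illustration under stated assumptions, not a reproduction of the study's analysis: the data are synthetic (three artificial groups of four "spectra"), and cutting the tree into three clusters is a choice made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Synthetic "spectra": three groups of 4 samples with 50 variables each (not the study's data)
centers = np.array([0.0, 5.0, 10.0])
X = np.vstack([c + 0.1 * rng.standard_normal((4, 50)) for c in centers])

# Ward's method merges, at each step, the two clusters whose union gives the
# smallest increase in within-cluster variance; distances are Euclidean.
Z = linkage(X, method="ward", metric="euclidean")

# Cut the dendrogram into three flat clusters (assumed number, for illustration)
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

The linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree comparable in form to an HCA plot.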

ATR-FTIR Spectrometry
Essential oils are concentrated solutions of volatile compounds, consisting of complex homogenous mixtures of various compounds, with more than 100 constituents present in each species (taxon).
Their FTIR spectra are complex due to the spectra of individual components overlapping and the mixing of various vibrational modes. Although contributions of major components never exceed 25% of the total content, those compounds in essential oils that occur at low concentrations (<1%) do not influence the ATR-IR spectrum significantly. Thus, ATR-FTIR spectra obtained from the essential oil samples present characteristic spectral fingerprints that can be used to discriminate different plant species and chemotypes.
ATR-FTIR absorption spectra of the concentrated essential oils show the expected characteristic C-H stretch (~2900 cm⁻¹), C=O stretch (~1700 cm⁻¹), broad O-H stretch (~3400 cm⁻¹) and C-O stretch (~1100 cm⁻¹) of terpenoid components present in the essential oils (Figure 1a,b). As expected, the FTIR spectra of these oils are dominated by vibrational modes from monoterpenes, which are observed at 886, 1436 and 1644 cm⁻¹. Another useful band for differential identification of the oils is the C=O stretching band, which appears at 1740 cm⁻¹ in lavender, rosemary and sage oil, and is shifted to lower wavenumbers in lemon myrtle, oregano (terpineol type) and peppermint oil [22,23].

Principal Components Analysis and Cluster Analysis
Principal component analysis reduces a complex dataset to a smaller set of latent variables, known as principal components (PCs). With a large number of spectral bands (input variables), meaningful interpretation requires reducing the variables to a few linear combinations of the spectral bands that can be easily explained. The first two PCs captured most of the information and were sufficient to describe the essence of the data in the PCA correlation matrix, as is evident from the steep line that bends quickly and then flattens out. The score plot in Figure 2a,b shows that samples with similar input spectral data (i.e., similar principal component scores) cluster together. Three principal components described 69.8% of the total variability: PC1 describes 38.2%, PC2 describes 23.7% and PC3 describes 7.9%. Principal component analysis showed that plant materials originating from the Asteraceae and Lamiaceae families formed separate clusters along PC1. Furthermore, along PC2, Asteraceae samples (samples 4-7) were separated from commercial samples as well as from the Lamiaceae (samples 9-16).
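How the share of variability described by each PC is obtained can be sketched as follows. This is a minimal NumPy illustration, not the OriginPro routine used in the study: the function name `pca_scores` is hypothetical, the input is random placeholder data with the same shape as the binned data set (30 spectra by 254 values), and the data are only mean-centered, whereas a correlation-matrix PCA would also scale each variable to unit variance.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """Mean-center X (samples x variables); return scores, loadings, variance ratios."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)            # fraction of total variance per PC
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components], explained[:n_components]

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 254))             # placeholder data, not measured spectra
scores, loadings, ratio = pca_scores(X)

print(ratio, ratio.sum())                      # per-PC and cumulative variance fractions
```

The cumulative value is simply the sum of the individual fractions, in the same way that 38.2% + 23.7% + 7.9% gives the 69.8% reported for the first three PCs.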
A cluster analysis confirmed that essential oil samples extracted from plants belonging to different botanical families differ significantly in their contents and therefore have unique spectral fingerprints.
After filtering and normalization (Standard Normal Variate, SNV), the principal component analysis of the 16 concentrated essential oil samples from FLAVEX® resulted in a two-component model that explained 69.5% of the total variance. PC1 accounted for 42.7%, while PC2 accounted for 26.8% of the total variance (Figure 2c). Samples were grouped into three groups, in the same way as in the analysis of all samples. Along PC1, Lamiaceae samples are clearly separated from the other concentrated essential oils. The latter are sub-divided into two groups along PC2: one composed of four samples from the Asteraceae family (samples 4-7), and the other comprising frankincense (Burseraceae family), arnica (Asteraceae family) and primrose (Primulaceae family).
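The SNV normalization step can be sketched as below. This is an illustrative NumPy implementation under stated assumptions, not the software routine used in the study: the function name `snv` is hypothetical, the sample standard deviation (`ddof=1`) is assumed, and the filtering step is not shown.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)   # assumption: sample standard deviation
    return (spectra - mean) / std

# Two synthetic spectra with the same shape but a tenfold difference in intensity
raw = np.array([[1.0, 2.0, 3.0, 4.0],
                [10.0, 20.0, 30.0, 40.0]])
normalized = snv(raw)
print(normalized)
```

Because each spectrum is scaled by its own mean and standard deviation, the two rows become identical after SNV, which shows how the transform removes multiplicative intensity differences before PCA.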
The first group included four samples from the Asteraceae plant family; the second group included only 9 concentrated oils from the Lamiaceae; while the third group included frankincense, arnica and primrose essential oils that belong to different plant families.
The PC loadings plot was used to identify the IR bands that are characteristic of each sample group, have the strongest influence on it, and thus contribute to the unique FTIR profile of the group (Table 2). The bands with the strongest influence on the principal components in the Lamiaceae family group are those at ~1375 and 1450 cm⁻¹, a broad band at 3400-3500 cm⁻¹, and a band at 842 cm⁻¹ that was shifted to 862 cm⁻¹ in oregano essential oil (Figure 3a). The vibrational frequencies for linalool and linalyl acetate fit the vinyl group vibration RHC=CH₂ at 1635-1650 cm⁻¹, but the intensities of these peaks are very low. Linalool is found in the essential oils of over 200 plant species belonging to different families [24]. Linalool and its ester linalyl acetate are the main constituents of lavender oil. The C=CH₂ in-plane deformation vibration occurs near 1420 cm⁻¹. Only linalool, linalyl acetate and myrcene show this band, while limonene and β-pinene show no trace of it. However, this band may be hidden under the CH₂ and CH₃ deformation bands near 1450 cm⁻¹. The =CH₂ in-plane deformation vibration was not found as a separate band near 1410 cm⁻¹, and was most probably hidden under the −CH₃ and =CH₂ absorption bands. The presence of a =CH₂ group may also explain why the 1330-1410 cm⁻¹ intensity is higher for some terpenes [22].

Table 2. Characteristic IR bands and their assignments (Lamiaceae group).
~1420 cm⁻¹: =CH₂ in-plane deformation (the presence of a =CH₂ group increases the intensity of the peaks at 1330-1410 cm⁻¹ for some terpenes) [22].
~1450 cm⁻¹: overlap of the CH₂ deformation and the asymmetrical CH₃ deformation (the intensity of this peak is proportional to the number of CH₂ and CH₃ groups present).
3400-3500 cm⁻¹ (broad band): Lamiaceae family plant extracts have a higher content of phenolics (and flavonoids).
842 cm⁻¹ (shifted to 862 cm⁻¹ in oregano): weak skeletal vibration for isopropyl (R₁R₂C=CHR₃ out-of-plane deformation of non-strained, weakly strained (cyclohexene derivatives) and strongly strained systems); the key characteristic peak for carvacrol occurs at 862 cm⁻¹.
1635-1650 cm⁻¹ (low-intensity peaks): vibration for RHC=CH₂ (linalool and linalyl acetate).

The CH₂ deformation and asymmetrical CH₃ deformation appear at 1450 cm⁻¹, while the symmetrical CH₃ deformation is near 1380 cm⁻¹. Although the two 1450 cm⁻¹ vibrations actually occur at two different frequencies (1460 and 1440 cm⁻¹ as average values), the two peaks usually overlap and only one peak is observed. The integrated intensity of this band can supply information about the number of CH₂ and CH₃ groups present. The strong band at 1745 cm⁻¹ for rosemary essential oil, especially in the rosemary cineole type, suggests the presence of 1,8-cineole [12]. For sage essential oil, the carbonyl stretching band at 1745 cm⁻¹ indicates the presence of α-thujone and camphor [25].
Oregano (phenol type) and thyme essential oils, although in the same group, were separated from the other Lamiaceae plant essential oils. The loading plot for oregano and thyme suggested the importance of the FTIR region around 1380 cm⁻¹. They also exhibit the same characteristic bands at 1458 and 1380 cm⁻¹, but the band at 1740 cm⁻¹ was reduced in intensity. This is clearly seen when compared to the spectrum of lavender (Figure 3b). For oregano essential oil, the signature band for carvacrol is located at ~810 cm⁻¹, indicating C-H out-of-plane bending [26,27]. Oregano (phenol type) essential oil contains 60-80% phenolic carvacrol according to the manufacturer's claim. The band near 1380 cm⁻¹ has been assigned to the symmetrical CH₃ deformation. Isopropyl groups produce double bands near 1370 and 1380 cm⁻¹ (e.g., tetrahydrolinalool, tetrahydrogeraniol and their acetates). The gem-dimethyl (R₁R₂C(CH₃)₂) group present in α- and β-pinene also produces this doublet. α- and β-pinene are two structural isomers, bicyclic monoterpenes, present in many essential oils.
The most important bands for the Asteraceae family cluster were at 726, 1465 and 1733 cm⁻¹, together with the region between 850 and 920 cm⁻¹. Chamomile, marigold and tagetes were very close together, while echinacea was apart from the group (Figure 2b), perhaps because the peak at 1700 cm⁻¹ is smaller in intensity (Figure 3c). Chamomile has a double C=O band at 1711 and 1735 cm⁻¹. Methylene =CH₂ out-of-plane deformation is seen in the region around 890 cm⁻¹. β-pinene, however, absorbs at 875 cm⁻¹ with an increased intensity that is typical for strained ring structures with an exocyclic =CH₂ group [22]. The intensity of this band in myrcene is high, due to the conjugation with the vinyl group, which shows enhanced intensities for the out-of-plane deformation frequencies. Myrcene is a typical example of an unsaturated acyclic hydrocarbon, and is detected in chamomile flower essential oil [28].
A loading plot identified significant bands at 1159 and 1743 cm⁻¹ for the last cluster containing pure essential oils of arnica, primrose and frankincense (Figure 3d). The strong absorption at 1160 cm⁻¹ occurs due to the presence of lipids and alcohol groups (stretching of C-O and bending of C-OH) [23]. Frankincense essential oil contains diterpene alcohols, primrose oil contains a high content of polyunsaturated fatty acids, while arnica contains sesquiterpene lactones and triterpenediol esters (arnidiol/faradiol). Although alcohols show two strong bands, the band in the 1300-1450 cm⁻¹ region is usually overlapped by CH₂ and CH₃ absorptions, while the second band, assigned to the C-O stretching vibration, is in the 1000-1150 cm⁻¹ region. Primary alcohols like nerol and geraniol, due to the presence of a double bond and secondary alcohols, absorb about 50 cm⁻¹ lower [22].
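Reading marker bands from a loading plot amounts to finding the wavenumbers with the largest absolute loadings on a given PC. The sketch below illustrates this with NumPy under stated assumptions: the helper `top_loading_bands` is hypothetical, and the data are synthetic, with two artificial bands (at 1750 and 1150 cm⁻¹ on a 50 cm⁻¹ grid) made to differ between two invented groups of samples.

```python
import numpy as np

def top_loading_bands(wavenumbers, loading, k=5):
    """Return the k wavenumbers with the largest absolute loading, strongest first."""
    order = np.argsort(np.abs(loading))[::-1][:k]
    return wavenumbers[order], loading[order]

rng = np.random.default_rng(2)
wavenumbers = np.linspace(4000, 600, 69)      # synthetic axis, 50 cm^-1 steps
group = np.repeat([0.0, 1.0], 10)             # two invented groups of 10 samples

# Low-level noise plus two bands whose intensity differs between the groups
X = 0.01 * rng.standard_normal((20, 69))
X[:, 45] += 1.0 * group                       # column 45 corresponds to 1750 cm^-1
X[:, 57] += 0.5 * group                       # column 57 corresponds to 1150 cm^-1

# PC1 loadings from the SVD of the mean-centered data
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
bands, weights = top_loading_bands(wavenumbers, Vt[0], k=2)
print(bands)
```

In this toy case the two planted bands dominate PC1, which mirrors how bands that differ systematically between sample groups stand out in a loading plot.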

Discussion
The principal component analysis clustered the essential oil samples into three groups according to their similarity. Highly concentrated extracts were clustered according to botanical family, while commercial essential oil blends were grouped together regardless of the botanical family to which they belong. In the first group, 4 samples of concentrated essential oils from the Asteraceae plant family were clustered together; the second group was made up of 9 samples of concentrated extracts from the Lamiaceae plant family, while the last group contained 17 samples (the three remaining concentrated extracts and the commercial essential oils) from different sources regardless of the claimed purity and plant source. Thus, all samples from the Lamiaceae (Mint) family and all samples from the Asteraceae (Sunflower) family were clustered together except for arnica essential oil (sample 2), which belongs to the Asteraceae family but was not clustered together with the rest of the oils from this family. Instead, arnica was clustered in the same group as concentrated frankincense and primrose oils. Arnica oil exhibits an ATR-FTIR spectrum similar to that of primrose oil, with a strong sharp C=O band at 1743 cm⁻¹ and a medium wide band at 1159 cm⁻¹ that are not present in frankincense essential oil.
Interestingly, commercial lavender, rosemary, frankincense and peppermint essential oil samples did not cluster with their corresponding concentrated extracts despite their similar spectral patterns. The PCA of commercial blends could not be explained by 2D or 3D PC plots, indicating no correlation between the characteristics of the principal components. Commercial essential oils are usually produced by using around 20% of the named plant species and adding natural extracts from other essential oils to meet the standardized oil profile requirements. In contrast, the highly concentrated FLAVEX® essential oils used in this study were of 70-90% purity.
Although samples of the same oil produce very similar spectral fingerprints, there is a significant difference between the ATR-FTIR spectra of concentrated essential oils and those of commercial-grade essential oils. Dilution of an essential oil with a non-polar organic solvent that does not participate in hydrogen bonding leads to significant differences in the shape and intensity of the respective bands, particularly in the region from 1000 to 1300 cm⁻¹. The peaks in the FTIR spectra of commercial oil samples are better separated, with fewer overlapping IR bands. The region between 1100 and 1300 cm⁻¹ corresponds to stretching vibrations of the C-O group, while the O-C-O band originating from primary alcohols appears in the region from 1100 to 1020 cm⁻¹.
For example, the concentrated lavender oil sample shows a clearly defined shoulder on the lower-wavenumber side of the peak at 1740 cm⁻¹ (corresponding to the vibrations of the C=O group), with a clear maximum at 1685 cm⁻¹ [29], which can be associated with the formation of a hydrogen bond between C=O and -OH groups (Figure 4). Thus, there is more similarity in spectra from the same botanical family than between spectra of the concentrated and diluted oil samples of the same origin.
To confirm the results obtained by the PCA, a hierarchical cluster analysis (HCA) of spectra was performed (Figure 5). Unlike PCA, HCA considers all the data variability and shows the similarity or dissimilarity among each pair of samples. The goal of the clustering algorithm is to partition the objects into homogeneous groups, so the within-group similarities are large compared to similarities between the groups. The HCA results supported the results of PCA, in which concentrated essential oil samples extracted from the Lamiaceae family (samples 8-16) and from the Asteraceae family (samples 3-7) were close to each other according to the hierarchical clustering result.
Figure 5. Hierarchical cluster analysis (HCA) of ATR-FTIR essential oil spectra (axis: variance-weighted distance between cluster centers).

Conclusions
The ATR-FTIR method with chemometric evaluation was successfully applied as an objective method for the qualitative discrimination of essential oils from different plant species. ATR-FTIR spectra obtained from the essential oils were used as spectral fingerprints, and PCA was applied to identify characteristic key spectral bands that can be used as marker bands to discriminate between different botanical families. It was demonstrated that essential oils could be grouped based on their purity and taxonomy (botanical family) using ATR-FTIR spectra with PCA and cluster analysis.
ATR-FTIR spectroscopy offers a green, direct and cost-effective alternative analytical method for the quality control of essential oils. Vibrational spectroscopy combined with chemometric algorithms also provides an efficient and non-destructive chemotaxonomic classification of medicinal and aromatic essential oils. Furthermore, PCA of IR spectra identifies characteristic key bands of a specific essential oil, which allows discrimination between the essential oil profiles of individual plants within the same species (chemotypes).