1. Introduction
Adulteration of a pure substance occurs when it is intentionally altered by the addition of foreign materials or exposed to an environment that would induce a change [1]. In the context of the essential oil industry, adulteration can be used to increase product yields, decrease production costs and/or enhance the perceived quality of the final product, with increased profit being the principal incentive [2,3]. Underlying this are the low manufacturing yields relative to the costs of high quality oil production and of sourcing raw materials. There is also a lack of regulation regarding the classification and labelling of essential oils, which is highlighted by the difficulty of identifying adulterated finished products. Compounding the issue further, oils show some inherent natural variation as a result of the environmental and geological conditions during plant growth and harvesting [4,5,6,7,8,9].
For the essential oil of lavender, adulteration has been widely observed. The most common form is where oil from the English lavender plant (L. angustifolia Miller) is adulterated with the essential oil of the much cheaper sterile hybrid, lavandin (Lavandula × intermedia) [10]. Lavandin essential oil has a similar scent to lavender but contains a greater level of terpenes (mostly camphor), which gives the oil a sharper overtone. The difference between the two oils is easily detectable by scent; when they are blended, however, the sharper tones of lavandin are diluted, making the blend difficult to detect by scent alone. At approximately $38 AUD/kg for lavandin essential oil compared to $251 AUD/kg for lavender essential oil, the incentive to adulterate oil is high considering the lack of regulation and the low probability of the adulterant being identified [1]. Another form of adulteration amongst essential oils is referred to as environmental adulteration, where oils from a geographical location that yields lesser quality are labelled as being derived from a higher-quality or higher-valued location. For example, essential oils harvested from some European countries are typically sold at a premium compared to oils harvested from Asian nations, not only because of reduced labor costs in Asia but also because of the increased oil quality due to environmental and geographical factors [9]. Of note, Hassiotis et al. [11] concluded that within a geographical location, essential oil quality can also vary due to habitat and diurnal changes at the site of production. The time of day plants are harvested [12], the crop duration and number of harvests [13], and postharvest storage conditions [14] can also result in reduced oil quality. However, skilled blenders are capable of making blends that are almost indistinguishable from the pure oil using conventional laboratory testing techniques such as gas chromatography (GC). With more sophisticated testing, it is possible to distinguish pure from adulterated oils and also to identify ‘multiple geographically sourced’ blended oils, but this requires capabilities not available to most testing laboratories [15].
Modern methods routinely used for determining the composition and quality of essential oils include GC, high performance liquid chromatography (HPLC), mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy [2]. Chromatographic techniques such as GC and HPLC separate essential oils into their individual constituents so that they can be identified and potentially quantified by a coupled detector such as an MS. GC lends itself to the analysis of essential oils because it is ideal for volatile organic compounds. When used in conjunction with MS and NMR spectroscopy, GC has revolutionized the detection of minor chemical constituents within essential oils. MS examines the fragmentation patterns of compounds under ionizing conditions, and this information is used to deduce their structures. NMR elucidates the structures of molecules by examining the environment of specific atoms such as hydrogen (¹H) and carbon (¹³C) within a molecule. The sensitivity of analytical techniques for organic compounds has increased dramatically over recent years, to the point where even trace constituents, including pollutants such as pesticides, can be detected [2]. Furthermore, chiral GC-MS techniques have been found to be a useful approach for the authentication of essential oils [1,16]. The use of a chiral column in GC enables the analyst to separate enantiomers from one another and determine their ratios. The ratio of enantiomers within an essential oil is indicative of its biological origin and thus can provide strong evidence of adulteration. Chiral GC-MS has been shown to detect lavender oil adulterated with synthetic linalool and linalyl acetate [17,18,19], lavandin oil [16,20] and grapefruit oil [21]. Analysis of the minor organic components within the oil can also be used to distinguish between geographical locations or harvesting seasons [9].
ISO standard 11024 [22,23] details the GC protocol for obtaining chromatographic profiles of essential oils, including the compounds and representative characteristics that can be used to assess oil quality. This requires an authentic reference standard against which unknown oils are assessed after chromatographic integration and peak alignment. The approach outlined in the standard requires a skilled analytical chemist, and the integration and comparison of samples can be time consuming if multiple samples from multiple batches are to be analyzed. One approach to expedite the screening of oils and make them available for sale faster is the use of chemometric data analysis techniques. Chemometrics has been applied to MS-acquired essential oil data to monitor mixtures of oils in foods, cosmetics or pharmaceuticals and to assess quality and authenticity [24]. In this context, chemometrics relies on using chromatographic features to develop predictive class models that can rapidly assess GC-MS derived data and categorize samples based on quality.
As such, the study herein describes the use of a simple GC-MS method followed by a chemometric analysis for the identification and characterization of two different grades of lavender oil. The two oil variants, referred to as ‘Essential oil’ and ‘Garden Oil’, were sourced from an essential oil distributor who is seeking a fast screening method that can reliably classify oils as either high quality or low grade. The current approach used by the distributor is to analyze new sample batches for the seven target compounds of interest identified in the ISO standard, namely 1,8-cineole, cis-β-ocimene, linalool, 1-octen-3-yl acetate, camphor, linalyl acetate and lavandulyl acetate. These target compounds are common features in lavender essential oils [1]. While this approach serves its purpose, there have been instances of misgraded oils in which higher quality oils were graded as low grade. Such misclassifications have a negative financial impact. Therefore, GC-MS derived data from a targeted and an untargeted analysis were used to develop two predictive models. The models were assessed for their suitability to predict and classify new samples and validated using 9 additional unknown samples, which were subsequently characterized.
  2. Results and Discussion
An Australian lavender oil distributor provided a reference essential oil standard (natural L. angustifolia L. oil, CAS Number 8000-28-0) for analysis. This reference standard is used by the distributor to characterize oils received from France, Bulgaria, Indonesia and Cambodia. It is important to note that all the oils used for the experiments detailed herein were derived from L. angustifolia. In addition, the variation in oil composition has been reported to be within industry-acceptable limits for the identification of lavender essential oil. The distributor therefore supplies the oils originating from France and Bulgaria as certified essential oils, and they are distributed as high quality oil with greater olfactory notes. The oils that originate from Indonesia and Cambodia are of a ‘perceived lower quality’. Both groups are claimed to be derived from the high quality yielding lavender species L. angustifolia, but there is a distinct difference in the quality of the scents produced. The exact cause of the lower quality is not known, though it is most likely due to geographical region and/or adulteration. From the perspective of the distributor, the lower grade lavender oil is considered an ‘adulterated oil type’ because of its perceived scent quality. Whatever the cause of the drop in quality, be it geographical, adulteration or a combination of both, by industry standards these oils are considered adulterated and are used as a positive control for adulteration throughout this study.
  2.1. Targeted Analysis
The ISO Standard 11024 method is a targeted analysis that focuses on the percentage abundance of seven target components within a sample of lavender essential oil. The target components are 1,8-cineole, cis-β-ocimene, linalool, 1-octen-3-yl acetate, camphor, linalyl acetate and lavandulyl acetate. This section investigates the limits of applying such a simple targeted approach to a highly complex biological sample that is subject to high levels of natural variation.
  2.1.1. Lavender Reference Standard Assessment Using the ISO Standard 11024
The lavender reference standard was analyzed for the presence of 1,8-cineole, cis-β-ocimene, linalool, 1-octen-3-yl acetate, camphor, linalyl acetate and lavandulyl acetate. Analytical conditions were optimized in order to acquire reproducible and valid data for the analysis and subsequent predictive modelling. Alkane internal standards (C8–C22) were used during the analyses for the determination of linear retention indices (LRI) and assay performance. Relative standard deviations (%RSDs) were between 3.6% and 13.3% for the 13 individual alkane standards, with a mean %RSD of 5.7 ± 2.5 (n = 7). The seven target compounds were found to account for ca. 74.5% (1.8% RSD) of each oil analyzed in terms of peak area. Furthermore, Table 1 illustrates that the relative percentage peak area of each target component meets the requirements outlined in the ISO standard 11024 [22,23]. Table 1 also details the retention time and calculated LRI of the target compounds.
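The text does not state which LRI formula was used. A common choice for temperature-programmed GC is the van den Dool and Kratz relation, which interpolates each analyte linearly between the two n-alkanes that bracket it. The following is a minimal Python sketch under that assumption; the function name and the retention times in the usage note are hypothetical and are not values from Table 1.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Van den Dool and Kratz LRI for an analyte eluting at `rt`.

    `alkane_rts` maps carbon number -> retention time of that n-alkane,
    in the same units as `rt`. Retention times are assumed to increase
    with carbon number.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the alkane ladder")
    # Index of the alkane eluting at or just before the analyte,
    # clamped so that a bracketing pair always exists.
    i = min(bisect_right(times, rt) - 1, len(times) - 2)
    n, n_next = carbons[i], carbons[i + 1]
    fraction = (rt - times[i]) / (times[i + 1] - times[i])
    return 100 * (n + (n_next - n) * fraction)
```

For example, with hypothetical alkane retention times of 10.0, 12.0 and 14.0 min for C10, C11 and C12, an analyte eluting at 11.0 min sits halfway between C10 and C11 and receives an LRI of 1050.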
  2.1.2. Comparison of Lavender Oil Samples Using the Targeted ISO Standard 11024 Test Method
Analysis of 30 high quality and 27 lower grade oils as per ISO Standard 11024 indicated that the seven target compounds used to characterize the lavender oil samples were all present (Figure 1). Statistical analysis of these compounds and their relative composition in the high quality and lower grade oils showed that the two groups of oils are significantly different (Table 2), albeit within acceptable compositional ranges as per the ISO Standard 11024. The ratio of the t-statistic to the t-critical value (t-stat:t-crit.) in a two-sample t-test and the ratio of the F-value to the F-critical value (F:F-crit.) in a single-factor ANOVA indicate that the mean values of the target compounds in the two varieties of lavender oil were significantly different from each other, reflecting the differing quality of the two oil cohorts (Table 2). However, as described in the following section, applying the target compounds alone in a predictive model was found to be unreliable.
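The ratios reported in Table 2 can be read as "test statistic divided by critical value", so a ratio above 1 means the null hypothesis of equal means is rejected. The sketch below shows how such ratios could be formed for one compound. It assumes an equal-variance two-sample t-test, which the text does not confirm; the peak-area values are hypothetical, and the critical values are taken from standard t and F tables for this small example rather than from the study.

```python
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Two-sample t statistic assuming equal variances in both groups."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5


def anova_f_statistic(*groups):
    """Single-factor ANOVA F: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical % peak areas of one compound in each cohort (illustration only)
high = [35.1, 34.8, 36.0, 35.5]
low = [31.2, 32.0, 30.5, 31.8]
T_CRIT = 2.447  # two-tailed, alpha = 0.05, df = 6 (standard t table)
F_CRIT = 5.987  # alpha = 0.05, df = (1, 6) (standard F table)

t_ratio = abs(pooled_t_statistic(high, low)) / T_CRIT
f_ratio = anova_f_statistic(high, low) / F_CRIT
print(t_ratio > 1, f_ratio > 1)
```

With only two groups, the ANOVA F-value equals the square of the pooled t-statistic, so the two ratios always lead to the same conclusion.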
As illustrated in Table 2, the %RSD of the major compounds, linalyl acetate and linalool, were 4 and 4–6% for higher and lower graded oils, respectively. The %RSD for the other compounds, which are reportedly found at lower concentrations, varied widely, with a range of 14–64%. However, it is noteworthy that both oil classes were within the percentage composition ranges accepted for lavender essential oils.
Using ISO Standard 11024, all lavender samples tested were successfully identified as lavender essential oil. However, the ISO Standard 11024 cannot reliably distinguish between the two classes, a distinction which, as pointed out in the Introduction, is of great monetary importance to the industry. The specification limits applied in the ISO Standard 11024 for the percentage abundance of the targeted compounds are not stringent enough to differentiate between the two classes. A simple solution would be to recommend tighter specifications for the targeted compounds in the ISO Standard 11024 test method, but this would be a great oversimplification. Applying such stringent specifications to only a small number of target compounds within such a variable and complex matrix would be problematic for the industry to administer, as the probability of misidentification would be significant and costly.
  2.2. Assessment of the Targeted ISO Standard 11024 Test Method Using a Predictive Model
The data acquired from the targeted ISO Standard 11024 GC-MS analysis of the essential oil samples were further analyzed using multivariate discrimination techniques, such as principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA). This expands on the simple approach of individual percentage abundance specifications for each targeted compound and focuses instead on correlations between the percentage abundances of the targeted compounds.
As illustrated in Figure 2, the high-quality oil samples analyzed (represented by the blue circles) were predominantly distributed in the left half of the PCA scatter plot (Figure 2A). Conversely, the lower grade oil samples (represented by the red circles) were predominantly in the right half of the PCA scatter plot, with a few samples positioned in the left half. This indicates reasonable separation between the two sample cohorts. The R²X and Q² values obtained for the PCA were 0.825 and 0.580, respectively. This is representative of a model that fits the data well but has weak-to-fair predictive capability (Q² ~ 0.6). As illustrated by the t-stat:t-crit. values from the two-sample t-test and the F:F-crit. values from the single-factor ANOVA in Table 2, the variation between samples within each cohort is significant, and statistically the two cohorts are not the same. The PCA could not distinguish between the two classes with reliable predictability, possibly because of the limited focus on the seven targeted compounds. This shows that simply applying more stringent specifications to the percentage abundance of the seven targeted compounds to distinguish between oil classes would be problematic. To develop a model with a stronger predictive capability (Q²), the analysis needs to be broadened beyond the seven targeted compounds to include all 170 compounds identified in the GC-MS analysis of the lavender essential oil samples.
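For readers less familiar with these two figures of merit, the definitions below follow the usual chemometric convention. They are given as a sketch under that assumption, since the text does not state how the software computed them or which cross-validation scheme was used. Here x_ik is the scaled and centered value of variable k for sample i, x̂_ik is its value reconstructed by the fitted model, and x̂_ik^(cv) is its value predicted when sample i is left out during cross-validation.

```latex
R^2X = 1 - \frac{\sum_{i,k} \left( x_{ik} - \hat{x}_{ik} \right)^2}{\sum_{i,k} x_{ik}^2},
\qquad
Q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_{i,k} x_{ik}^2},
\qquad
\mathrm{PRESS} = \sum_{i,k} \left( x_{ik} - \hat{x}_{ik}^{(\mathrm{cv})} \right)^2
```

R²X therefore measures how much of the variation the model explains in the data it was fitted to, while Q² measures how much it explains in data it has not seen. A large gap between the two, as with 0.825 and 0.580 here, suggests that the model fits better than it predicts.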
  2.3. Development of a Non-Targeted Predictive Model
A partial least squares discriminant analysis (PLS-DA) was applied to the untargeted GC-MS data set, in which 170 possible compounds were identified. The PLS-DA score and scatter plots generated from the untargeted data are presented in Figure 3A,B, respectively. A Distance of Observation (DModX) analysis was used to identify and eliminate any outliers that may be present. DModX is the normalized observational distance between a variable set and the X model plane and is proportional to the variable residual standard deviation. ‘DCrit (critical value of DModX)’, derived from the F-distribution, defines the size of the observational area under analysis. A sample is rejected as a moderate outlier when its DModX value exceeds twice the DCrit at 0.05, which in this instance was 2.452 (DCrit = 1.226). As illustrated in Figure 3C, the DModX plot of the PLS-DA data indicated that no sample exceeded this threshold.
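The rejection rule described above amounts to a simple threshold test. It can be sketched in Python as follows, where the DCrit value is the one reported in the text and the sample names and DModX values are hypothetical placeholders rather than data from Figure 3C.

```python
D_CRIT = 1.226  # DCrit at 0.05, as reported in the text


def flag_moderate_outliers(dmodx_by_sample, d_crit=D_CRIT, factor=2.0):
    """Return the samples whose DModX exceeds factor * d_crit."""
    threshold = factor * d_crit  # 2 * 1.226 = 2.452 in this study
    return [name for name, value in dmodx_by_sample.items() if value > threshold]


# Hypothetical DModX values, for illustration only
example = {"oil_A": 0.84, "oil_B": 1.31, "oil_C": 2.60}
print(flag_moderate_outliers(example))  # ['oil_C']
```

In the study itself the equivalent list would be empty, since no sample exceeded 2.452.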
The objective of the PLS-DA was to increase the predictive capability of the model and to identify the peaks that provide the greatest differentiation between the high quality and lower grade oil samples. The R²X, R²Y and Q² values for the PLS-DA model were 0.903, 0.990 and 0.970, respectively, indicating greater predictability than the model created using targeted data. A volcano plot of the −log₁₀ p-value for each detected peak against the log₂ fold change (FC) displays the GC-MS data in a way that clearly distinguishes the two cohorts and identifies the features that explain the differentiation between them. Figure 4 presents the volcano plot of the GC-MS data, and Figure 5 presents the top 15 compounds identified from the volcano plot. These compounds could not be identified with any real confidence (library match ≥70%) using the Adams essential oils reference library, and additional work is needed to identify and characterize them in terms of lavender quality. However, for the work presented in this paper, their identification is not necessary, as it has no bearing on the ability of the developed PLS-DA model to predict which cohort an unknown sample belongs to.
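Each point on a volcano plot is a pair of coordinates computed per peak: the log₂ fold change between cohort means on the horizontal axis and the −log₁₀ p-value on the vertical axis. The Python sketch below illustrates this, with several assumptions that are not taken from the paper: the direction of the fold change (high quality over lower grade), the selection cut-offs, and all function and feature names.

```python
from math import log2, log10
from statistics import mean


def volcano_point(high, low, p_value):
    """Return (log2 fold change, -log10 p) for one chromatographic feature."""
    fold_change = mean(high) / mean(low)  # assumed direction: high / low
    return log2(fold_change), -log10(p_value)


def discriminating_features(points, min_abs_log2fc=1.0, max_p=0.05):
    """Keep features beyond both cut-offs, most significant first.

    `points` maps feature name -> (log2 FC, -log10 p). The default
    cut-offs are illustrative and are not those used in the study.
    """
    cutoff = -log10(max_p)
    hits = [(name, fc, nlp) for name, (fc, nlp) in points.items()
            if abs(fc) >= min_abs_log2fc and nlp >= cutoff]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Features in the upper left and upper right corners of the plot, with a large fold change and a small p-value, are the ones that separate the cohorts most strongly. Taking the first 15 entries of such a ranked list would mirror the selection shown in Figure 5.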
  2.4. Characterisation of Unknown Samples
In order to evaluate the predictive capability of the developed PLS-DA model, 9 unknown samples were analyzed according to the method described herein and applied to the PLS-DA model. As illustrated in Table 3, each unknown sample correlated strongly with either the high quality or the lower grade oil cohort. Correlation with the high quality oils is observed when the YPredPS (Y Predicted list for PLS-DA model) value is nearest to 1, where 1 is defined as being equal to the cohort reference material. Considering that the oils sampled are natural, some variation is to be expected. As evident in Table 3, six of the unknown samples were characterized as belonging to the high quality oil cohort, while three were characterized as belonging to the lower grade oil cohort. Furthermore, a misclassification analysis of the entire dataset indicates that 100% of the data were successfully classified, with a Fisher probability of 7.1 × 10⁻¹⁶. It is important to note that the misclassification analysis includes the 30 high quality and 24 lower grade oil samples, in addition to the unknown samples. A summary of the misclassification analysis is presented in Table 4.
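The Fisher probability quoted above can be understood as the hypergeometric probability of obtaining the observed 2 × 2 table of actual against predicted class by chance. The Python sketch below computes it for a perfectly classified table of 30 high quality and 24 lower grade samples. This gives about 7.1 × 10⁻¹⁶, in agreement with the reported figure. The layout of the table, and the use of the 54 samples without the unknowns, are assumptions made for illustration, not details confirmed by Table 4.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]]
    given its row and column totals."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value: the observed table plus every
    table with a larger top-left count and the same margins."""
    row1, row2, col1 = a + b, c + d, a + c
    total = 0.0
    for x in range(a, min(row1, col1) + 1):
        total += table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
    return total


# Rows: actual class; columns: predicted class (assumed layout)
p = fisher_one_sided(30, 0, 0, 24)
print(f"{p:.1e}")
```

Because a perfectly diagonal table is the most extreme table possible for its margins, the sum contains a single term, 1/C(54, 24).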
  4. Conclusions
Current standard test methods such as the ISO Standard 11024 for the identification of lavender oil cannot be extended to reliably determine the quality classes of lavender oil. The reason is that limits are applied to the percentage abundance of only seven of the 170 identified compounds. This proved to be a drastic oversimplification for differentiating subtle differences in a very complex sample type. A better approach would be to expand the analysis to include all 170 identified compounds within the lavender oils and analyze the correlations between their percentage abundances.
The chemometric model developed used the 170 compounds and identified 15 compounds that strongly differentiate the two classes of lavender oil while displaying little variation between samples of the same cohort. This enabled rapid classification of oils into these two varieties using a PLS-DA predictive model, without the issues of sample-to-sample variation that come with biological samples and that were evident in the targeted analysis.
When an additional 9 unknown/unclassified lavender samples were subsequently analyzed and the chromatograms run against the predictive chemometric model, all 9 uncharacterized oils were correctly identified. A misclassification analysis was performed on these samples as well as the 54 samples used to train the model, and the predictions were found to be sound. This study has successfully created a predictive test for the rapid quality assessment of lavender essential oil by combining GC-MS profiling with chemometric predictive modelling.