Verification of Chromatographic Profile of Primary Essential Oil of Pinus sylvestris L. Combined with Chemometric Analysis

Chromatographic profiles of primary essential oils (EOs) deliver valuable, authentic information about composition and compound pattern. Primary EOs obtained from Pinus sylvestris L. (PS) of different geographical origins were analyzed using gas chromatography coupled to a flame ionization detector (GC-FID), and their compounds were identified by GC hyphenated to a mass spectrometer (GC-MS). A primary EO of PS was characterized by a distinct sesquiterpene pattern followed by a diterpene profile containing diterpenoids of the labdane, pimarane or abietane type. Based on their sesquiterpene patterns, primary EOs of PS were separated according to their geographical origin using principal component analysis (PCA). Furthermore, differentiation of closely related pine EOs by partial least squares discriminant analysis (PLS-DA) supported the existence of a distinct primary EO of PS. The developed and validated PLS-DA model is suitable as a screening tool to assess the correct chemotaxonomic identification of primary pine EOs, as it classified all pine EOs correctly.

Pinus sylvestris L. (PS) is the most widely distributed pine due to its ability to adapt to various climatic conditions and to grow on different soil types [3,4]. Furthermore, this species shows extensive genetic variability and can mainly be categorized into 3-carene, α-pinene or isoabienol chemotypes [33]. Closely related pine trees are Pinus cembra L. (PC), Pinus mugo Turra (PMu) and Pinus nigra J. F. Arnold (PN) (phylogenetic tree, Figure S1). Due to their morphological similarity, these congeners are easily mixed up with PS [5,17,18,34]. The profile of industrially used pine EOs is defined by the

Chromatographic Profile
The chromatographic profile of a primary EO of PS is shown in Figure 1. The constituents of the EOs were classified into monoterpene hydrocarbons, oxygenated monoterpenes, sesquiterpene hydrocarbons, oxygenated sesquiterpenes and oxygenated diterpenes (Table 1, Table S1). The performance of the analytical method was confirmed in an interlaboratory comparison. All analyzed primary EOs of PS (n = 36) showed a similar monoterpenoid pattern. The main compound was α-pinene (n = 19), 3-carene (n = 14), β-phellandrene (n = 2) or β-pinene (n = 1). The analyzed primary EOs of PS were categorized into 3-carene-rich (>5%) and 3-carene-low EOs, as proposed earlier (Figure 2) [47]. Notably, terpinolene was detected in 3-carene- and sabinene-rich EOs (4 and 5; 3.3% and 8.3%, respectively), which is in line with previously published data [48]. The ratio of 3-carene to terpinolene was found to be 15:1. Interestingly, the monoterpene pattern of the Danish EOs was completely different. Three out of five of the Danish EOs did not contain α-pinene or 3-carene as the main compound; instead, the main compound was either β-pinene or β-phellandrene.
Additionally, the primary EOs of PS contained a typical diterpenoid profile (Figure 3), whose compounds were identified by comparing their mass spectra with libraries and literature data (Figure S2) [22,49–55]. The identified diterpenoids belong to the labdane, abietane or pimarane group. The diterpenoid profile was similar among the primary EOs, although the abundance of the diterpenoids is also influenced by environmental factors, genetic conditions and chemical reactivity [33]. The most intense peak in the diterpene area of EOs 1, 6, 18, 21-23, 25, 26, 31 and 33-36 was identified as isoabienol, which belongs to the abienol group. Isoabienol was mainly found in the needles, whereas it was hardly detected in twigs (Figure S3). The mass spectra of the abienols are characterized by similar fragmentation patterns, which makes their identification challenging. Nevertheless, the structure of isoabienol was assigned by comparing the obtained mass spectrum with the one reported by Adams et al.; it was identified by its characteristic base peak at m/z 191, arising from the loss of water (H2O) with an additional loss of the side chain (C6H9) [52]. Besides isoabienol, further diterpenoids were present in the primary EOs of PS and were identified as cis-abienol from the labdane class, sandaracopimaral and isopimaral from the pimarane class, and palustral from the abietane class, all exhibiting high spectral similarity. Interestingly, cis-abienol was detected when isoabienol occurred in a high amount. One may speculate that isoabienol can isomerize into cis-abienol. Sandaracopimaral, palustral and isopimaral were present in all analyzed EOs. However, these analytes were predominantly detected in twigs.

Geographical Origin
Despite the similarity of the terpenoid profile among the investigated EOs of PS, a separation according to geographical origin was feasible. To visualize the differences between the collected EOs in terms of origin, a principal component analysis (PCA) was performed on normalized data. PCA is a well-known method for exploratory data analysis, which projects the original data onto a lower-dimensional space of orthogonal components (principal components (PCs)), so that the first one explains the largest variance, the second one explains the second largest variance, and so on [36,56,57]. In our case, the first three principal components (PC1, PC2 and PC3) explained 37.6%, 23.6% and 10.1% of the total variance, respectively, allowing the visualization of more than 70% of the information contained within the dataset in three dimensions (3D) (Figure 4). The corresponding loading plots are found in Supplementary Figure S4. Some of the EOs, i.e., those from Denmark, Sweden and Russia, were well separated by origin based on the first three PCs. The EOs of PS obtained from different geographical locations showed great variability in their sesquiterpene content and pattern, whereas the Swiss and German EOs were not separated, owing to the proximity of their collection sites. EOs from Russia were characterized by relatively higher values of oxygenated sesquiterpenes, while EOs from Switzerland and Germany exhibited higher values of sesquiterpene hydrocarbons and the Danish EOs higher values of guaia-6,9-diene. PCA verified that the geographical location influences the chemical composition of the secondary metabolites and has to be considered for quality control, which is in line with previous reports [37,38].
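The PCA step described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available and using a randomly generated 36 × 12 matrix as a stand-in for the sesquiterpene table; the function names, the matrix dimensions other than the 36 samples, and the data are hypothetical and do not reproduce the reported variances.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - mean) / std

def pca_scores(X, n_components=3):
    """PCA via singular value decomposition of the auto-scaled matrix."""
    Z = autoscale(np.asarray(X, dtype=float))
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)           # fraction of variance per PC
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T            # columns are orthonormal
    return scores, loadings, explained[:n_components]

# Hypothetical data: 36 EOs x 12 sesquiterpene variables (random numbers).
rng = np.random.default_rng(0)
X_demo = rng.random((36, 12))
scores, loadings, explained = pca_scores(X_demo)
print("Explained variance of PC1-PC3:", np.round(explained, 3))
```

The scores would be plotted in 3D to inspect grouping by origin, and the loadings indicate which compounds drive each component.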


Classification Model
To discriminate between EOs of PS and those of closely related pine trees, EOs of PC, PMu and PN were analyzed by chromatographic and chemometric profiling. The chemical composition of these pine EOs is presented in Table 1 (37, 44, 53) and Supplementary Table S2. As shown in Supplementary Figure S5, PCA was not able to discriminate between the different congeners, not even after elimination of the outliers 3, 4 and 5. Thus, a PLS-DA model was calibrated to distinguish five classes of pine EOs. The EOs 3, 4 and 5 were classified as chemotypes of PS characterized by a high amount of β-phellandrene and were considered as one class (PS II). PLS-DA is a multivariate classification method, which is based on a PLS regression algorithm and aims to find linear combinations of the original variables (latent variables (LV)) that best separate the classes [59,60]. In our study, a fourth-root transformation was applied to the data before the PLS-DA calculation; this preprocessing had previously proved efficient in discriminating seized cannabis samples analyzed by GC-MS [61]. After auto-scaling, the number of latent variables was optimized with three-fold cross-validation, with five latent variables maximizing the Non-Error Rate (NER) [62]. The model was further validated with bootstrap and random resampling protocols. The obtained model was stable, with an NER of 89% and 93% for bootstrap and random resampling, respectively (Table 2). The model sensitivity, which represents the percentage of correctly classified samples for each class, was always greater than or equal to 90%, with the exception of PMu, whose sensitivity values were nonetheless equal to or higher than 75% (Table 2) [62]. The comprehensive classification performances are presented in Tables S3 and S4.
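The classification measures mentioned above can be illustrated with a short sketch. It assumes the common definitions of sensitivity (true positives over all samples of the class), specificity (true negatives over all samples of the other classes) and NER as the mean of the class sensitivities; non-assigned samples are ignored. The confusion matrix is invented for illustration (only the class sizes follow the calibration set) and does not reproduce Table 2.

```python
def classification_metrics(confusion, labels):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp
        metrics[label] = {"Sn": tp / (tp + fn), "Sp": tn / (tn + fp)}
    # Non-error rate: average of the class sensitivities (assumed definition).
    ner = sum(m["Sn"] for m in metrics.values()) / n
    return metrics, ner

labels = ["PS I", "PS II", "PC", "PMu", "PN"]
# Hypothetical confusion matrix: one PMu sample is misassigned to PN.
confusion = [
    [17, 0, 0, 0, 0],
    [0, 3, 0, 0, 0],
    [0, 0, 6, 0, 0],
    [0, 0, 0, 4, 1],
    [0, 0, 0, 0, 4],
]
metrics, ner = classification_metrics(confusion, labels)
print(metrics["PMu"], round(ner, 3))
```

Because NER averages over classes rather than samples, a single error in a small class such as PMu lowers it more than the same error in PS I would.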
As can be seen from selected score plots (Figure 5), the EOs of PS I (blue) were clearly separated from PS II (violet), PC (brown), PMu (red) and PN (yellow) by the second and third latent variables (LV2 and LV3), whereas the score plot based on the first and fourth latent variables (LV1 and LV4) additionally separated PC (brown) from PN (yellow). The corresponding loading plots are presented in Figure S6.
Table 2. Classification parameters of the PLS-DA model in cross-validation, bootstrap and random resampling [61]. Sensitivity (Sn) and specificity (Sp) for each class, along with non-error rate (NER) and ratio of non-assigned samples (n.a.), are reported.
To reinforce the statement of a distinct primary EO of PS I, compounds with an absolute regression coefficient > 0.05 for the classification of PS I were considered (Figure S7). Among these compounds, the data of γ-cadinene were normally distributed and equal variances were assumed. The compound identified as γ-cadinene showed a significant to highly significant difference between the mean value of PS I and those of the closely related pine EOs and might serve as a potential chemical marker for the classification of PS I (Figure 6). The result of the current study confirms suggestions made by our group after a preliminary study in 2016 (data not shown). The PLS-DA model was able to separate PS (PS I/PS II) from the closely related pine EOs. This supports the existence of a distinct primary EO of PS (even EOs of the chemotype PS II were separated). Furthermore, the developed model assigned the EOs of the test set to their corresponding taxonomic class (Table S5). The model can be used as a screening method to classify EOs according to their taxonomic specification. Such classification is crucial to ensure the quality and authenticity of the EOs and to avoid confusion between species.

Primary EO of PS, PC, PMu and PN
Needles and twigs were obtained from PS (n = 36), PC (n = 7), PMu (n = 9) and PN (n = 6). A detailed overview of the in-house codes, GPS coordinates and harvesting times can be found in Supplementary Table S5. The EO of freshly cut needles and twigs (pieces of 1 cm) was obtained by industrial distillation. Subsequently, the EOs were diluted in heptane and analyzed by GC-FID and GC-MS.

GC-FID Analysis for Chromatographic Fingerprint
GC-FID analysis was performed using a Thermo Fisher Scientific Trace Ultra gas chromatograph (Thermo Fisher Scientific, Waltham, MA, USA) equipped with a DB-wax capillary column (30 m × 0.25 mm i.d., film thickness 0.25 µm, Agilent, Santa Clara, CA, USA). The injector temperature was 220 °C. The injection volume was 1 µL (autosampler AI3000, Thermo Fisher Scientific) using a split ratio of 1:50 with a split flow of 75 mL min⁻¹. Helium was used as carrier gas at a constant flow rate of 1.5 mL min⁻¹. The oven temperature was kept at 65 °C for 10 min, then raised to 220 °C at 5 °C min⁻¹ and kept constant at 220 °C for 9 min. The detector temperature was 250 °C. The chromatographic profile was analyzed using the relative percentages of the individual components based on the FID response (peak area). The data were acquired with Chrom Card Trace Focus GC (Thermo Fisher Scientific, version 2.9). An interlaboratory comparison was carried out with Systema Natura GmbH (Flintbek, Germany) using the same GC-FID method to analyze the chromatographic profile of randomly selected EOs (n = 8).
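The relative percentages based on peak area can be computed by simple area normalization. The following is a minimal sketch; the function name and the peak areas are hypothetical, and no response factors are applied, which is an assumption rather than something stated in the method.

```python
def relative_area_percent(peak_areas):
    """Convert raw FID peak areas into percentages of the total peak area."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}

# Hypothetical peak areas in arbitrary units, for illustration only.
example_areas = {"alpha-pinene": 500.0, "3-carene": 300.0, "beta-pinene": 200.0}
percentages = relative_area_percent(example_areas)
print(percentages)
```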

GC-MS Analysis for Chromatographic Profile
The chromatographic conditions from GC-FID were adopted for GC-MS analysis. The GC analysis was performed using a Thermo Fisher Scientific Trace Ultra gas chromatograph equipped with a BGB-wax capillary column (30 m × 0.25 mm i.d., film thickness 0.25 µm, Restek, Bellefonte, PA, USA) fitted with a guard column (1 m × 0.25 mm i.d., deactivated, Restek). The PTV injector temperature was 220 °C. The injection volume was 1 µL (TriPlus autosampler, Thermo Fisher Scientific) using a split ratio of 1:50. Helium was used as carrier gas at a constant flow rate of 1.5 mL min⁻¹. The oven temperature was kept at 65 °C for 10 min, then raised to 220 °C at 5 °C min⁻¹ and kept constant at 220 °C for 9 min. The MS analysis was carried out on a Thermo DSQ II mass spectrometer operated in positive EI mode at 70 eV. Transfer line and ion source temperatures were set to 250 °C. Mass spectra were acquired in full scan mode (mass range m/z 40-300). Peak identification was performed using different libraries: NIST (version 2.2, 2014), Adams (fourth edition, 2007) and in-house libraries [58,63]. Retention indices (RI) were calculated according to the van den Dool and Kratz equation [64]. The software used was Thermo Xcalibur (Thermo Fisher Scientific, version 2.2 SP1.48).
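For temperature-programmed runs, the van den Dool and Kratz retention index interpolates linearly between the retention times of the two n-alkanes bracketing the analyte: RI = 100 × [n + (t_x − t_n) / (t_{n+1} − t_n)], where n is the carbon number of the earlier-eluting alkane. The sketch below assumes a homologous n-alkane series was run under the same conditions; the function name and the retention times are hypothetical.

```python
from bisect import bisect_right

def linear_retention_index(t_x, alkane_times):
    """Linear retention index from bracketing n-alkane retention times.

    alkane_times maps carbon number -> retention time (min).
    """
    carbons = sorted(alkane_times)
    times = [alkane_times[c] for c in carbons]
    if times != sorted(times):
        raise ValueError("alkane retention times must increase with carbon number")
    i = bisect_right(times, t_x) - 1
    if i < 0 or i >= len(times) - 1:
        raise ValueError("analyte elutes outside the bracketing alkane range")
    n, n_next = carbons[i], carbons[i + 1]
    t_n, t_next = times[i], times[i + 1]
    # (n_next - n) equals 1 for consecutive alkanes; kept general for gaps.
    return 100.0 * (n + (n_next - n) * (t_x - t_n) / (t_next - t_n))

# Hypothetical retention times (min) of C10-C12 n-alkanes.
alkanes = {10: 8.0, 11: 12.0, 12: 16.0}
ri = linear_retention_index(13.0, alkanes)
print(ri)
```

The resulting index would then be compared with library or literature values obtained on a comparable polar stationary phase.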
PCA was performed on auto-scaled data of the chemical composition (sesquiterpenes) of primary EOs of PS (n = 36).
PLS-DA was performed on fourth-root-transformed and subsequently auto-scaled data. The dataset was composed of 35 EOs (PS I (n = 17), PS II (n = 3), PC (n = 6), PMu (n = 5) and PN (n = 4)) characterized by 39 compounds (Table S5). The threshold value for the separation of the classes was estimated using Bayes' theorem. Three-fold cross-validation was performed using the venetian blinds splitting protocol and was used to select the optimal number of latent variables based on the Non-Error Rate [61]. Additional validation was performed using bootstrap and random resampling. The PLS-DA settings for all types of validation are reported in Table S6. The developed PLS-DA model was applied to predict PS (n = 16), PC (n = 1), PMu (n = 4) and PN (n = 2) EOs (test set).
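The preprocessing and the splitting protocol can be sketched as follows. This is an illustration in plain Python under stated assumptions: the function names and the small example matrix are hypothetical, and the venetian blinds scheme is taken to assign every k-th sample to the same cross-validation group. In a full workflow the scaling parameters would typically be derived from the training part of each split.

```python
import math

def fourth_root_autoscale(rows):
    """Apply a fourth-root transform, then auto-scale each column."""
    rooted = [[v ** 0.25 for v in row] for row in rows]
    n = len(rooted)
    cols = list(zip(*rooted))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    # Constant columns (std = 0) are set to zero instead of dividing by zero.
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)] for row in rooted]

def venetian_blinds(n_samples, n_folds=3):
    """Return (train, test) index lists; sample i goes to fold i % n_folds."""
    folds = []
    for f in range(n_folds):
        test = [i for i in range(n_samples) if i % n_folds == f]
        train = [i for i in range(n_samples) if i % n_folds != f]
        folds.append((train, test))
    return folds

# Hypothetical composition values (%), three samples x two compounds.
demo = [[16.0, 1.0], [81.0, 1.0], [0.0, 1.0]]
scaled = fourth_root_autoscale(demo)
folds = venetian_blinds(35, n_folds=3)  # 35 EOs as in the calibration set
print([len(test) for _, test in folds])
```

Because venetian blinds takes samples at regular intervals, the order of the samples in the data table affects how the classes are distributed across the groups.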
To determine potential chemical markers, the data (fourth root of the chemical composition) were compared using an ordinary one-way ANOVA followed by Tukey's post hoc test. Prior to ANOVA, normal distribution and homoscedasticity were checked using the Shapiro-Wilk test (α = 0.05) and the Brown-Forsythe test (α = 0.05), respectively.

Conclusions
A primary EO of PS was characterized by its chromatographic profile, with a distinct sesquiterpene pattern followed by a diterpene area containing diterpenoids of the labdane, pimarane or abietane type. Chemometric methods such as PCA and PLS-DA, in combination with chromatographic profiling, were successfully applied to assign EOs of PS to their geographical origin and to differentiate closely related pine EOs. PLS-DA was established as a powerful screening tool for the routine analysis and identification of EOs from PS.

Supplementary Materials:
The following are available online: Table S1: Chemical composition (%, percentages of the total EO composition) of the primary EOs of PS; Table S2: Chemical composition (%, percentages of the total EO composition) of closely related pine EOs; Table S3: Classification parameters of the PLS-DA model in fitting, cross-validation, bootstrap and random resampling. Sensitivity (Sn), specificity (Sp) and precision (P) for each class are reported; Table S4: Classification parameters of the PLS-DA model in fitting, cross-validation, bootstrap and random resampling. Sensitivity (Sn), specificity (Sp) and precision (P) for each class are reported; Table S5: Origin data of the primary EOs and their classification in PLS-DA analysis (EOs for PLS-DA development in bold, EOs for test set in italic); Table S6: PLS-DA settings of the used types of validations; Figure S1: Phylogeny of the genus Pinus. Pinus is divided into the subgenera Pinus and Strobus. Only the species used in this study are mentioned; Figure S2: Fragmentation pattern of the diterpenoids; Figure S3: The diterpenoid profile obtained from (a) needles and (b) twigs; Figure S4: The loading plot of principal components PC1 versus PC2 for EOs of PS based on the sesquiterpenes; Figure S5: (a) and (b) The score plot of PC1 and PC2 with and without outliers; Figure S6: (a) and (b) The loading plot of LV2 to LV3 and LV1 to LV4, respectively; Figure S7: Regression coefficients for the EOs of PS I.