The Application of Chemometrics to Volatile Compound Analysis for the Recognition of Specific Markers for Cultivar Differentiation of Greek Virgin Olive Oil Samples

In the present study, volatile compound analysis of olive oil samples belonging to ten Greek cultivars was carried out. A total of 167 olive oil samples collected from two consecutive harvest years were analyzed by Headspace-Solid Phase Microextraction-Gas Chromatography/Mass Spectrometry (HS-SPME-GC/MS). Volatile compound data were combined with chemometric methods (Multivariate Analysis of Variance (MANOVA) and Linear Discriminant Analysis (LDA)) with the aim not only of differentiating olive oils but also of identifying characteristic volatile compounds (marker compounds) that would enable differentiation of botanical origin. The application of Stepwise LDA (SLDA) effectively reduced the large number of statistically significant volatile compounds involved in the differentiation process and thus led to a set of parameters, the majority of which belong to compounds that are highly dependent on variety. In addition, the use of these marker compounds resulted in an increased correct classification rate (85.6%) using the cross-validation method, indicating the validity of the model developed despite the large number of groups (cultivars).


Introduction
Olive tree cultivation, one of the oldest and most important agricultural activities, has led to the diversification of olives into a large number of cultivars. The olive fruit cultivar is a major determinant of olive oil quality because of differences in aroma, taste, phenolic content and other traits, which lead to a variety of olive oils, each with unique flavor characteristics and stability. In Greece more than 30 cultivars of olives exist, and most are characteristic of their cultivation area (e.g., the Galano cultivar from Metagitsi of Halkidiki, the Topiki Makris cultivar from Evros and the Samothrakis cultivar from Samothraki island) [1,2].
According to the International Food Authenticity Assurance Organization (IFAAO) [3], "food authenticity is the process of irrefutably proving that a food or food ingredient is in its original, genuine, verifiable and intended form as declared and represented". Food authentication concerns (i) regulatory authorities, who seek to prevent food adulteration; (ii) food processors, who do not wish to be subjected to unfair competition from unscrupulous processors gaining an economic advantage by misrepresenting the food they sell; and (iii) consumers, who expect to purchase and consume genuine, unadulterated, quality foods for which they usually pay a premium price. With this in mind, the European Union issued Regulation 178/2002 [4] regarding the quality, safety and traceability of commercially available foods.
The conventional quality parameters were determined according to the Official EU method [22]. All determinations were carried out in triplicate.

HS-SPME-GC/MS
The determination of volatile compounds was carried out according to Kosma et al. [1], using SPME in combination with GC/MS. Semi-quantification of volatile compounds was carried out using the internal standard method. Concentrations were calculated using the following formula: $C_x = C_i \times \mathrm{AREA}_x / \mathrm{AREA}_i$, where $C_x$ = concentration of the unknown compound, $C_i$ = concentration of the internal standard solution, $\mathrm{AREA}_x$ = peak area of the unknown compound and $\mathrm{AREA}_i$ = peak area of the internal standard solution; results were expressed as µg/kg. All determinations were carried out in triplicate.
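The internal standard calculation described above can be sketched as follows. This is a minimal illustration, assuming a response factor of 1 between analyte and internal standard (the usual simplification in semi-quantification); the function name and all numeric values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def semi_quantify(area_x: float, area_i: float, conc_i: float) -> float:
    """Estimate analyte concentration (same units as conc_i) from peak areas.

    Assumes a response factor of 1 between analyte and internal standard.
    """
    if area_i <= 0:
        raise ValueError("internal standard peak area must be positive")
    return conc_i * area_x / area_i


# Hypothetical triplicate: analyte and internal standard peak areas
areas_x = [1.50e6, 1.62e6, 1.56e6]
areas_i = [3.00e5, 3.00e5, 3.00e5]
conc_i = 100.0  # hypothetical internal standard concentration, µg/kg

replicates = [semi_quantify(ax, ai, conc_i) for ax, ai in zip(areas_x, areas_i)]
print(f"{mean(replicates):.1f} ± {stdev(replicates):.1f} µg/kg")
```

Reporting the mean and standard deviation of the three replicates mirrors the "value ± deviation" form used for the concentrations in the text.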

Statistical Analysis
Statistical treatment of data was performed using SPSS 25.0 software. Data were subjected to MANOVA in order to determine those variables that are significant for the differentiation of olive oil cultivar. Cultivar was taken as the independent variable, while volatile compounds were taken as the dependent variables. Pillai's Trace and Wilks' Lambda indices were computed to determine a possible significant effect of cultivar on the values of the experimental parameters. LDA was then applied using the selected dependent variables in order to explore the potential for classification of olive oil samples according to cultivar. Original and leave-one-out cross-validation methods were used to test the predictive classification ability. In the original method, the prediction rate results from the contribution of all cases to the discriminant functions, while in cross-validation each case in turn is left out and classified into a group by discriminant functions derived from all the other cases. This procedure is repeated for every case in the data set. The homogeneity of variability was tested by application of the Box M index [31].
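The difference between the original (resubstitution) rate and the leave-one-out rate can be illustrated with a short sketch. To keep it dependency-free, a nearest-centroid rule stands in for the discriminant functions; it is not LDA, and the data and cultivar labels are invented for illustration only.

```python
from collections import defaultdict


def centroids(samples):
    """Mean vector per class for a list of (features, label) pairs."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, model[label])),
    )


def resubstitution_rate(samples):
    """'Original' method: every case helps build the model that classifies it."""
    model = centroids(samples)
    hits = sum(predict(model, x) == y for x, y in samples)
    return hits / len(samples)


def leave_one_out_rate(samples):
    """Each case is classified by a model built from all the other cases."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        hits += predict(centroids(training), x) == y
    return hits / len(samples)


# Invented two-variable data for three hypothetical cultivars
data = [
    ([1.0, 1.2], "A"), ([1.1, 0.9], "A"), ([0.9, 1.0], "A"),
    ([3.0, 3.1], "B"), ([2.9, 3.3], "B"), ([3.2, 2.8], "B"),
    ([1.9, 2.1], "C"), ([2.1, 1.9], "C"), ([2.6, 2.6], "C"),
]
print(f"original: {resubstitution_rate(data):.1%}")
print(f"leave-one-out: {leave_one_out_rate(data):.1%}")
```

On this toy data every case is classified correctly by the original method, whereas one borderline case of class C is misclassified once it no longer contributes to its own class centroid. The same optimism of the original rate explains why cross-validation rates are the more conservative figure.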
Since the number of volatile compounds resulting from the analysis is quite large, SLDA was used as a final step in order to determine those variables that show higher discriminant ability. The SLDA method starts from an initial model which does not include any of the significant variables (predictors). The predictors are then introduced into the analysis sequentially, one at a time, until all predictors that satisfy the entry criterion are included in the model. The SLDA classification method applies a forward variable selection algorithm using Wilks' Lambda as the selection criterion and the F statistic to determine the significance of the change in Wilks' Lambda when the impact of a new variable is evaluated [32]. Before a new variable enters the classification model, the stepwise procedure checks whether all previously entered variables remain significant. Any that are no longer significant are removed from the model, and the process continues until no other variable meets the entry criterion or until the next variable to be entered is the one that was just removed; at this point the variable selection process stops [33]. Thus, the SLDA procedure is guided by the corresponding F-to-enter and F-to-remove values. The F value of a variable indicates its statistical significance in discriminating between groups, i.e., it is a measure of the degree to which the variable contributes to predicting group membership. The entry and removal criteria were left at the defaults of the statistical software, i.e., a minimum F to enter the analysis of 3.84 and a maximum F to remove from the analysis of 2.71 [34]. The evaluation of the SLDA classification results was conducted using the leave-one-out method.
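At the first step of forward selection no variable is yet in the model, so the F-to-enter of each candidate reduces to its one-way ANOVA F statistic and its Wilks' Lambda to the ratio of within-group to total sum of squares. The sketch below covers only this first step, with invented data and hypothetical variable names; the threshold of 3.84 is the entry criterion quoted above. Later steps, which need partial Lambda values conditional on the variables already selected, are not shown.

```python
F_TO_ENTER = 3.84  # entry criterion quoted in the text


def first_step_stats(groups):
    """Return (wilks_lambda, f_value) for one variable, with no variables in the model.

    `groups` is a list of lists, one list of measurements per cultivar.
    Assumes some within-group variation (otherwise Lambda is 0 and F is undefined).
    """
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups
    )
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    wilks = ss_within / ss_total
    # Equivalent to (SS_between / (k - 1)) / (SS_within / (n - k))
    f_value = ((n - k) / (k - 1)) * (1 - wilks) / wilks
    return wilks, f_value


# Invented measurements for two hypothetical compounds in three cultivars
candidates = {
    "compound_1": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    "compound_2": [[1.0, 5.0, 9.0], [2.0, 5.0, 8.0], [3.0, 5.0, 7.0]],
}

for name, groups in candidates.items():
    wilks, f_value = first_step_stats(groups)
    status = "eligible to enter" if f_value >= F_TO_ENTER else "not eligible"
    print(f"{name}: Lambda = {wilks:.3f}, F = {f_value:.2f} -> {status}")
```

Among the eligible candidates, a stepwise procedure of this kind would enter the one with the smallest Lambda (largest F) first and then re-evaluate the remaining variables.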

Analysis of Conventional Quality Parameters
As shown in Table 2, the majority of olive oil samples tested were categorized as extra virgin olive oil since their acidity, peroxide value and absorption coefficients (K232 and K270) did not exceed the internationally established limits set by the EU Regulation [22]. Specifically, acidity ranged from 0.3 ± 0.2% in Galano samples to 1.8 ± 1.6% in Adramitiani samples; the latter, along with samples from Ladolia Kerkyras, Samothraki and Athinolia (1.3 ± 1.5, 0.9 ± 0.6 and 0.8 ± 0.6, respectively), recorded higher acidity values and were categorized as virgin olive oil. Furthermore, Ladolia Kerkyras recorded the highest K232 value (2.76 ± 1.02), thus categorizing this oil as "lampante", while the other samples remained below the internationally established limit of 2.50 [22]. In a similar study, Pouliarekou et al. [35], who classified olive oil samples from Western Greece according to cultivar and geographical origin, also recorded high values of quality parameters for samples belonging to Ladolia Kerkyras. These samples were also categorized as "lampante", and according to the authors this may be related to the method of olive fruit collection: in certain olive orchards in Kerkyra, fruits are left to fall off the olive tree and are collected in nets on the ground. In general, this collecting method is not common practice as it damages the fruit.

Volatile Compound Analysis
Sixty volatile compounds were identified and semi-quantified in the olive oil samples tested (Table 3). These volatiles included alcohols, aldehydes, ketones, esters and hydrocarbons. The highest total concentrations were recorded for the cultivars Ladolia Kerkyras (47,765.9 µg/kg), Topiki Makris (43,172.2 µg/kg) and Hontrolia (42,132.9 µg/kg). The lipoxygenase pathway makes a major contribution to virgin olive oil aroma as a wide variety of volatile compounds are produced through this biological pathway [19]. Aldehydes represented the most abundant chemical class, being products of the lipoxygenase pathway, which starts right after damage of olive fruit tissues due to the release of enzymes that oxidize and cleave polyunsaturated fatty acids. (E)-2-Hexenal, which is related to olive fruit maturity (characteristic of olive cultivar) and to the oxidation stage of olive oil [20,35], was the most abundant aldehyde identified in all samples tested, recording its highest concentration in the Topiki Makris cultivar (27,638.6 ± 4366.2 µg/kg). Hexanal followed, recording its highest concentrations in the Ladolia Kerkyras (3774.7 ± 2983.8 µg/kg) and Topiki Makris (3189.8 ± 1057.7 µg/kg) samples. It should be noted that the relatively high hexanal content observed in the olive oil samples tested does not necessarily indicate either oxidized olive oils or olive oils in the early stages of oxidation. According to Morales et al. [36] and Vichi et al. [37], hexanal levels cannot distinguish oxidized from "virgin" olive oils, as hexanal comes from both the lipoxygenase pathway and chemical oxidation of olive oil. (E)-2-Pentenal derives from the lipoxygenase pathway through the action of alkoxy radicals on linolenic acid 13-hydroperoxides, producing the corresponding alcohol ((Z)-2-pentenol), which is subsequently oxidized.
(E)-2-Pentenal was not detected in the Adramitiani, Samothrakis, Athinolia, Ladolia Kerkyras and Manaki cultivars, while it showed the highest concentration in the Topiki Makris cultivar (67.7 ± 55.5 µg/kg). According to Morales et al. [36] and Kiritsakis [38], pentanal, octanal, nonanal and hexanal are the main compounds that form in oxidized olive oils; in the samples tested, the first three were found at relatively low levels. Of these, pentanal and nonanal were identified in all olive oil samples, while octanal was not detected in the Manaki cultivar.
Aldehydes are reduced to alcohols through the action of alcohol dehydrogenase. (E)-2-Hexenol was the most abundant alcohol, recording its highest concentration in the Koutsourelia samples (1397.7 ± 1385.0 µg/kg), while hexanol, the second most abundant alcohol, recorded its highest concentration in the Kolovi samples (1704.6 ± 1079.5 µg/kg). These two alcohols can be used for cultivar differentiation; (E)-2-hexenol is responsible for the characteristic "green" aroma notes of olive oil, while the odor of hexanol is perceived as fruity, banana-like and grassy [1,2]. 1-Penten-3-ol, deriving from the lipoxygenase pathway through the action of alkoxy radicals on 13-hydroperoxides of linolenic acid [18], was present in all olive oil samples, and its concentration ranged from 6.5 ± 23.5 µg/kg in the Samothrakis samples to 165.5 ± 125.4 µg/kg in Koutsourelia.
Another important chemical class that has a major contribution to olive oil aroma is that of esters. Esters derive from the lipoxygenase pathway, through the action of alcohol acyltransferase that catalyzes the formation of acetate esters through acetyl-CoA derivatives. Despite the fact that esters comprise minor components of olive oil aroma their contribution is quite significant as they complement aroma with sweet and pleasant notes [18,39]. Kolovi (3307.1 µg/kg), Manaki (2870.6 µg/kg) and Koutsourelia (2710.4 µg/kg) samples recorded the highest total concentrations compared to the other cultivars while esters were not identified in the volatile fraction of the Galano and Hontrolia olive oils. The absence of esters in the volatile fraction of certain cultivars may be due to the action of the enzyme alcohol acyltransferase. The activity of this enzyme is significantly influenced by pH (6.8-8) and temperature (35 °C), as well as the availability of the appropriate substrate. The activity of alcohol acyltransferase can be enhanced by cultivar selection as well as by modifying olive oil extraction conditions, i.e., operation at lower temperatures to prevent inactivation of the enzymes and promotion of esterification reactions [39].

Multivariate Analysis of Variance
As a first step, the 167 olive oil samples were subjected to MANOVA in order to determine those volatile compounds which are significant for cultivar differentiation. Dependent variables included all 60 volatile compounds identified and semi-quantified, while cultivar was taken as the independent variable. The values of Pillai's Trace (7.049; F = 6.059, p = 0.001 < 0.05) and Wilks' Lambda (0.001; F = 8.875, p = 0.001 < 0.05) showed a significant multivariate effect of cultivar on the volatile compounds.
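For reference, the two MANOVA statistics reported above have the following standard definitions. The notation is introduced here and is not taken from the source: W and B denote the within-group and between-group sums-of-squares-and-cross-products matrices, λ_i the eigenvalues of W⁻¹B, p the number of dependent variables and g the number of groups.

```latex
\Lambda = \frac{\det(\mathbf{W})}{\det(\mathbf{W} + \mathbf{B})}
        = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i},
\qquad
V = \operatorname{tr}\!\left[\mathbf{B}\,(\mathbf{W} + \mathbf{B})^{-1}\right]
  = \sum_{i=1}^{s} \frac{\lambda_i}{1 + \lambda_i},
\qquad
s = \min(p,\; g - 1)
```

With g = 10 cultivars and p = 60 volatile compounds, s = 9, so Pillai's trace V cannot exceed 9. Values of Λ close to 0 and of V close to s both point to strong separation between groups, which is consistent with the reported Λ = 0.001 and V = 7.049.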

Linear Discriminant Analysis
Fifty-five volatiles were found to be significant (p < 0.05) for cultivar differentiation and were thus subjected to LDA as the second step of the analysis. Results showed that three statistically significant discriminant functions were formed (Table 4). A significant value of the Wilks' Lambda index shows that the discriminant function contributes substantially to the differentiation of the investigated groups. Testing of the uniformity of variability (Box M index = 736.955, F = 3.134, p = 0.050) was insignificant at the 95% confidence level, indicating uniformity of sample variability for each cultivar. Figure 1a shows that olive oil samples from Galano are very well differentiated from the other cultivars. In Figure 1b it is clear that the Samothraki, Topiki Makris and Adramitiani cultivars are well differentiated, while the other cultivars overlap. The overall correct classification rate was 99.4% for the original and 83.2% for the cross-validation method. Correct cultivar classification (100%) was achieved for the Adramitiani, Samothraki and Galano cultivars.
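The overall and per-cultivar correct classification rates reported here are simple ratios read off a confusion matrix. A minimal sketch follows, with invented counts and hypothetical cultivar labels; none of these numbers come from the study.

```python
def classification_rates(confusion):
    """Overall and per-class correct classification rates.

    `confusion[actual][predicted]` holds the number of cases.
    """
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c].get(c, 0) for c in confusion)
    per_class = {
        c: confusion[c].get(c, 0) / sum(confusion[c].values()) for c in confusion
    }
    return correct / total, per_class


# Invented counts for three hypothetical cultivars
confusion = {
    "A": {"A": 18, "B": 0, "C": 0},
    "B": {"A": 1, "B": 14, "C": 3},
    "C": {"A": 0, "B": 4, "C": 10},
}

overall, per_class = classification_rates(confusion)
print(f"overall: {overall:.1%}")
for cultivar, rate in per_class.items():
    print(f"{cultivar}: {rate:.1%}")
```

A cultivar reaches 100% when all of its cases fall on the diagonal, as for class A here, even if the overall rate is lower because other classes overlap.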
Regarding five of the above olive cultivars (i.e., Galano, Samothrakis, Adramitiani, Athinolia and Ladolia Kerkyras) previously published work [1] showed that the application of LDA to olive oil volatiles led to a very satisfactory classification rate (97% original, 83% cross-validation), while Galano and Samothraki cultivars were fully differentiated from the rest of the cultivars investigated. Furthermore, regarding the other five cultivars (i.e., Hontrolia, Koutsourelia, Kolovi, Topiki Makris and Manaki) the classification rate achieved in previous work was 100% for the original and 82.4% for the cross-validation method also leading to very satisfactory differentiation of the tested cultivars [2].
In the present study, despite the quite large number of cultivars and dependent variables, the statistical model used showed a promising potential for olive oil cultivar differentiation. Combining the volatile compound analysis data of the ten cultivars, the classification rate slightly increased to 83.2% while four of the ten cultivars (Galano, Samothrakis, Topiki Makris and Adramitiani) were well differentiated.

Stepwise Linear Discriminant Analysis
As the final step of the statistical treatment of the data, SLDA was used in order to select the variables with the higher discriminant ability. Of the 55 significant volatile compounds only 17 were found to have a higher discriminant ability (Table 5). Three statistically significant discriminant functions were formed (Table 4). As shown in Figure 2a, the Galano samples are very well differentiated. Figure 2b shows that olive oil samples from the Samothraki, Topiki Makris and Adramitiani cultivars are also adequately differentiated, while all other cultivars overlap. The overall correct classification rate was 94% for the original and 85.6% for the cross-validation method, the latter somewhat increased compared with LDA. Correct cultivar classification (100%) was achieved only for the Galano, Adramitiani and Topiki Makris cultivars.
Of the 17 volatile compound-markers, 5 derive from the lipoxygenase pathway [1-penten-3-ol, (E)-2-hexen-1-ol, (E)-2-pentenal, (E)-2-hexenal and hexyl acetate]. This indicates a strong dependence of this biological pathway on cultivar; these compounds make a major contribution to virgin olive oil aroma, with a wide variety of volatile compounds being produced through this pathway [13]. Pizarro et al. [32], in an attempt to recognize volatile markers for the geographical discrimination of Spanish olive oil samples, identified six volatile compounds as markers, all deriving from the lipoxygenase pathway. According to Angerosa et al. [18], the effect of cultivar can be demonstrated by the various amounts of C6 compounds resulting from the LOX pathway for oils obtained under the same operating conditions and collected at the same ripening stage. Furthermore, the minor dependence of the number of volatiles on climatic conditions and the geographical area of cultivation emphasizes that cultivar is the dominant factor affecting the aroma formation of olive oil.
This feature, in combination with the different concentration of (E)-2-hexenal, represents an effective tool for differentiating single-variety oils from different varieties. Focusing on specific volatile compound-markers, Kesen et al. [42] reported that the most abundant aldehyde was (E)-2-hexenal, followed by hexanal, in Turkish olive oil samples belonging to the Halhali cultivar. Furthermore, Bubola et al. [44] reported that among the 50 volatile compounds identified in olive oils from the Bova cultivar, the C6 compound (E)-2-hexenal was the most abundant aldehyde, while Tanouti et al. [40] recorded that the main volatile compounds present in olive oil samples produced in eastern Morocco were C6 compounds such as hexanal, (E)-hex-2-enal, (Z)-3-hexen-1-ol and 1-hexanol, as in the present study. Finally, Issaoui et al. [26], who studied the effect of the growing area on the aroma profiles of the Chemlali and Chetoui cultivars, recorded that (E)-2-hexenal and 1-hexanol can be used as potential indicators, in this case for geographical differentiation. Ethanol, according to Kalua et al. [39], results from the fermentation process that takes place in the olive fruit before oil extraction, contributing wine aroma notes to olive oil. It can thus be considered a sugar fermentation marker. Nonanal is one of the main compounds that form in oxidized olive oils and is associated with oil sensory defects [36,38,41]. Thus, nonanal can be considered a possible marker of early oxidation processes in olive oil. Finally, terpene concentration varies significantly depending on cultivar and geographical origin, and terpenes have been suggested as indicators for virgin olive oil differentiation [42,44]. In the present study three terpenes (dl-limonene, (E)-β-ocimene and α-copaene) were identified as volatile markers. According to Zunin et al. [45], α-copaene, along with α-muurolene and α-farnesene, were the terpenes that aided the discrimination of extra virgin olive oils from West Liguria from those of other Mediterranean regions.

Conclusions
In the present study, volatile compound analysis of olive oil samples belonging to ten different cultivars from Greece was carried out in an effort (i) to differentiate cultivars based on volatile compounds and (ii) to investigate the selection of potential markers leading to successful cultivar differentiation. The results of the statistical treatment (MANOVA/LDA) showed that the differentiation of olive oil samples according to cultivar is possible despite the quite large number of cultivars investigated (83.2% cross-validation). Furthermore, the application of SLDA to the selected set of variables (55 volatile compounds) led to a considerably reduced set (17 volatile compounds). These compounds provide a higher discriminant ability compared to the other volatiles, increasing the classification rate to 85.6% with the cross-validation method. Of the seventeen marker compounds, five derive from the lipoxygenase pathway (hexyl acetate, (E)-2-hexenal, (E)-2-hexenol, 1-penten-3-ol, (E)-2-pentenal) while three are terpenes (α-copaene, (E)-β-ocimene, dl-limonene), indicating a strong dependence of the formation of olive oil's volatile fraction on cultivar. The resulting distribution diagram showed that the cultivars Galano, Samothrakis, Topiki Makris and Adramitiani were clearly differentiated. The results are quite encouraging, demonstrating the validity of the statistical model developed for the authentication of olive oil in relation to the differentiation of olive cultivars.