NMR-Based Metabolic Profiling of Edible Olives—Determination of Quality Parameters

Edible olive drupes (from Olea europaea L.) are a high-value food commodity whose production has been increasing over the past two decades. To prevent fraud and ensure quality, the International Olive Council (IOC) issued guidelines for their sensory evaluation; however, these guidelines do not address certain varieties, geographical origins and processing parameters. The aim of the present study was to develop a method for the quality assessment of edible olives of the Konservolia, Kalamon and Chalkidikis cultivars from different areas of Greece, processed with the Spanish or the Greek method. A rapid NMR-based untargeted metabolic profiling method, combined with multivariate analysis (MVA) and complemented by statistical total correlation spectroscopy (STOCSY), was developed and applied for the first time to the analysis of edible olives. Specific biomarkers related to the classification of olives by treatment, cultivar and geographical origin were identified. STOCSY proved to be a valuable aid in the assignment of biomarkers, a bottleneck in untargeted metabolomic approaches.


Introduction
The olive fruit is an oval drupe produced by the olive tree (Olea europaea L.). Depending on the cultivar, olives are either used for the production of the well-known, high-value olive oil or processed into an edible product. Worldwide consumption of edible olives has doubled since 2000, reaching a provisional 2.7 million tons in 2019, as more people gradually adopt the Mediterranean diet [1]. Greece, despite its small area, was estimated to account for 7% of world production in 2020 [2].
Processing is essential in the production of edible olives to remove their bitter taste, which makes them unpalatable and is due to certain phenolic compounds such as the secoiridoid glucoside oleuropein [3]. To date, three main industrially scalable approaches are used for debittering olives: the Greek, Spanish and Californian processing methods. In short, in the Greek method, also known as natural or organic, harvested olives are brined in a sodium chloride solution and fermented by indigenous microbiota. In this case, debittering occurs slowly and bitter Greek (black) [...]
[Sample table fragment, column headers and earlier rows lost in extraction: 4, Fthiotida, Konservolia, 9; 5, Fthiotida, Kalamon, 1; 6, Aitoloakarnania, Kalamon, 14; 7, Peloponnese (Messinia), Kalamon, 10; 8, Lakonia, Kalamon, 5.]
All samples were stored, preserved and extracted under the same conditions prior to analysis in order to minimize induced variability. Furthermore, complete information accompanying each sample (metadata) was recorded (Table S1).

Acquisition, Data Processing and Multivariate Analysis
In recent years, NMR has increasingly been introduced into food quality control, offering important advantages such as speed, reproducibility and simplicity [54,55]. Accordingly, NMR metabolic profiling was selected in the current study to analyze the 60 collected samples, following an in-house developed protocol. Considerable attention was given to parameters such as the drying process, extraction solvents and yield, the quantity required for analysis, the internal standard (IS) and the deuterated solvent (data not shown).
Samples were dissolved in methanol-d4 with hexamethyldisiloxane (HMDSO) as the IS and as a line-shape indicator for the recorded spectra. A total of 60 spectra were acquired using the simple zg pulse sequence.
Representative spectra of two varieties (differing in geographical origin and processing method) are given in Figure 1, with the signals of identified compounds annotated, providing a rough overview of the samples' composition. Full peak assignment is given in Table 2. Qualitative differences between the two samples are evident even upon visual inspection. Quantitative variations in peak intensity, reflecting fluctuations in metabolite levels, are also observed. For instance, the downfield phenolic signals vary considerably between the two spectra, as do the upfield signals indicative of aliphatic chains. Differences are also evident between 3 and 5 ppm, where protons of double/triple bonds and of heterocyclic and oxygenated systems, e.g., carbohydrates and polyols, commonly resonate.
Owing to pH variations, minor shifts were observed in certain peaks across the spectra. Therefore, the icoshift tool in MATLAB was applied with specifically chosen intervals to align all spectra. IS normalization was applied to eliminate variability induced by sample preparation and spectral acquisition. The resulting data matrix was imported into SIMCA v. 14.1 for MVA. Geographical origin, tree cultivar and processing type were the classes characterizing the observations (samples).
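The alignment and normalization above were performed with icoshift in MATLAB. Purely as an illustration of the idea, the Python sketch below uses NumPy and is not icoshift: `align_interval` is a hypothetical, simplified stand-in that tries integer shifts within ±`max_shift` data points for one user-chosen interval and keeps the shift that maximizes the dot product with a reference spectrum, while `normalize_to_is` divides each spectrum by the summed intensity inside an assumed IS (HMDSO) window whose ppm limits are placeholders to be checked against the real spectra.

```python
import numpy as np


def align_interval(spectrum, reference, start, end, max_shift):
    """Shift one interval [start, end) of `spectrum` to best match `reference`.

    Simplified sketch: the best integer shift (in data points) is the one that
    maximizes the dot product between the two segments (a crude cross-correlation).
    """
    ref_seg = reference[start:end]
    best_shift, best_score = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        lo, hi = start + shift, end + shift
        if lo < 0 or hi > len(spectrum):
            continue  # candidate window would fall outside the spectrum
        score = float(np.dot(spectrum[lo:hi], ref_seg))
        if score > best_score:
            best_shift, best_score = shift, score
    aligned = spectrum.copy()
    aligned[start:end] = spectrum[start + best_shift:end + best_shift]
    return aligned, best_shift


def normalize_to_is(X, ppm, is_window):
    """Divide every spectrum (row of X) by the summed intensity of the IS region."""
    lo, hi = is_window  # hypothetical ppm limits of the IS signal
    mask = (ppm >= lo) & (ppm <= hi)
    is_area = X[:, mask].sum(axis=1, keepdims=True)
    return X / is_area
```

In practice the reference is often the median or mean spectrum of the dataset, and intervals are placed around signals that drift with pH; aligning first and normalizing afterwards keeps the IS integral comparable across samples.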
To obtain a first overview of the data, detect any inherent patterns and identify possible outliers, principal component analysis (PCA) was performed. In the current study, Pareto scaling along with logarithmic transformation was used to construct the statistical models, as the weight of the major metabolites had to be reduced. Next, partial least squares discriminant analysis (PLS-DA) was performed on the same set of variables. Finally, orthogonal projections to latent structures discriminant analysis (OPLS-DA) models were built by taking groups in pairs, in order to identify the markers responsible for their differentiation. No outliers were detected in any of the models.

Geographical Origin, Olive Cultivar, Processing Procedure
Three regions with the highest production in Greece, Peloponnese (south), Sterea Ellada (center) and Makedonia (north), were used for sample collection to study the impact of geographical origin on the samples' composition. In the PCA model (Figure 2), the first four components accounted for more than 80% of the cumulative variation in the data, with the most significant separation observed along PC1 and PC2. Samples from Makedonia, in northern Greece, were clearly classified, whereas Peloponnese and Sterea Ellada were not separated, despite an evident trend. This may be explained by the more similar climatic and soil conditions of Peloponnese and Sterea Ellada compared with Makedonia. Three components were used to construct the PLS-DA model. As expected, the separation was clearer than in PCA, with samples from Makedonia forming a distinct cluster. However, certain samples from Sterea Ellada and Peloponnese were still intertwined, again reflecting the similarity between these regions.
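As a minimal sketch of the preprocessing and PCA steps described above, assuming a samples × data-points matrix `X` of aligned, IS-normalized intensities: the study used SIMCA, and the NumPy code below only illustrates log10 transformation, mean-centering, Pareto scaling (division by the square root of each variable's standard deviation) and PCA via singular value decomposition. The `log_offset` parameter is a hypothetical constant, since the exact transformation settings are not reported here.

```python
import numpy as np


def log_pareto(X, log_offset):
    """Log10-transform, mean-center and Pareto-scale a samples x variables matrix.

    `log_offset` is a hypothetical constant added before the logarithm so that
    all intensities are positive (NMR baselines can dip below zero).
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        Xl = np.log10(X + log_offset)
    if not np.all(np.isfinite(Xl)):
        raise ValueError("log_offset too small: X + log_offset must be > 0")
    sd = Xl.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                              # leave constant variables unscaled
    return (Xl - Xl.mean(axis=0)) / np.sqrt(sd)    # Pareto: divide by sqrt(SD)


def pca(Xs, n_comp):
    """PCA of a centered matrix via SVD; returns scores, loadings, explained variance."""
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = S**2 / np.sum(S**2)                # fraction of total variance per PC
    scores = U[:, :n_comp] * S[:n_comp]
    loadings = Vt[:n_comp].T
    return scores, loadings, explained[:n_comp]
```

The cumulative sum of the returned explained-variance fractions corresponds to the cumulative variation quoted for the first components. Pareto scaling sits between no scaling and unit-variance scaling: large signals are down-weighted without inflating noise-level variables to the same importance.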
The model was validated with permutation tests for each of the three regions (500 permutations, Figure S2), showing that the separation reflected genuine differences among samples and that the models were not overfitted.
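To make the logic of a permutation test concrete, here is a minimal NumPy sketch, assuming a two-class problem with classes coded 0/1 and a single-response PLS model fitted by NIPALS. The class labels are shuffled repeatedly, the model is refitted, and the leave-one-out Q² of each permuted model is compared with that of the real model; a model that is not overfitted should clearly outperform nearly all permutations. SIMCA's own permutation plot (R² and Q² plotted against the correlation between original and permuted Y) is not reproduced, and all function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model (NIPALS) and return a predictor function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)            # X weights (unit length)
        t = Xk @ w                        # scores
        tt = t @ t
        p = Xk.T @ t / tt                 # X loadings
        q = (yk @ t) / tt                 # y loading
        Xk = Xk - np.outer(t, p)          # deflate X and y
        yk = yk - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)   # regression coefficients
    return lambda X_new: (X_new - x_mean) @ B + y_mean


def r2_q2(X, y, n_comp):
    """R2 on the training data and leave-one-out cross-validated Q2."""
    ss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - np.sum((y - pls1_fit(X, y, n_comp)(X)) ** 2) / ss
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        pred = pls1_fit(X[keep], y[keep], n_comp)(X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    return r2, 1 - press / ss


def permutation_test(X, y, n_comp=2, n_perm=100, seed=0):
    """Compare the true Q2 with Q2 values obtained after shuffling the class labels."""
    rng = np.random.default_rng(seed)
    r2, q2 = r2_q2(X, y, n_comp)
    perm_q2 = np.array([r2_q2(X, rng.permutation(y), n_comp)[1] for _ in range(n_perm)])
    # (count + 1) / (n_perm + 1) never reports an empirical p-value of exactly 0
    p_value = (np.sum(perm_q2 >= q2) + 1) / (n_perm + 1)
    return r2, q2, perm_q2, p_value
```

A usage run with synthetic data would pass a matrix in which a few variables shift with class; the study itself used 500 permutations, which only changes `n_perm`.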
Finally, three OPLS-DA models were built by taking regions in pairs, with the end goal of identifying compounds distinctive of each region. R² and Q² values verified the goodness of fit and predictivity of the models, with the weaker performance of the Sterea Ellada vs. Peloponnese model revealing once more the similarity between these samples. To facilitate model interpretation, the respective S-plot, which visualizes the predictive component loadings, was constructed for each OPLS-DA model. However, it was the variable importance in projection (VIP) values of the features in each OPLS-DA loadings plot that ultimately guided the selection of the most relevant features. Features with VIP > 1 and a larger effect on class discrimination were selected (Table S2) and further evaluated [44].
Moving on to cultivar, three of the four most common olive cultivars of Greece, Kalamon, Konservolia and Chalkidikis, were included in the present study [53]. As for geographical origin, PCA was initially performed to investigate possible patterns among cultivars. The first two components of the model accounted for more than 70% of the cumulative variation in the data (Figure 2). As expected, Chalkidikis samples formed a distinct cluster, whereas the clustering of Kalamon and Konservolia was accompanied by strong overlap of some samples. Interestingly, two subgroups were observed among the Konservolia samples, which on close inspection proved to reflect different geographical origins, i.e., Magnesia and Fthiotida. This was also evident in the OPLS-DA plots constructed to compare cultivars in pairs (Figure S4). Clear grouping and separation between Magnesia and Fthiotida were once again verified, while no subgroups related to geographical origin were observed in either the Chalkidikis or the Kalamon cultivar.
Our findings suggest that geographical origin may significantly influence the phenolic composition of olives, even more than cultivar, at least in the case of Konservolia.
As for geographical origin, further validation was performed with permutation tests for each cultivar, confirming the statistical significance of the models (Figure S3). The same workflow was adopted to discover the features most relevant to the differentiation among cultivars. OPLS-DA models, each built with two classes, showed Q² and R² values very close to 1 (Figure S4), verifying their validity. The VIP tables extracted from each model were further analyzed to identify marker compounds (Table S2).
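Feature selection in this work relied on VIP > 1. As an illustration only, the sketch below computes the classical VIP score from a single-response PLS model fitted with NIPALS in NumPy; SIMCA's VIP for OPLS-DA models may also account for orthogonal components, so values are not expected to match the software output exactly. Because every weight vector has unit length, the squared VIP scores always sum to the number of variables, which is why VIP > 1 works as an "above-average importance" cut-off. The function name is hypothetical.

```python
import numpy as np


def pls1_vip(X, y, n_comp):
    """VIP scores from a single-response PLS model fitted with NIPALS."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_comp))
    ss_y = np.zeros(n_comp)                 # y sum of squares explained per component
    for a in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)              # unit-length X weights
        t = Xk @ w
        tt = t @ t
        q = (yk @ t) / tt
        Xk = Xk - np.outer(t, Xk.T @ t / tt)  # deflate X with its loadings
        yk = yk - q * t
        W[:, a] = w
        ss_y[a] = q**2 * tt
    # VIP_j = sqrt(J * sum_a ss_y[a] * w_ja**2 / sum_a ss_y[a])
    return np.sqrt(n_vars * (W**2 @ ss_y) / ss_y.sum())
```

Selecting candidate markers then amounts to `np.flatnonzero(vip > 1.0)`, followed by the chemical-shift lookup and STOCSY step described in the next subsection.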
Concerning the processing procedure, edible olives processed with either the Greek or the Spanish method are represented in this study. As mentioned before, the Greek method is a more natural way of processing, mainly of purple or black olives, comprising brining and fermentation, whereas the Spanish method includes an additional initial lye-treatment step and is used mostly for green olives. During lye treatment, olives are placed in large tanks containing sodium hydroxide solution (1.3-2.6% w/v), where they remain for 4-15 h until the solution penetrates the skin and approaches the pit, altering the chemical profile more drastically [16,18,53].
The PCA model revealed a remarkable discrimination based on processing method (Figure 2). This observation was further reinforced by the respective PLS-DA model (Figure S5). Q² and R² values, along with permutation tests, verified the reliability of the model (Figure S5). The respective OPLS-DA model was again constructed, with the final step being the S-plot and the extraction of a VIP list for mining the biomarkers responsible for the observed clustering (Table S2). The model, built with one predictive and three orthogonal components, was validated, with Q² and R² both above 0.75.

Statistical Total Correlation Spectroscopy (STOCSY) and Biomarker Identification
As already mentioned, a critical step in untargeted metabolic profiling approaches is the mining of relevant features and their translation into compounds that could serve as possible markers. However, the identification process, which commonly involves the dereplication of known compounds, is a challenging task [42]. Thus, in the current study, a step-by-step methodology was followed: significant loadings were first revealed by MVA, and STOCSY was then applied to identify the relevant markers. Chemical shifts from the generated VIP lists (VIP > 1) of the OPLS-DA models were used as "driving peaks" in the STOCSY experiments. STOCSY pseudo-spectra are derived from the correlation between each selected, statistically significant peak and all other NMR data points across the spectra. Regions corresponding to solvent signals (methanol-d4, methanol, water) were removed as part of the experimental protocol.
In more detail, the resonance at 2.600 ppm (VIP No. 16 in Chalkidikis vs. Konservolia and No. 41 in Kalamon vs. Konservolia, Table S2) was used as a driving peak, revealing a set of correlated signals indicative of hydroxytyrosol (HT) (Figure 3a). Using 2.651 ppm as the driving peak, another compound, tyrosol (Tyr), was identified (Figure 3b). HT and Tyr are characteristic phenylethanoids present in both fresh and processed olives, as well as in olive oil and olive by-products [13,67,68].
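The STOCSY computation described above reduces to a correlation and a covariance between one driving data point and every other data point across all spectra. A minimal NumPy sketch, assuming `X` holds aligned, normalized spectra (rows = samples) on a shared `ppm` axis with solvent regions already removed; the function names are illustrative and not taken from a specific STOCSY package.

```python
import numpy as np


def stocsy(X, driver_idx):
    """1D STOCSY: correlation and covariance of every data point with a driver point.

    X is a samples x data-points matrix of aligned, normalized spectra.
    Returns (r, cov); plotting cov colored by r gives the usual pseudo-spectrum.
    """
    Xc = X - X.mean(axis=0)
    n = X.shape[0]
    cov = Xc.T @ Xc[:, driver_idx] / (n - 1)
    sd = Xc.std(axis=0, ddof=1)
    denom = sd * sd[driver_idx]
    # points with zero variance (e.g. blanked solvent regions) get r = 0
    r = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return r, cov


def nearest_index(ppm, shift):
    """Index of the data point closest to a chemical shift (e.g. 2.600 ppm)."""
    return int(np.argmin(np.abs(ppm - shift)))
```

For example, `r, cov = stocsy(X, nearest_index(ppm, 2.600))` would return values of r close to 1 at the other signals of the molecule producing the driving peak, while signals of compounds that vary independently stay near 0.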
Figure 3. (a) HT: "driving peak" was at 2.600 ppm; (b) Tyr: "driving peak" was at 2.651 ppm; complete assignment in Table 2.
In the covariance spectra, all peaks corresponding to the protons of both molecules are evident, i.e., the ABX and A2B2 spin systems of HT and Tyr, respectively, and the two methylene groups of the side chain of each compound. The two molecules are structurally very similar, HT being a hydroxylated derivative of Tyr, and this is reflected in the pseudo-spectra as well. Importantly, both multiplicity and peak integration are apparent and in accordance with the reference 1H NMR spectra (Table 2). Both are major phenolics in table olives, with estimated contents of 110-980 mg/kg (HT) and 40-135 mg/kg (Tyr) of olive flesh according to Boskou et al. [16]. They are clearly projected, with no interference from peaks of other compounds, noise or artefacts, and with a correlation coefficient > 0.75. Moreover, no negative peaks (correlation coefficient < 0) indicating anti-correlation with Tyr or HT were observed.
The same workflow was followed for verbascoside (acteoside), a caffeoyl phenylethanoid glycoside. It is a structurally complex molecule composed of a glucose-rhamnose disaccharide, to which caffeic acid and HT are attached through an ester and an ether bond, respectively. Verbascoside (Ver) is considered a minor compound in edible olives [69] and is also found in olive paste [70]. Nevertheless, STOCSY revealed almost all peaks corresponding to the resonances of this compound (Figure 4). As expected, the pseudo-spectrum was more crowded, especially in the sugar region, without, however, affecting the dereplication process or the identification confidence.
Figure 4. Ver: correlation coefficients to the other signals in the median edible olive NMR spectrum are color-encoded; "driving peak" was at 6.994 ppm. A zoom-in of the aromatic region is also presented.
Similar to Ver, two flavonoids, quercetin (Quer) and luteolin (Lut), were also identified. Both are minor constituents of table olives, with Quer present at lower concentrations than Lut, after hydrolysis of the respective glucosides during processing [71]. In this case too, indicative correlation spectra were derived for both compounds. Interestingly, by comparing well-resolved peaks of the two compounds (L/Q H-6, H-8, H-5′), it was even possible to estimate their relative abundance (approximately 3:1) (Figure 5a). This is another strong point of STOCSY: the quantitative nature of NMR can be exploited, and different compounds whose concentrations fluctuate proportionally can be identified.
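Why proportional co-fluctuation allows a relative-abundance estimate can be seen from the covariance trace: if the concentration of compound A is k times that of compound B in every sample, the STOCSY covariance at a resolved signal of A is, per proton, k times the covariance at the corresponding signal of B. The sketch below assumes exactly that proportionality and comparable relaxation and integration behavior of the compared signals; the function name and example data are hypothetical and do not reproduce the authors' calculation.

```python
import numpy as np


def covariance_ratio(X, driver_idx, idx_a, idx_b, n_h_a=1, n_h_b=1):
    """Per-proton ratio of STOCSY covariances at two resolved signals (A relative to B).

    Assumes c_A = k * c_B in every sample, so that
    cov(x_A, x_driver) / cov(x_B, x_driver) = k * n_h_a / n_h_b.
    """
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc[:, driver_idx] / (X.shape[0] - 1)
    return (cov[idx_a] / n_h_a) / (cov[idx_b] / n_h_b)
```

With a Lut signal as the driver and a Quer signal varying as one third of it, the function returns 3, matching the kind of approximately 3:1 reading described above.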
Triterpenic acids are structurally complex constituents of table olives, the most characteristic being oleanolic acid (OA) and maslinic acid (MA). A wide range of biological activities has been linked to them, such as antitumor, antidiabetic, antioxidant, cardioprotective, antiparasitic and even neuroprotective effects [10,72]. Both are pentacyclic triterpenic acids, MA being a hydroxylated derivative of OA with no other structural differences. Notably, all methyl protons, which resonate between 0.7 and 1.2 ppm, were clearly revealed in the generated pseudo-spectrum, in this case too reflecting the average relative concentration of these compounds in the analyzed samples (Figure 5b, Table 2). Because their concentrations fluctuated proportionally across samples, no distinct pseudo-spectra could be constructed for these compounds; instead, a single pseudo-spectrum was obtained, in which the correlation coefficient exceeded 0.80 for both with the same driving peak.
Short-chain fatty acids (SCFA) are typical metabolites of processed table olives. Specifically, in the samples at hand, the 1H NMR spectra of formic (FA), acetic (AA), propionic (PA) and lactic (LA) acids, together with succinic acid (SA), were clearly projected through STOCSY. Their correlation spectra are highly indicative, enabling identification with high confidence (Figure 6, Table 2). As depicted, SA and LA, as well as PA and AA, were projected in pairs. This suggests a biosynthetic association between these metabolites, which could be further exploited when STOCSY is employed beyond its identification potential.
Finally, a pseudo-spectrum indicative of triacylglycerols (TAGs) was generated using the peak at 5.27 ppm as the driving peak. TAGs, but also diacylglycerols (DAGs) and free fatty acids (FFAs), are highly abundant in drupes [73,74]; as a result, the generated pseudo-spectrum was highly informative and free of artefacts. Overall, this VIP-based untargeted approach, integrating STOCSY as a dereplication tool, yielded nine pseudo-NMR spectra and thirteen statistically significant compounds. In parallel, a targeted approach based on an extensive literature search for compounds previously reported in edible olives led to the identification of linoleic acid and glycerol (Table 2).
To our knowledge, few studies have investigated processed olives and their phenolic composition. Indeed, most involve fresh olives [34,75-77], while others used olives intended only for oil production [70,78]. To date, the composition of processed edible olives has been investigated mainly by HPLC-DAD [15,37,73-77] and MS [32,56-59,78], with NMR a distant third [60]. Minor deviations from literature chemical shifts were observed in some cases, owing to pH or calibration mismatches; therefore, 2D spectra (JRES, HSQC and HMBC) were recorded for selected samples to ensure confidence in structure elucidation (data not shown).

Quality and Authentication Assessment
Fraudulent practices involving edible olives are numerous, the common denominators being adulteration and mislabeling. Organoleptic assessment, as suggested by the IOC, provides a quality classification based on sensory attributes but cannot identify differences in other quality parameters, such as geographical origin, cultivar or processing method. Thus, identifying specific markers or patterns indicative of these traits would be of utmost importance for the quality assessment and authentication of edible olives. In the current study, special attention was given to identifying measurable chemical markers indicative of the analyzed cultivars, geographical origins and processing methods, supporting Greek PDO and PGI products.
Figure 6. STOCSY 1D pseudo-NMR spectra. Correlation coefficients to the other signals in the median edible olive NMR spectrum are color-encoded. (a) LA and SA: "driving peak" was at 4.008 ppm; (b) PA and AA: "driving peak" was at 1.848 ppm; (c) FA: "driving peak" was at 8.470 ppm; (d) TAGs: "driving peak" was at 5.279 ppm.
Following the initial experimental design, we then focused on the identified, statistically significant compounds, monitoring their relative concentrations in the different classes. HT and Tyr, the two main phenylethanoids found in processed edible olives, have been extensively investigated for their biological and pharmacological properties [79,80]. Notably, in a recent EFSA health claim, HT and its derivatives were associated with the protection of blood lipids from oxidation, resulting in cardioprotection for the consumer. The claim concerns olive oils containing certain concentrations of these compounds [81]. Consequently, the claim may be used on the labels of bottled olive oils but does not extend to table olives, another rich source of these compounds.
As shown in the box plots (Figure 7), the Kalamon and Chalkidikis cultivars are richer in both HT and Tyr than Konservolia, the other black olive variety of the present study, in agreement with the literature [15,16,18]. This is consistent with their relative concentrations when geographical origin is taken into account: Sterea Ellada shows the lowest abundance of both compounds compared with Makedonia and Peloponnese (Figure S7). Recent studies suggest that the Greek method may lead to higher HT concentrations, whereas no effect has been reported on Tyr levels. In this study, the respective t-test points to a possible effect on HT (p = 0.06), whereas Tyr showed no statistically significant difference (p = 0.82) (Table S3).
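The text does not state which t-test variant was used. As an assumption, the sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom in plain Python; the two-sided p-value then follows from the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`, which returns both statistic and p-value. A p value of 0.06, as reported for HT, lies just above the conventional 0.05 threshold, which is why it indicates only a possible effect.

```python
import math


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # sample variances (ddof = 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se_a, se_b = va / na, vb / nb                   # squared standard errors
    t = (ma - mb) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a**2 / (na - 1) + se_b**2 / (nb - 1))
    return t, df
```

Welch's form is a reasonable default here because the Greek-method and Spanish-method groups need not share the same variance or size.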
A similar cultivar-related trend is observed for Ver [15]. Ver is hydrolyzed during processing, so its levels in table olives are generally low, although still detectable by NMR. In the current study, it was detected almost exclusively in samples of the Kalamon cultivar (Peloponnese and Sterea Ellada). Ver was found in Peloponnese samples and at lower levels in Sterea Ellada, while it was absent from samples from Makedonia (Figure 8 and Figure S8). This observation might imply that Ver is a characteristic marker of the Kalamon variety, with geographical origin not considerably affecting its concentration.
Regarding the processing method, the p value of the respective t-test implies statistical significance for Ver. However, no solid conclusion can be drawn, since Chalkidikis is the only variety that underwent Spanish processing, and the effects of cultivar and processing therefore cannot be separated. These observations are in accordance with Sahan et al., who noted a direct correlation between the processing method and Ver levels in table olives [20].
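The confounding noted above can be made explicit by checking whether a level of one design factor coincides exactly with a level of another. The sketch below is a minimal, hypothetical helper (`aliased_levels`, not part of the study's workflow). It is applied to made-up metadata that mirrors the design described in the text. When processing is aliased with cultivar in this way, a Greek vs. Spanish comparison is numerically identical to a Chalkidikis vs. other-cultivars comparison.

```python
def aliased_levels(factor, other):
    """Map each level of `factor` that coincides exactly with one level of `other`.

    A level is aliased when all of its samples share a single level of `other`
    and that level of `other` never occurs outside it, so the two contrasts
    cannot be told apart."""
    pairs = list(zip(factor, other))
    aliased = {}
    for level in sorted(set(factor)):
        inside = {o for f, o in pairs if f == level}
        outside = {o for f, o in pairs if f != level}
        if len(inside) == 1 and not inside & outside:
            aliased[level] = next(iter(inside))
    return aliased


# Hypothetical metadata mirroring the design (not the study's sample list)
processing = ["Greek", "Greek", "Greek", "Spanish", "Spanish"]
cultivar = ["Kalamon", "Konservolia", "Kalamon", "Chalkidikis", "Chalkidikis"]
print(aliased_levels(processing, cultivar))   # {'Spanish': 'Chalkidikis'}
```

Adding Spanish-processed samples of a second cultivar would break the aliasing, and the helper would then return an empty mapping for the Spanish level.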
Molecules 2020, 25, x FOR PEER REVIEW 13 of 22
An analogous observation was made for Lut and Quer, the two flavonols abundant in olive parts and products. Both were found in considerably higher amounts in Peloponnese and in the Kalamon cultivar than in Makedonia and the Chalkidikis cultivar. In fresh olives they typically occur in various glycosidic forms, which are hydrolyzed during processing into edible olives. Similarly to Ver, both compounds appear to be markers of the Kalamon cultivar and, secondarily, of Konservolia, while they do not seem to be influenced by geographical origin. Regarding processing, the lye treatment step of the Spanish method seems to greatly increase the diffusion and/or hydrolysis of Lut and Quer glycosides. As a result, the aglycones were found solely in Greek-style black olives (Kalamon and Konservolia), and not even traces were detected in samples from Makedonia and the Chalkidikis cultivar (Figure 7, Figure 8 and Figure S8) [16,18].
Triterpenic acids, predominantly MA and OA, are also characteristic compounds of edible olives and are likewise found in olive oil, olive leaves and olive mill wastewaters [55]. Both compounds were found at lower concentrations in the Konservolia cultivar than in Chalkidikis and Kalamon (Figure 8, Figure S7), as well as in Sterea Ellada compared with the other regions. This observation contradicts previous studies, in which Konservolia was reported to be superior to Kalamon in MA content [15]. Their concentrations seem to differ among cultivars but do not seem to be affected by processing according to the respective t-test (Table S3), which once again contrasts with the literature [37]. Moreover, MA consistently outweighs OA in terms of concentration, a finding reported previously [15,37,66] and confirmed by the present study.
As mentioned above, TAGs constitute 20% of the total weight of an olive drupe. Despite their lipophilic nature, they are also present in the samples to some extent. Studies have shown that the processing method might affect TAG levels. In particular, the sodium hydroxide solution applied in the Spanish method considerably increases the diffusion of all TAG derivatives into the solution and accelerates the oxidation of the unsaturated ones [19,20]. Our findings confirmed that total TAGs are statistically significant in discriminating Spanish- from Greek-style olives (t-test, Table S3). This agrees with the respective box plot among cultivars (Figure 8): Chalkidikis contains noticeably lower concentrations than Konservolia and Kalamon. The former underwent the Spanish method and the latter two the Greek one.
An interesting class of compounds detected in the analyzed samples was the SCFAs, specifically FA, AA, PA and LA, together with the dicarboxylic acid SA. It is well documented that these compounds are produced during fermentation, mainly by Lactobacillus species. Moreover, higher concentrations have been reported in Spanish- rather than in Greek-style olives [29,82,83]. Our data revealed a trend with respect to both geographical origin and cultivar. LA and SA presented the same pattern, occurring at descending levels in the order Makedonia > Sterea Ellada > Peloponnese and Chalkidikis > Konservolia > Kalamon. Similarly, PA and AA were found solely in Makedonia and the Chalkidikis cultivar, while FA was additionally found in Peloponnese and the Kalamon cultivar (Figure 7, Figure 8, Figures S7 and S8). Furthermore, all five metabolites were found to be significant markers in favor of the Spanish method (Table S3). This is easily observed in the respective heat map, which considers only the group averages (Figure 9a). Interestingly, FA, AA and PA seem to characterize olives produced with the Spanish method, as they are practically absent from Greek-style olives. LA and SA, on the other hand, are detected in samples from both methods, with an obvious predominance in chemically processed samples (Figure 7, Figure 8, Figures S7 and S8). Indeed, in the correlation coefficients plot (Figure 9b), the association among PA, AA and FA stands out compared with LA and SA. As stated previously, the STOCSY tool had already implied a biosynthetic association of PA with AA and of LA with SA, since these appeared in pairs in the generated pseudo-spectra.
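The group-average heat map (Figure 9a) and the correlation plot (Figure 9b) rest on two simple computations: per-class means of each metabolite, and a correlation matrix between metabolites across samples. The numpy sketch below illustrates both and is not the MetaboAnalyst implementation. Using the Pearson coefficient is an assumption, and a rank-based coefficient such as Spearman's could be substituted. The function names, data matrix and column order (PA, AA, LA) are hypothetical.

```python
import numpy as np


def group_means(X, labels):
    """Average each metabolite (column) within each class label."""
    labels = np.asarray(labels)
    groups = sorted(set(labels.tolist()))
    return groups, np.vstack([X[labels == g].mean(axis=0) for g in groups])


def metabolite_correlations(X):
    """Pearson correlation matrix between metabolites (columns) across samples."""
    return np.corrcoef(X, rowvar=False)


# Hypothetical relative intensities: rows = samples, columns = PA, AA, LA
X = np.array([
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 2.0],
    [2.0, 4.0, 3.0],
    [4.0, 8.0, 4.0],
])
labels = ["Greek", "Greek", "Spanish", "Spanish"]
groups, means = group_means(X, labels)   # rows of `means` feed a group-average heat map
corr = metabolite_correlations(X)        # feeds a correlation-coefficient plot
```

A metabolite that is constant across all samples would produce NaN correlations, so such columns would need to be dropped first.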
To gain better insight, intra-group correlations were investigated. As displayed in the heat map of Figure 9c, the SCFAs clearly discriminate between the two groups (Greek vs. Spanish). Although all monitored metabolites are markers of the Spanish processing method, their contribution varies. Samples characterized by high levels of LA and SA are low in AA and PA, and vice versa, while FA follows a less clear, intermediate pattern. Closer inspection of the metadata, however, suggests that geographical origin also matters when choosing a marker. Even though all samples processed with the Spanish method share the same cultivar and geographical origin, the subregion seems to play a crucial role. In particular, AA and PA appear to be better markers for the subregion of Kavala, while LA, SA and FA are better markers for the subregion of Chalkidiki. Nevertheless, the presence of PA and AA with a simultaneous absence or significant decrease of LA in certain Spanish-style samples has been reported as an indicator of decomposition in fermented products [4,81]. Further studies are therefore needed to elucidate the role of geographical origin and/or spoilage in the observed SCFA and SA trends in processed table olives.

Collection of Samples
Edible olives of different varieties, colors and processing styles were handpicked and provided by producers from various regions of Greece. A total of 60 samples were analyzed in the present study. All samples were stored in their brines in the dark at room temperature (25 °C).

Lyophilization
For each sample, 10 olives were picked at random from those provided by the producers and lyophilized until all water had been removed. The pits were then removed and the flesh was ground manually into a powder.

Extraction Protocol
Official extraction protocols for edible olive analysis have yet to be issued by the IOC. Consequently, the official protocol for olive oil analysis was adopted in this study and adapted to the different nature of the samples [84]. Specifically, 0.3 g of homogenized olive powder was accurately weighed into a 50 mL tube (falcon type, ISOLAB Laborgeräte GmbH, Germany). Then 30 mL of methanol/water 80/20 (v/v) (HPLC-grade solvents, Fisher Scientific, Loughborough, UK) was added as the extraction solvent. The mixture was shaken for exactly 1.5 min in a vortex agitator (Genius 3, IKA-Werke GmbH & Co, Staufen, Germany). The tube was then placed in an ultrasonic bath for 15 min at room temperature (RT) and centrifuged (Heraeus Multifuge 3S, Thermo Fisher Scientific, Waltham, MA, USA) at 4000 rpm for 25 min at RT. The supernatant was filtered with a 5 mL plastic syringe through a 0.45 µm PVDF filter and transferred to a new tube (falcon type). A 2.5 mL glass syringe (Gastight, 1002TLL, Hamilton Company, Reno, NV, USA) was used to transfer 15 mL of the extract to a spherical glass flask. The extract was evaporated to dryness with a Rotavapor at 40 °C under vacuum (Büchi AG, Flawil, Switzerland). The solid extract was transferred to a 2 mL Eppendorf tube (Eppendorf AG, Hamburg, Germany) and dried in a vacuum centrifugal evaporator (Concentrator Plus, Eppendorf AG, Germany). After extraction, samples were stored in a freezer at −20 °C until analysis.
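The amounts above imply a fixed solid-to-solvent ratio, and only half of the extract is evaporated. The small helper below (hypothetical name and defaults taken from the text) makes the arithmetic explicit. It assumes homogeneous extraction and that the filtered extract volume equals the 30 mL of solvent added, ignoring filtration losses.

```python
def aliquot_equivalent(powder_g=0.3, solvent_ml=30.0, aliquot_ml=15.0):
    """Return (mg powder per mL extract, g powder represented by the aliquot).

    Assumes complete, homogeneous extraction and that the filtered extract
    volume equals the solvent volume added (filtration losses ignored)."""
    mg_per_ml = powder_g * 1000.0 / solvent_ml
    return mg_per_ml, mg_per_ml * aliquot_ml / 1000.0


ratio, equivalent_g = aliquot_equivalent()
print(f"{ratio:.1f} mg/mL; dried aliquot corresponds to {equivalent_g:.2f} g of powder")
# → 10.0 mg/mL; dried aliquot corresponds to 0.15 g of powder
```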

NMR Experimental Parameters
1H NMR experiments were recorded at 305 K on a Bruker AVANCE III 600 NMR spectrometer (Bruker GmbH, Rheinstetten, Germany) operating at a proton frequency of 600.13 MHz (B0 = 14.1 T). The spectrometer was equipped with a z-gradient inverse-detection 5 mm probe and a BCU for temperature control. Spectra were recorded with a 60-position sample changer (B-ACS-60) using Bruker's IconNMR automation software. The following acquisition conditions were used: number of scans, 64; π/2 pulse, ~8 µs; time domain (TD), 64K data points; acquisition time, 2.73 s; relaxation delay, 3.17 s; spectral width, 12019.2 Hz; and mixing time, 0.060 s. Spectra were obtained by Fourier transformation (FT) of the free induction decay (FID) after exponential multiplication with a line-broadening factor (lb) of 0.3 Hz and zero-filling (size = 256K). The resulting spectra were manually phased and baseline-corrected using a polynomial function in Bruker TopSpin (version 4.0.6). Chemical shifts are reported relative to the IS signal set at 0 ppm.
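The reported acquisition and processing parameters are mutually consistent, and the processing chain they describe can be sketched in a few lines. The numpy example below is a sketch, not the TopSpin implementation. It checks that AQ = TD/(2·SW) ≈ 2.73 s under Bruker's convention of counting real and imaginary points in TD, and it converts the spectral width to ppm. It then applies exponential multiplication (lb = 0.3 Hz), zero-filling and FT to a synthetic single-line FID. Several choices are assumptions: treating the 256K size as complex points, the synthetic FID and its decay constant, and the omission of phasing, baseline correction and first-point scaling.

```python
import numpy as np

# Acquisition and processing parameters quoted in the text
SFO1_MHZ = 600.13   # spectrometer proton frequency (MHz)
SW_HZ = 12019.2     # spectral width (Hz)
TD = 64 * 1024      # time-domain size; Bruker counts real + imaginary points
SI = 256 * 1024     # zero-filled size (assumed here to mean complex points)
LB_HZ = 0.3         # exponential line-broadening factor (Hz)

dwell_s = 1.0 / SW_HZ              # interval between complex points
n_complex = TD // 2                # 32768 complex points
acq_time_s = n_complex * dwell_s   # about 2.73 s, as reported
sw_ppm = SW_HZ / SFO1_MHZ          # about 20.03 ppm
resolution_hz = SW_HZ / SI         # digital resolution after zero-filling


def synthetic_fid(offset_hz, t2_s=0.5):
    """Hypothetical single-resonance FID at offset_hz from the carrier."""
    t = np.arange(n_complex) * dwell_s
    return np.exp(2j * np.pi * offset_hz * t - t / t2_s)


def process(fid):
    """Exponential multiplication, zero-filling and FT (no phasing/baseline)."""
    t = np.arange(fid.size) * dwell_s
    apodised = fid * np.exp(-np.pi * LB_HZ * t)   # broadens each line by LB_HZ
    padded = np.zeros(SI, dtype=complex)
    padded[: fid.size] = apodised                 # zero-filling to SI points
    spectrum = np.fft.fftshift(np.fft.fft(padded))
    freqs_hz = np.fft.fftshift(np.fft.fftfreq(SI, d=dwell_s))
    return freqs_hz, spectrum


# A line placed 600.13 Hz (1 ppm) from the carrier
freqs_hz, spectrum = process(synthetic_fid(offset_hz=SFO1_MHZ))
peak_hz = freqs_hz[np.argmax(np.abs(spectrum))]
print(f"AQ = {acq_time_s:.2f} s, SW = {sw_ppm:.2f} ppm, "
      f"peak at {peak_hz / SFO1_MHZ:.3f} ppm from the carrier")
```

Zero-filling from 32K to 256K points does not add information. It interpolates the spectrum to a digital resolution of SW/SI (about 0.046 Hz per point here), which is finer than the roughly 0.3 Hz broadening introduced by lb.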

Computational Processing and Multivariate Analysis
Raw NMR data were imported into MATLAB (version R2018b) for further processing. Spectra were first aligned using the icoshift tool (version 3.0 beta) with a targeted selection of intervals optimized for the sample type. The NMR spectra (spectral width from −0.5 to 11 ppm) were segmented into 1,111 bins with a bin size of 0.01 ppm for MVA and into 11,101 bins with a bin size of 0.001 ppm for the STOCSY experiments. Data were normalized with an in-house routine, using the area of the IS peak as the reference value. For MVA, the solvent signals (methanol-d4, methanol, water), the IS signal and the region from 8.5 to 9.6 ppm were removed. The bins of this last segment were summed together, as only minor peaks rose above the noise there. As a result, 753 variables were obtained after data processing in MATLAB.
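The binning and normalization steps can be sketched with numpy as below. In this sketch, `bin_spectrum` sums intensities into fixed-width buckets with `numpy.histogram` and its `weights` argument, and `normalise_to_is` divides every bin by the summed IS bins. `drop_regions` then removes excluded windows. The function names, the ±0.02 ppm IS window and the toy spectrum are hypothetical. The authors used an in-house MATLAB routine, and the exact windows for the solvent signals are not given in the text.

```python
import numpy as np


def bin_spectrum(ppm, intensity, start, stop, width):
    """Bucket integration: sum intensities into fixed-width bins over [start, stop]."""
    edges = np.arange(start, stop + width / 2, width)
    sums, _ = np.histogram(ppm, bins=edges, weights=intensity)
    centres = (edges[:-1] + edges[1:]) / 2
    return centres, sums


def normalise_to_is(centres, values, is_window):
    """Divide every bin by the summed area of the internal-standard bins."""
    lo, hi = is_window
    is_area = values[(centres >= lo) & (centres <= hi)].sum()
    return values / is_area


def drop_regions(centres, values, regions):
    """Remove bins whose centres fall inside any excluded (lo, hi) window."""
    keep = np.ones(centres.size, dtype=bool)
    for lo, hi in regions:
        keep &= ~((centres >= lo) & (centres <= hi))
    return centres[keep], values[keep]


# Hypothetical toy spectrum: IS at 0 ppm plus two analyte signals
ppm = np.array([0.0, 0.005, 1.0, 2.0])
intensity = np.array([2.0, 2.0, 8.0, 4.0])
centres, sums = bin_spectrum(ppm, intensity, start=-0.5, stop=11.0, width=0.01)
normalised = normalise_to_is(centres, sums, is_window=(-0.02, 0.02))
kept_centres, kept = drop_regions(centres, normalised, regions=[(-0.02, 0.02)])
```

Normalizing before dropping the IS bins keeps the reference area available. In a real workflow the solvent windows would simply be appended to `regions`.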
Data with the 0.01 ppm bin size were exported and imported into SIMCA v. 14.1 (Umetrics, Umea, Sweden), where they were subjected to MVA, specifically PCA, PLS-DA and OPLS-DA [85]. Prior to PCA, data were logarithmically transformed and Pareto-scaled, and then mean-centered. In Pareto scaling, each variable is given a variance numerically equal to its initial standard deviation instead of unit variance, the scaling weight being 1/√s_k, where s_k is the standard deviation of variable k. In mean-centering, the average of each variable is calculated and subtracted from the data, which improves the interpretability of the model. Data with the 0.001 ppm bin size were used for the 1D STOCSY analyses with an in-house routine in MATLAB. In principle, provided that spectrometer conditions are kept identical between samples, the different resonances of a single molecule always appear in the same intensity ratio. Their intensities are therefore fully correlated across samples. STOCSY exploits this principle to generate a pseudo-NMR spectrum that displays the correlation between the intensities of the various peaks across the entire sample set [43]. Specifically, in this study, STOCSY calculates the correlation coefficient between a defined "driver" peak and all other signals in the spectrum. A correlation coefficient threshold of 0.75 was set for the whole dataset and kept at the same level for all generated pseudo-spectra. Our two-step approach therefore uses OPLS-DA to extract NMR chemical shift values, which are then cross-checked with STOCSY to strengthen the identification of the molecules responsible for any metabolic alteration.
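The preprocessing and 1D STOCSY steps described above can be sketched with numpy as follows. Both functions are hypothetical helpers, not SIMCA or the authors' MATLAB routine. `log_pareto` log10-transforms the data, mean-centers each variable and divides it by √s_k, so that each variable's variance becomes s_k. The log base and the small offset guarding against log(0) are assumptions. `stocsy` computes the Pearson r between a driver variable and every variable and flags those with |r| ≥ 0.75. Applying the threshold to the absolute value is also an assumption.

```python
import numpy as np


def log_pareto(X, offset=1e-6):
    """Log-transform, mean-centre and Pareto-scale a samples x variables matrix."""
    logged = np.log10(X + offset)            # offset is a hypothetical guard against log(0)
    centred = logged - logged.mean(axis=0)   # mean-centring
    s = logged.std(axis=0, ddof=1)           # standard deviation s_k of each variable
    return centred / np.sqrt(s)              # Pareto weight 1/sqrt(s_k)


def stocsy(X, driver, threshold=0.75):
    """Pearson r between the driver variable and every variable (1D STOCSY).

    Returns all coefficients and the indices with |r| >= threshold."""
    centred = X - X.mean(axis=0)
    d = centred[:, driver]
    r = centred.T @ d / (np.linalg.norm(centred, axis=0) * np.linalg.norm(d))
    return r, np.flatnonzero(np.abs(r) >= threshold)


# Hypothetical binned spectra: rows = samples, columns = bins
spectra = np.array([
    [1.0, 3.0, 4.0, 1.0],
    [2.0, 5.0, 3.0, 0.0],
    [3.0, 7.0, 2.0, 0.0],
    [4.0, 9.0, 1.0, 1.0],
])
r, correlated = stocsy(spectra, driver=0)
```

In practice the driver index would be the bin of a chemical shift flagged by an OPLS-DA S-plot. The returned indices then mark candidate resonances of the same molecule, or of related metabolites when r is negative.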
Correlation analysis and heat maps for data visualization were produced with MetaboAnalyst 4.0. A "warm-to-cool" color scale was used to reveal the alterations of the selected metabolites across the sample set, with the columns of the data matrix arranged by hierarchical clustering [86].

Conclusions
Edible olives, a valuable element of the Mediterranean diet, continue to gain ground as a nutritional choice worldwide, while fraud has always been an issue in the food industry. From geographical origin to variety and processing method, quality should be guaranteed all the way to the final link in the chain, the consumer. The protocols issued by the IOC so far cover only the sensory aspect, which is important but inadequate on its own. In the current study, NMR-based metabolomics was combined with MVA for the first time in the quality assessment of edible olives. Several markers were successfully identified for discriminating between samples of different geographical origin, variety and processing method. Furthermore, STOCSY, a tool still novel in natural products research, was applied and revealed certain peak correlations. Combined with literature data, these led to an unprecedented peak assignment for these compounds in the total extract of such samples. This work is an important contribution to tackling food mislabeling, whether intentional or accidental. In the future, more samples from different varieties, regions and processing methods should be analyzed to broaden the applicability of the models created, and more unknown compounds remain to be identified.
Supplementary Materials: The following are available online at. Figure S1. PLS-DA scores plot of the geographical origin parameter with the respective permutation tests; Figure S2. OPLS-DA scores plots of the geographical origin parameter with their respective S-plots; Figure S3. PLS-DA scores plot of the variety parameter with the respective permutation tests; Figure S4. OPLS-DA scores plots of the variety parameter with their respective S-plots; Figure S5. PLS-DA scores plot of the processing parameter with the respective permutation tests; Figure S6. OPLS-DA scores plot of the processing parameter with its respective S-plot; Figure S7. Box plots of the remaining markers in the origin parameter; Figure S8. Box plots of the remaining markers in the variety parameter; Table S1. Description of samples' characteristics; Table S2. VIP lists from all OPLS-DA models; Table S3. t-test applied to the processing method parameter, Greek vs. Spanish.