Thorough Investigation of the Phenolic Profile of Reputable Greek Honey Varieties: Varietal Discrimination and Floral Markers Identification Using Liquid Chromatography–High-Resolution Mass Spectrometry

Honey is a highly consumed commodity due to its potential health benefits upon certain consumption, resulting in a high market price. This fact indicates the need to protect honey from fraudulent acts by delivering comprehensive analytical methodologies. In this study, targeted, suspect and non-targeted metabolomic workflows were applied to identify botanical origin markers of Greek honey. Blossom honey samples (n = 62) and the unifloral fir (n = 10), oak (n = 24), pine (n = 39) and thyme (n = 34) honeys were analyzed using an ultra-high-performance liquid chromatography hybrid quadrupole time-of-flight mass spectrometry (UHPLC-q-TOF-MS) system. Several potential authenticity markers were revealed from the application of different metabolomic workflows. In detail, based on quantitative targeted analysis, three blossom honey markers were found, namely, galangin, pinocembrin and chrysin, while gallic acid concentration was found to be significantly higher in oak honey. Using suspect screening workflow, 12 additional bioactive compounds were identified and semi-quantified, achieving comprehensive metabolomic honey characterization. Lastly, by combining non-targeted screening with advanced chemometrics, it was possible to discriminate thyme from blossom honey and develop binary discriminatory models with high predictive power. In conclusion, a holistic approach to assessing the botanical origin of Greek honey is presented, highlighting the complementarity of the three applied metabolomic approaches.


Introduction
Honey is a natural sweetener widely consumed worldwide due to its potential healthpromoting benefits, such as immune-modulatory, anti-proliferative and anti-inflammatory effects [1], upon certain dietary consumption. However, considering the great nutritional value of honey in combination with its high price and significant market share, it is a commodity highly susceptible to fraudulent practices. Botanical or geographical origin mislabeling, the addition of low-cost syrups and dilution of honey with water are among the most common fraudulent acts [2]. Thus, it is necessary to develop analytical methods to assure honey quality and protect consumers and market sustainability.
A plethora of analytical techniques has been used to assess honey authenticity. As we comprehensively discussed in our recent publication [3], chromatographic and spectroscopic techniques, alongside conventional methods measuring physicochemical properties, such as electrical conductivity or acidity, are commonly applied in the field. Although spectroscopic methods usually provide non-destructive analysis and conventional methods are widely available due to their low-cost, combining chromatography with mass spectrometry (MS) permits the detection of various analyte classes, e.g., pesticide residues (to check bio-production) [4], sugars (to evaluate quality characteristics towards established regulation, Directive 2001/110/EC) [5] or phenolic compounds (to estimate geographical, botanical or entomological origin) [6,7]. Importantly, phenolic compounds determination in honey authenticity studies can have a binary character, in detail, both as characteristic markers as well as to evaluate the nutritional value of honey due to their bioactivity [8]. Thus, phenolic compound fingerprinting can be used as a tool to investigate both the origin and potential bioactivity of reputable honey types, such as honey produced in Greece.
Greek honey is generally considered of high quality due to its organoleptic characteristics [9], biological activity [10] and the biodiversity of the Greek countryside, which includes many endemic plant species [11]. In addition, Greece plays a decisive role in EU honey production and exportation. Indicatively, Greece exported 856 tons of honey to third countries outside the EU from January 2020 to August 2020, a 79% increase compared to its total exports during the corresponding period in 2019. Moreover, Greece had the most hives per beekeeper (147 hives per beekeeper, while the EU average is 21), indicating honey production's impact on the national economy and agriculture (https://ec.europa.eu/info/sites/default/files/food-farming-fisheries/animals_ and_animal_products/documents/market-presentation-honey_spring2020_en.pdf, last accessed 11 May 2022).
Considering the discussed facts, we analyzed a total of 169 honey samples with different botanical origins, namely blossom (n = 62) and the unifloral fir (n = 10), oak (n = 24), pine (n = 39) and thyme (n = 34) honeys to reveal botanical origin markers of Greek honey through their phenolic compound content. To achieve that, our recently developed ultra-high-performance liquid chromatography hybrid quadrupole time-of-flight mass spectrometry (UHPLC-q-TOF-MS) method [12] was verified, and the analyte target list was updated, including a total of 24 target phenolic compounds. Besides targeted metabolomics, suspect and non-targeted workflows were also utilized. A more comprehensive estimation of the phenolic compound concentration of Greek unifloral honey was accomplished through suspect screening and semi-quantification, while non-target screening provided a diagnostic tool for honey authenticity assessment. All in all, the present study documents the phenolic profile of Greek honey and showcases metabolomic approaches to identify its botanical origin.

Target Screening
The developed target list was used to screen and quantify the content of phenolic compounds in all the tested honey samples (Table 1). Whisker's plots were prepared using the acquired concentrations for all the analytes in every honey matrix to depict the monitored distribution through its quartiles (Figure 1 and Figures S1-S4 in the Supplementary Materials). Importantly, analysis of variance (ANOVA) revealed statistically significant differences (α = 0.05) among varieties for the acquired concentrations of all the target analytes, except eriodictyol, luteolin, quercetin, genistein, rosmarinic acid, vanillin and vanillic acid. Actually, 15 out of 17 analytes showed significant differences with a p-Value <0.001, whilst p-coumaric acid was <0.01 and vanillic acid <0.05 (α = 0.05, in all cases). Following ANOVA, a Tukey's multiple comparison test was performed, revealing four potential botanical origin markers, namely galangin, pinocembrin and chrysin for blossom honey and gallic acid for oak honey (Figure 1). In terms of the blossom honey, a p-Value <0.001 was attained in the case of galangin (Figure 1a), pinocembrin ( Figure 1c) and chrysin (Figure 1d) against the 4 other botanical origins, while the other groups showed non-significant differences among them. The same pattern was monitored for gallic acid in the case of oak honey (Figure 1b). Galangin, pinocembrin and chrysin have been previously reported in high concentrations in blossom honey (also known as polyfloral honey, blossom honey is a mixture of nectar collected by various plants) and were proposed as discrimination markers of Serbian honey [13]. It is important to mention that these three flavones originate from propolis in the case of European honeys [13]. In addition, all three analytes were identified among the main phenolic compounds in heather blossom honey from Portugal [14] and citrus blossom honey from Spain [15], all Mediterranean countries with similar climatological conditions to Greece. In the case of gallic acid in oak honey, our results are in line with previous studies identifying gallic acid among the main compounds of Turkish oak honey [16]. Actually, oak honey has demonstrated bioactivity inhibiting different enzymes, namely urease and xanthine oxidase [17] as well as hyaluronidase [18]. In fact, in these cases, gallic acid is considered responsible for enzyme inhibition due to its high concentration in oak honey. All in all, it was revealed that by using the developed target list, it was feasible to acquire an indication of which analytes are statistically different per botanical origin, indicating their potential as botanical markers. Of course, to verify this argument, it would be necessary to analyze more samples collected during different years.
All identified compounds showed high mass accuracy, below 2 mDa, isotoping fitting below 50 mSigma and retention time (tR) tolerance below 0.2 min. Especially for isokaempferide and tectochrysin, which were not included in our previous suspect screening workflow [12], their identification was accomplished by comparing experimental MS/MS fragments with MS/MS spectra found in the mass spectral library MassBank of North America. Additionally, differences between the experimental and predicted tR were considered. The detailed identification data for isokaempferide is presented below ( Figure 2), while for tectochrysin, it can be found in Figure S5  All identified compounds showed high mass accuracy, below 2 mDa, isotoping fitting below 50 mSigma and retention time (tR) tolerance below 0.2 min. Especially for isokaempferide and tectochrysin, which were not included in our previous suspect screening workflow [12], their identification was accomplished by comparing experimental MS/MS fragments with MS/MS spectra found in the mass spectral library Mass-Bank of North America. Additionally, differences between the experimental and predicted tR were considered. The detailed identification data for isokaempferide is presented below (Figure 2), while for tectochrysin, it can be found in Figure S5   All compounds were semi-quantified using an in-house semi-quantification protocol described in the Section 3. "Materials and Methods". Specifically, 2-trans,4-trans abscisic acid was semi-quantified with its isomer, 2-cis,4-trans-abscisic acid. For the semi-quantification of homogentisic acid, 3,4-dihydroxybenzoic acid was selected. At the same time, apigenin was the most suitable compound to semi-quantify both isokaempferide and isorhamnetin, while lumichrome and methyl syringate were semi-quantified using eriodictyol. Finally, for phenyllactic acid and tectochrysin semi-quantification, p-coumaric acid and chrysin were selected, respectively. Especially for acacetin, 2-cis,4-trans-abscisic acid and sakuranetin, analytical standards were purchased to verify their identification and reduce uncertainty in determining their concentration. All results are presented in Table  2.  All compounds were semi-quantified using an in-house semi-quantification protocol described in the Section 3. "Materials and Methods". Specifically, 2-trans,4-trans abscisic acid was semi-quantified with its isomer, 2-cis,4-trans-abscisic acid. For the semiquantification of homogentisic acid, 3,4-dihydroxybenzoic acid was selected. At the same time, apigenin was the most suitable compound to semi-quantify both isokaempferide and isorhamnetin, while lumichrome and methyl syringate were semi-quantified using eriodictyol. Finally, for phenyllactic acid and tectochrysin semi-quantification, p-coumaric acid and chrysin were selected, respectively. Especially for acacetin, 2-cis,4-trans-abscisic acid and sakuranetin, analytical standards were purchased to verify their identification and reduce uncertainty in determining their concentration. All results are presented in Table 2. According to the results presented in Table 2, the concentrations of some analytes varied significantly in the different floral varieties. Oak honey was richer in abscisic acid isomers compared to the other varieties. Abscisic acid has been characterized as a main phenolic compound of honeydew honeys [19]. In our study, the greatest amount of 2-cis,4-trans-abscisic acid was found in oak honey, followed by fir honey, with a mean concentration of 1.4 and 0.46 mg/Kg, respectively. These results agree with other studies reporting similar concentration levels for these varieties [20]. Greek thyme honey was richer in methyl syringate than blossom, oak, fir and pine honeys, with methyl syringate being a phenolic compound related to the scavenging activity of superoxides [21]. Methyl syringate was reported in thyme honeys previously [22]. Oak honey was also rich in phenyllactic acid and homogentisic acid, compounds that were previously used for honey authentication [23].
All the other suspect compounds were determined in low concentration levels and did not show any potential for discrimination among varieties. Sakuranetin was not identified in pine and fir honeys, while its average content was also low in oak, blossom and thyme honeys. These results are in accordance with previously published studies, with honeydew honeys from New Zealand containing 0.02 mg/Kg and different nectar honeys between 0.0060 and 0.062 mg/Kg [24,25]. Acacetin was not identified in thyme and fir samples; however, it was present in the other three varieties. An average concentration of 0.053 mg/Kg was detected in Greek oak honeys, comparable to Turkish honeys previously studied (0.04 mg/Kg).
Dehydrovomifoliol has also been formerly identified in honey. It has been proposed as a potential marker of heather honey, especially Polish honey [12]. Oak honey was the richest in our results, reaching 1.8 mg/Kg as an average concentration. Another detected suspect compound was isorhamnetin, with an average concentration ranging from 0.033 mg/Kg in thyme samples to 0.11 mg/Kg in oak variety, which was comparable to previous studies [26]. Regarding lumichrome, the highest average concentration was found in blossom honeys; nevertheless, this concentration of 0.33 mg/Kg is significantly lower than the corresponding concentrations reported in Croatian and Italian honey samples [27]. Finally, tectochrysin was identified as a minor constituent in Greek honeys. Tectochrysin has been reported previously in Spanish floral honeys from Galicia with an average concentration of 0.31 mg/Kg, significantly higher than the one calculated in Greek honeys [28].

Non-Target Screening
Applying non-targeted screening workflow in honey samples resulted in a bucket table containing 3631 features. Worthwhile to note is that samples collected only during 2016 were processed by the non-targeted screening workflow. Multivariate statistical analysis revealed important features that can be used as authenticity markers to discriminate Greek honey varieties. Score plots, as well as the goodness of fit (R 2 ), the goodness of prediction (Q 2 ) and model accuracy of each PLS-DA model, can be found in the Supplementary Materials, along with the five most significant variables of each model and their box plots ( Figures S6-S26).
Initially, a PLS-DA model was constructed, attempting to differentiate all 5 honey varieties. Figure S6 shows the final PLS-DA model developed, and the ellipses are drawn at 95% probability level. The score plot shows a distinct separation between blossom and thyme honeys. On the other hand, all samples belonging to the honeydew class (fir, oak and pine) were not well-separated and were distributed between the blossom and thyme groups ( Figure S6). This model did not prove to be accurate, although it showed a predictive ability of 0.59 using the first four components. Many factors that increase the variance in the sample set may have influenced model accuracy, such as the different regions from where the samples were collected, the various climatic conditions prevailing in different collecting periods, and the diverse flora occurring in each area [29]. A better separation could have been achieved by using physicochemical parameters and/or melissopalynological analysis as a preliminary step to remove samples that do not comply with specific parameters defined by the legislation (Directive 2001/10/EC). The top five important features with the highest VIP scores were those with monoisotopic masses of 284.0684, 284.1048, 242.1517, 166.0996 and 508.1157 ( Figure S7). Their box plots are found in the Supplementary Materials ( Figure S8).
Subsequently, identification of the VIPs was undertaken. Indicatively, the following process was applied for the mass feature with m/z 284.1048 in tR = 9.75 min: First, the molecular formula C 17 H 16 O 4 was assigned by Smart Formula manually 3D with a 100% score, as it showed a mass error of 0.22 mDa and an isotopic fitting of 27 mSigma. Then, the Compound Crawler tool was used to search public libraries such as ChEBI, Pubchem and ChemSpider. ChEBI library was preferred compared to the others as it incorporates chemical entities of biological interest. However, no candidates were recovered in the ChEBI library, so Pubchem and Chemspider were used, resulting in 17 possible candidates. The candidates were then processed with the in silico fragmentation tool MetFrag using the MS/MS spectra that have been assigned in this feature during data treatment. Metafrag showed the highest score for the candidate phenethyl caffeate, a phenethyl alcohol ester of caffeic acid, which constitutes a bioactive component of honeybee hive propolis, providing many beneficial properties [30].  Table S7) and [C 9 H 5 O 3 ] − were assigned. As the last step, mass spectral libraries such as MassBank Europe, MassBank of North America and METLIN were searched to find an MS/MS spectrum of a reference standard to reach a higher confidence level in the identification, but it was not available in this case.
Following the procedure mentioned above, the compound acacetin was assigned in the mass feature of 284.0684 and confirmed by a reference standard. In addition, acacetin was also identified through the suspect screening procedure, and it showed a higher average concentration in blossom honey, in line also with non-target screening findings. Moreover, the mass feature with m/z 508.1157 corresponds to the [2M − H] − ion of chrysin, an important compound already detected and quantified through the targeted screening.
Finally, the mass features with m/z 242.1517 and 166.0996, which show a higher intensity in thyme honey, are in-source fragments of the m/z 301.1656, as proposed by metaboscape ( Figure S27). The molecular formula C 15 H 26 O 6 was assigned to the mass feature m/z 301.1656. A total of 2232 candidates were found in PubChem based on the given molecular formula, and further information (such as intense and clean MS/MS fragments) was needed to proceed to higher identification confidence. Therefore, this mass is tentatively identified at identification confidence level 4, according to the categorization proposed previously [12]. All annotations of important features of all PLS-DA models are presented in Table 3.
A new model was created after removing the samples belonging to the honeydew class in order to investigate the discrimination between blossom and thyme varieties. Thyme, one of the most widespread and important Greek unifloral honeys, is usually degraded by adding, without declaration, other nectar honeys or blossom honey and mislabeled as thyme honey [31]. Thus, it is of paramount importance to create a model to certify its authenticity. A highly accurate model leading to R 2 = 0.99587 and Q 2 = 0.81258 was achieved using the first five components ( Figure S9). The first five variables in descending order were 508.1157, 544.1364, 284.0684, 286.0841 and 314.0790, which shows a higher concentration level in blossom honeys ( Figure S10). Their box plots can be found in the Supplementary Materials ( Figure S11 (Table S7). Pinobanksin constitutes a significant phenolic compound found in different honey matrices and provides many beneficial properties to human health [12]. As revealed in the targeted screening results, pinobanksin showed a much higher average concentration in blossom honeys and can be considered a potential biomarker for this variety. Finally, for the mass feature with m/z 286.0841, the molecular formula C 16 H 14 O 5 was assigned; however, this led to many possible candidates, making its unequivocal identification very difficult.
Since it was not feasible to attain sample discrimination using all five different honey classes, in the next step, we worked with binary discriminatory models to reveal significant compounds found in these varieties. Thus, two classes were formed in each model. The first one included the samples from a selected variety, while the second contained the rest of the samples. So, five models were developed. PLS-DA score plots, cross-validation details, important features, and their box plots are presented in the Supplementary Materials ( Figure S12-S26). The most satisfactory results in terms of accuracy and predictive ability were obtained in the separation models of thyme and blossom categories with Q 2 = 0.82468 and Q 2 = 74198, respectively. On the other hand, fir, pine and oak models showed lower predictive ability ranging from 0.58 to 0.64. Honeydew honeys are often more difficult to differentiate as their composition depends mainly on the beekeeping plants that exist in the forests of each region. Therefore, honeydew honey may contain a high portion of nectar, and its composition must be checked so that it complies with the EU legislation (Directive 2001/10/EC).
In the discrimination model of blossom honeys, two important markers were identified by comparing their MS/MS spectra with those obtained by in silico fragmentation. The first marker was phenethyl caffeate, a compound that has already been mentioned before as a significant feature of Greek blossom honey [30]. The second marker, prenyl caffeate, is also a caffeic acid derivative, and it has been reported in Serbian polyfloral [13] samples and Algerian honeys [29]. Other markers could not be identified as many possible structures were retrieved for their assigned molecular formula. Regarding the thyme discrimination model, eudesmic acid was recognized by comparing experimental MS/MS data with those in the MoNA spectral library, while 3-methylgalangin with in silico MS/MS data produced by Metfrag. Eudesmic acid is cited as an important compound for Manuka honey, while 8-methylgalangin has been reported for Chilean honeys [32,33]. The oak model revealed three important markers. As previously mentioned, gallic acid is a marker for oak honey samples and was identified with a reference standard. Scopoletin has formerly been reported in cotton honey [34], and its identity was confirmed by comparison with the MoNA library. Finally, taxifolin proved to be an important compound for Greek pine honeys, confirmed by target screening results.

Chemicals
All chemicals, solvents and reagents used were of high analytical purity. Ammonium acetate, sodium sulfate anhydrous and EtAc (purity 99.0% or greater) were purchased from Sigma-Aldrich, while ACN and LC-MS grade methanol (MeOH) were provided by Merck.

Standard Preparation
A 1000 mg L −1 in MeOH stock solution was prepared for each target compound. Then, the standards were stored at −20 • C in amber glass bottles to avoid photodegradation. Two mixture working solutions at two different concentrations, 25 and 50 mg L −1 , were also prepared and stored in the refrigerator. Dilution of the stock solution with mobile phase, MeOH: H 2 O (50:50), yielded working solutions at 0.25, 0.50, 1.0, 2.0, and 5.0 mg L −1 . Calibration curves were obtained by plotting the peak areas of the standards against their concentration. Similar calibration curves were constructed in a blank honey matrix (matrix-matched calibration curve) to assess important method performance characteristics, namely, linearity, precision and matrix effects as well as analyte quantification. The matrixmatched standards were prepared by spiking the compounds in a blank honey extract prior to injection.

Honey Samples and Sample Preparation
One hundred and sixty-nine (n = 169) Greek honey samples from blossom (n = 62), fir (n = 10), oak (n = 24), pine (n = 39) and thyme (n = 34) varieties were collected by ATTIKI Honey SA from different regions of Greece (see supplementary material for further details, Table S2). Importantly, the samples were collected and analyzed during two consecutive years (2016 and 2017) to enhance the variability of the tested parameters. The samples were stored in amber glass containers at 4 • C. Before analysis, the samples were mixed vigorously for 3 minutes to be homogenized and in case of crystallization, samples were put in a water bath at 40 • C till they were liquified. Sample preparation was based on a previously published study of our group [12]. Briefly, 1 g of honey sample was weighted in a 15 mL centrifuge tube and diluted with 5 mL of acidified water (pH < 2) containing 2% sodium chloride. After vortexing for 1 min, the samples were extracted three times with 5 mL EtAc. The samples were centrifuged between each extraction step to better separate the two phases. The combined extracts were collected in a glass tube and dried with anhydrous sodium sulfate. Afterwards, the extracts were evaporated under a gentle nitrogen stream to dryness and then reconstituted to 0.2 mL with a final proportion of MeOH:H 2 O (50:50). Before being injected into the HPLC system, all samples were filtered using cellulose syringe filters (R.C. filters, pore size 0.2 µm, diameter 15mm). In addition, to assess potential drifts and evaluate the reproduction of the analysis, a quality control (QC) sample was prepared by mixing 20 µL of each honey sample extract to attain analytical information from all different botanical and geographical origins. Finally, ultrapure water was used to prepare a procedural blank, subjecting the whole sample preparation protocol to subtract possible contamination during data processing.

Method Verification
To monitor the analytical performance of the method, all the necessary quality performance characteristics were investigated, namely trueness, repeatability, intermediate precision, selectivity, linearity, limits of detection (LODs) and limit of quantification (LOQs). Considering that the current study is based on an in-house validated method [12], on this occasion, method verification was performed and seven additional analytes were included in the target list, namely, chrysin, galangin, genistein, naringenin, pinobanksin, pinocembrin and rosmarinic acid, resulting in a target list of 24 analytes. All the investigated performance characteristics were calculated as described in our previous paper [12], and detailed verification results can be found in the (Supplementary Material Tables S3-S5).  Table S6 of the Supplementary Materials. The injection volume was set to 5 µL. The QToF-MS system was equipped with an ESI source, operating in negative ionization mode. The operation parameters for ESI were set as follows: capillary voltage, 3500 V; endplate offset, 500 V; nebulizer gas pressure 2 bar (N2); drying gas, 8 L min −1 and dry temperature, 200 • C.

UPLC-QToF-MS Analysis
The QTOF-MS system was operating in broadband collision-induced dissociation (bbCID) acquisition mode and recorded spectra over the m/z range 50-1000 with a scan rate of 2 Hz. The Bruker bbCID mode is a data-independent acquisition mode (DIA) that provides MS and MS/MS spectra at the same time, working at two different collision energies; at low collision energy (4 eV), MS spectra are acquired, while at high collision energy (25 eV), MS/MS spectra are collected. In addition, honey samples from each botanical and geographical origin, as well as the pool QC sample, were also analyzed using a data-dependent acquisition mode (DDA), AutoMS. In AutoMS, the five most abundant ions per MS scan are selected and fragmented, providing precise and compound-specific MS/MS spectra. Thus, this mode is most suitable for the structure elucidation of unknowns.
A QTOF-MS external calibration was performed before analysis with a 10 mM sodium formate solution in a mixture of water/isopropanol (50:50). The exact theoretical masses of calibration ions with formulas HCOO(NaCOOH)1-14 in the range of 50-1000 Da were used for calibration. Internal calibration was also performed using a calibrant injection at the beginning of each run in a dedicated calibration segment (0.1-0.25 min).

Targeted, Suspect and Non-Targeted Screening Workflows
An accurate-mass target screening database was compiled and used to identify and quantify 24 phenolic compounds in all honey samples (see Supplementary Materials,  Table S7). Briefly, the identification criteria were as follows: retention time tolerance lower than ±0.2 min, mass accuracy of the precursor and qualifier ions less than 5 mDa, isotopic fit less or equal than 50 mSigma (Bruker mSigma is a measure of the goodness of fit between the measured and the theoretical isotopic pattern) and the existence of at least two qualifier ions. The target screening was performed using the software TASQ 1.4 and DataAnalysis 4.4 (Bruker Daltonics, Bremen, Germany) along with other tools included in this software, such as Bruker Compass Isotope Pattern and SmartFormula Manually. Instead of external standard calibration curves for analyte quantification, matrix-matched calibration curves were used to compensate for ion suppression or enhancement in the ionization source. To identify statistically significant differences among the five botanical groups based on the attained concentration of each target analyte, ANOVA followed by Tukey's multiple comparison test was performed at a significance level, α = 0.05 using GraphPad prism 5.0 software (San Diego, CA, USA). Additionally, to visualize the variance of the attained concentration for each target analyte, Whisker's plots are provided, developed in GraphPad prism 5.0 software.
The suspect screening workflow is thoroughly described in a previously published work [12]. Firstly, eight compounds contained in the suspect list from our previous work were bought and incorporated into the target list. Then, the suspect list was enlarged by adding eight more compounds, namely, isokaempferide, caffeic acid isoprenyl ester, (−)-epigallocatechin gallate, 4-hydroxyphenylacetic acid, arbutin, baicalein, astragalin and kynurenic acid that have been mentioned to exist in honey. Information about formulas, monoisotopic masses and pseudomolecular ions, possible fragment and adduct ions as well as the predicted retention times [35] for these specific compounds are meticulously presented in Supplementary Materials (Table S8, [12,22,[36][37][38][39][40][41]). Furthermore, the experimental retention times for the compounds that have been previously identified have also been added to the suspect list and used to increase identification confidence. Then, the final list, containing 60 compounds, was used to screen the honey samples to gain a deeper knowledge of phenolic compounds in Greek varieties. Mass accuracy threshold of 5 mDa, isotopic fit below or equal to 50 mSigma, ion intensity more than 1000 and peak area threshold of more than 2000 were utilized for creating the extracted ion chromatogram. For this purpose, the program TASQ 1.4 (Bruker Daltonics, Bremen, Germany) was used.
The identification workflow was followed for all compounds found in the suspect list. Briefly, MS spectra were meticulously examined for the existence of possible in-source fragments and/or adducts, and a formula for the precursor ion was proposed from Smart Formula manually. Then, MS/MS spectra were compared to those that exist in mass spectral libraries such as Fiehn Lab MassBank of North America (https://mona.fiehnlab. ucdavis.edu/, last accessed 8 June 2022) and METLIN [42] or by in silico fragmentation tools such as MetFrag [43]. Furthermore, experimental retention times were compared with theoretical to distinguish potential isomers and eliminate false positive results [12].
The identified suspect compounds were then semi-quantified in order to estimate their concentration levels in Greek honey samples. A popular way to perform semi-quantification is to use similar structural compounds found in the target list; however, this is not the most suitable method as important parameters such as logD and RT are not considered [44]. A more accurate way is to search for structurally similar compounds based on the number of similar functional groups, as well as the distance between functional groups. The online tool, ChemMine (https://chemminetools.ucr.edu/, last accessed 8 June 2022) was used for this goal. The Smile of each suspect compound was imported, and the comparison with compounds found in the PubChem database ensued, using a similarity cutoff of 0.9. The analytes with the highest structural similarity were recorded and the target compound with the higher similarity score was selected for semi-quantification. The similarity score is calculated as the Tanimoto similarity of substructure-based fingerprints [44].
Non-target screening workflow was performed using Bruker Metaboscape 3.0, an integrated software capable of implementing the entire procedure from peak peaking to multivariate analysis. The workflow contains automatic calibration using a calibrant (sodium formate solution) injection at the start of each run, as well as non-linear retention time alignment using T-ReX 3D, which connects isotopes, adducts and fragments of each feature together. The parameters used for bucket table formation were as follows: intensity threshold higher than 2000 counts, minimum peak length of 8 spectra, a mass range of 50-1000 m/z, Rt range of 0.5-12 min, an extracted ion chromatograms (EIC) correlation of 0.8 (only features above this threshold can be treated as adducts or fragments), primary ion [M − H] − , seed ion [M + Cl] − and common Ion [M − H − H 2 O] − . Background features were discarded by subtracting procedural blank runs. Furthermore, if a feature is not found in at least 25% of the samples of a group is also removed [45].
Multivariate statistical analysis was performed to spot new markers that can differentiate the samples according to their botanical origin. The bucket table produced by Metaboscape, after blank buckets removal, was exported as a csv file. Then, multivariate statistical analysis was performed by importing this file into the online platform Metaboanalyst 5.0, a dedicated platform for metabolomics applications. Firstly, the group labels were added to all samples, and missing values were replaced by 1/5 of min positive values of their corresponding variables. Data filtering was not used, and the bucket table was normalized by sum to reduce systematic variation among samples. In addition, data were transformed using the logarithm function and scaled using the auto-scaling algorithm, which mean-centered the data and divided the standard deviation of each variable. Finally, Partial Least Squares Discriminant Analysis (PLS-DA) was used to discriminate the different varieties and obtain the most influential variables that can be used as markers. Thus, the features with the 5 higher VIP scores of the PLS-DA model were kept and further examined to be identified. Ten-fold cross-validation (CV) was performed where the whole dataset was divided into 10 parts, 9 parts were used for training the model, and 1 part was kept as a test set. This procedure happens 10 times, and the total error is the mean of the errors produced after each test. As a model performance measure, the Q 2 was used, which is the model's predictive ability.
Complementary to the previous model, 5 PLS-DA two-class models were created, each one referring to the discrimination of a class compared to the rest of the samples. These models were built to find important variety-specific biomarkers that discriminate each variety. The workflow was similar to that described above and included one more step. Before obtaining the csv file used for building each model and importing it in Metaboanalyst 5.0, a t-test significance test was implemented in Metaboscape 3.0 to keep only statistically significant variables with p < 0.05. Thus, a file containing the intensities of all samples for the selected buckets was acquired, and the multivariate statistical analysis ensued.
The identification of the above-mentioned important features was based on the interpretation of MS and MS/MS data, with Metaboscape software providing all the necessary tools. Firstly, the target list containing information about standard compounds was imported into Metaboscape as an analyte list, helping to identify some buckets based on specific criteria such as mass accuracy, isotopic fitting, tR tolerance and existence of fragments ions. Another vital tool was Smart Formula manually 3D, with which molecular Formulas were assigned to possible markers in terms of mass accuracy and isotoping fitting. After molecular formula assignment, potential compounds were retrieved using Compound Crawler by public libraries such as PubChem and ChemSpider. Finally, a comparison between experimental MS/MS data with in silico fragmentation produced by Metrfag ensued. Spectral libraries such as Mass Bank Europe, Massbank of North America and Metlin were used to reach a higher level of confidence in the identification.

Conclusions
The application of the various HRMS metabolomic approaches provided successful discrimination of Greek honey from five different botanical sources. Targeted, suspect and non-targeted workflows were utilized, providing fruitful information on the Greek honey polyphenolic compound profile. In terms of targeted analysis, it was proven that potential botanical origin markers could be attained even by using univariate analysis, i.e., ANOVA. Nevertheless, analytical standards are used to quantify the targeted analytes, increasing the method cost. Suspect screening can be used complementary to targeted analysis to acquire a detailed characterization of the honey metabolomic profile. Semiquantification is also possible by using a few analytical standards based on the structural similarity of the analytes. In contrast to the previous two cases, the non-targeted analysis did not require any analytical standards, and further enhanced sample clustering based on advanced chemometrics. Overall, the present study provides a wealth of knowledge on the metabolomic composition of Greek honey, and the proposed markers may be used to make national honey production more secure.