1. Introduction
The olive (Olea europaea L.) tree is a typical Mediterranean crop, with more than 750 million plants cultivated over ten million hectares in the Mediterranean basin alone, accounting for more than 95% of the worldwide production of table olives and olive oil [1,2,3]. Virgin olive oil (VOO) is the oil derived from the fruits of Olea europaea L. without the use of solvents, using mechanical processes exclusively [4], while extra virgin olive oil (EVOO) is its superior quality segment, characterized by a rich flavor profile and various culinary applications [5]. Moreover, EVOO consumption is associated with well-investigated health benefits, which are mainly due to its high oleic acid content and to the presence of phenolic and secoiridoid derivatives [6,7,8]. These facts have led to a constant increase in worldwide EVOO production in recent decades, both in the Mediterranean region and in countries where the olive tree is not traditionally cultivated, such as Chile, Australia, and the USA [1,9].
The increase in the worldwide cultivation of olive trees can potentially lead to relevant environmental issues; in fact, VOO production entails the formation of a considerable quantity of by-products, such as olive mill wastewater, olive pomace, and pruning residues [10,11,12]. These waste biomasses represent a problem due to their high organic load, which makes them highly polluting for the environment, but can also be seen as an underexploited resource, since they often contain high concentrations of health-beneficial compounds similar to those that can be found in EVOO [10,13,14]. Several approaches have been proposed for the recovery of these bioactive compounds in order to improve the sustainability of the VOO production chain; olive by-products have found application as cattle feed, fertilizers, and novel materials, and their extracts in the pharmaceutical, food, and cosmetic sectors [15,16,17]. In the food sector, extracts can be upcycled as ingredients for the formulation of phytochemical food supplements, or as additives to improve the stability and the nutritional value of food [18,19,20,21].
Olive leaves have been used for centuries in folk medicine [22] and are nowadays included in the European Medicines Agency monographs as well as in the European Pharmacopoeia [23,24]. The bioactivity of olive leaf extracts (OLEs) is due to the presence of bioactive compounds of different chemical nature: phenolic derivatives like verbascoside and hydroxytyrosol, flavonoid derivatives like rutin and luteolin-7-O-glucoside, and triterpenic acids such as oleanolic acid. Secoiridoid compounds, among which oleuropein is the most abundant in the plant, are characteristic of the Oleaceae botanical family [25,26]. High and consistent concentrations of bioactive derivatives are required in OLEs for their application in the nutraceutical sector; since the abundance of secondary metabolites in the plant may vary according to many different factors such as cultivar, phenological phase, and pedoclimatic conditions, it is crucial to consider the effects of these variables on the composition of the extracts [25,27,28].
Focusing on phenolic compounds, the analytical technique most widely used for their profiling is High-Performance Liquid Chromatography with Diode Array and Mass Spectrometry Detection (HPLC-DAD-MS), which enables the identification and quantitation of many of the compounds present in the leaf [27,29,30,31,32,33,34,35].
The effect of cultivar on the phenolic composition of Olea europaea L. by-products is particularly important to study; recently planted olive groves are typically adapted to intensive and mechanized agriculture, which often involves the use of a single olive cultivar. Therefore, monovarietal EVOO is not only a fast-growing niche product in the context of high-quality oils [36,37] but is also produced in large quantities for the formulation of industrial EVOO blends. Thus, the selection of the best cultivars for the upcycling of olive leaves into nutraceuticals is crucial for an efficient and effective exploitation of this waste biomass [12]; moreover, it has been demonstrated that, in olive leaves, seasonal variations in the concentrations of bioactive compounds are cultivar-dependent [28].
Numerous recent studies have already investigated the seasonal variations of local olive cultivars [38]. A common limitation of these studies is the diverse geographical origin of the olive leaf samples from different cultivars; recent studies have in fact demonstrated that even small differences (less than 50 km) in the harvesting location of olive leaves can cause significant variations in the phenolic composition of extracts obtained from leaves of the same cultivar [35,39]. Furthermore, little research has addressed the variation in the phenolic profile of monocultivar olive leaves, and, to the authors’ knowledge, no study has examined it in olive leaves of the main Tuscan cultivars (i.e., Frantoio, Moraiolo, Leccino, and Leccio del Corno) across seasons and over multiple years.
The aim of the present research was to study the variation in phenolic profile in olive leaves collected from four different typical Tuscan cultivars across four phenological phases over two years. The employed sampling approach effectively eliminated the geographical and pedoclimatic factors, allowing a precise assessment of the influence of cultivar on OLE composition, as well as of the interactions between cultivar and season of harvest. To this aim, olive leaves of the Frantoio, Moraiolo, Leccino, and Leccio del Corno cultivars were collected from trees all located in the same experimental field in the province of Florence. Leaf samples were gathered over a two-year timespan and across the four phenological seasons. After drying, leaves were subjected to ultrasound-assisted hydroalcoholic extraction, and the composition of the extracts was characterized by HPLC-DAD-MS. A statistical approach involving 3-way ANOVA, Genetic Algorithm with Linear Discriminant Analysis (GA-LDA), and Random Forest (RF) methods was applied to study the relationships between phenolic compounds and the different sources of variability (i.e., cultivar, year, and season).
2. Materials and Methods
2.1. Chemicals
Ultrapure water was obtained from both a Milli-Q system (Millipore S.A., Molsheim, France) and an Elga Purelab Option R system (Veolia Environnement S.A., Paris, France). HPLC-grade formic acid, methanol, and acetonitrile, as well as analytical-grade methanol and ethanol, and a tyrosol standard were obtained from Merck (Darmstadt, Germany).
2.2. Collection and Drying of Olive Leaf Samples
Olive leaves were harvested from rural areas surrounding Florence (Tuscany, Italy), from trees belonging to widely cultivated and traditional Tuscan cultivars, namely Frantoio, Leccio del Corno, Moraiolo, and Leccino. These cultivars are renowned for their characteristic features and their significant role in high-quality extra virgin olive oil production. Sampling was carried out over a two-year period, from April 2022 to February 2024, once in each season (spring, summer, autumn, and winter, corresponding to different phenological phases of the plant, namely spring vegetative growth, summer heat and water stress, autumn regrowth, and winter dormancy, respectively) on three different trees per cultivar. The leaves from each set of three trees were pooled to obtain a representative sample. Leaf collection was performed uniformly around the entire circumference of each tree, selecting leaves from the central part of the shoots to ensure homogeneity in age, avoiding both very young and very old leaves. The trees were non-irrigated and managed under organic farming. The sampling strategy was designed to assess the influence of cultivar, year, and seasonal variation on the qualitative and quantitative phenolic profile of olive leaves.
All samples were collected in triplicate, yielding a total of 96 samples (4 cultivars × 4 seasons × 2 years × 3 replicates). Seasonal collections were scheduled as follows: April (spring), July (summer), November (autumn), and February (winter). To minimize variability due to soil and climatic conditions, all samples were gathered from the same experimental olive orchard (situated at 43.8359 N, 11.2066 E) and collected on the same day for each designated sampling period. Fresh leaves were oven-dried at 80 °C using a Termaks TS 8430 (Bergen, Norway) oven, until reaching a constant weight. The dried material was then ground and homogenized using an IKA M 20 Universal mill (IKA-Werke GmbH & Co. KG, Staufen im Breisgau, Germany). The samples collected are listed in Table 1.
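The factorial layout described above can be enumerated in a few lines. The following is a minimal sketch, not the authors' sample registry: the year labels `Y1`/`Y2` and the dictionary keys are illustrative names chosen here, while the factor levels and sampling months are taken from the text.

```python
from itertools import product

# Factor levels as described in the text.
cultivars = ["Frantoio", "Leccino", "Leccio del Corno", "Moraiolo"]
years = ["Y1", "Y2"]                      # illustrative labels for the two sampling years
seasons = ["spring", "summer", "autumn", "winter"]
replicates = [1, 2, 3]

# Sampling month for each season, as stated in the text.
month_of = {"spring": "April", "summer": "July",
            "autumn": "November", "winter": "February"}

# One record per analysed sample: 4 cultivars x 2 years x 4 seasons x 3 replicates.
design = [
    {"cultivar": c, "year": y, "season": s, "replicate": r, "month": month_of[s]}
    for c, y, s, r in product(cultivars, years, seasons, replicates)
]

# Distinct cultivar/year/season combinations (each analysed in triplicate).
conditions = {(d["cultivar"], d["year"], d["season"]) for d in design}

print(len(design))      # 96
print(len(conditions))  # 32
```

Listing the design explicitly makes it easy to check that every cultivar, year, and season combination is represented by the same number of replicates, which is the balance assumed by the factorial ANOVA used later.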
2.3. HPLC-DAD-MS Analysis of Olive Leaf Phenolic Compounds
Approximately 1 g of each powdered leaf sample was precisely weighed and subjected to two sequential extractions with 30 mL of EtOH:H2O (80:20, v/v), employing an IKA T 25 digital Ultraturrax followed by ultrasonic bath treatment. After centrifugation at 5000 rpm for 10 min at 0 °C, the supernatants were defatted twice using 30 mL of hexane. The solvents were subsequently removed under vacuum. The resulting dried extracts were reconstituted in 5 mL of MeOH:H2O (50:50, v/v), then centrifuged at 14,000 rpm for 5 min. The supernatant was used for HPLC analysis.
For the chromatographic analyses, a 1260 Infinity II LC system coupled with a DAD and a Mass Spectrometry Detector (InfinityLab LC/MSD) using an API/ESI interface (Agilent Technologies, Palo Alto, CA, USA) was employed. Chromatographic separation was performed on a Poroshell 120 EC-C18 column (150 mm × 3.0 mm, 2.7 µm particle size; Agilent, Palo Alto, CA, USA). The mobile phase consisted of HPLC-grade acetonitrile (solvent A) and ultrapure water acidified with 0.1% formic acid (solvent B). A linear multistep gradient elution was applied at a constant flow rate of 0.4 mL/min. The gradient profile was as follows: solvent B decreased from 95% to 60% over 40 min, held at 60% for 5 min, reduced to 0% over the next 5 min, maintained at 0% for 3 min, and then returned to 95% within 2 min, making the total run time 55 min. The injection volume was 2 µL, and the system was equilibrated for 10 min prior to each run. UV-Vis spectra were recorded from 200 to 550 nm, with chromatograms monitored at 210, 220, 240, 280, and 350 nm.
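As a consistency check on the gradient just described, the program can be written as a list of breakpoints and interpolated linearly. This is a sketch only: the breakpoints are reconstructed from the segment durations given in the text, and the function name `percent_b` is introduced here for illustration.

```python
# (time in min, % solvent B) breakpoints reconstructed from the description:
# 95 -> 60 % over 40 min, hold 5 min, 60 -> 0 % over 5 min, hold 3 min, back to 95 % in 2 min.
gradient = [(0, 95.0), (40, 60.0), (45, 60.0), (50, 0.0), (53, 0.0), (55, 95.0)]


def percent_b(t):
    """Linearly interpolate the percentage of solvent B at time t (min)."""
    if not gradient[0][0] <= t <= gradient[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


total_run_time = gradient[-1][0]

print(total_run_time)    # 55
print(percent_b(20))     # 77.5
print(percent_b(47.5))   # 30.0
```

The percentage of solvent A (acetonitrile) at any time is simply 100 minus the value returned for solvent B.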
Quantification was carried out using external standard calibration curves built with reference standards where available; compounds for which identical standards were not available were quantified as equivalents, using standards from the same chemical class, as follows:
Tyrosol (λ = 280 nm; calibration range 0–8.2 µg; R² = 0.992) was used for the determination of hydroxytyrosol and its derivatives.
Rutin (λ = 350 nm; calibration range 0–2.2 µg; R² = 1) served as the standard for rutin quantification.
Luteolin-7-O-glucoside (λ = 280 nm; calibration range 0–2.8 µg; R² = 0.9975) was employed for the quantification of other flavonoid compounds.
Oleuropein (λ = 280 nm; calibration range 0–18 µg; R² = 0.9997) was used for quantification of both oleuropein and secoiridoid derivatives.
Verbascoside (λ = 280 nm; calibration range 0–0.8 µg; R² = 0.9997) was selected for the quantification of phenolic acids and their derivatives.
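The external-standard approach listed above can be illustrated with a least-squares calibration line and a conversion to mg/kg of dry leaf. This is a hedged sketch under explicit assumptions: the calibration points are invented for illustration, the calibration range is taken to be the mass of standard injected on column, and the whole dried extract from about 1 g of leaf is assumed to be recovered in the 5 mL reconstitution volume with 2 µL injected. The function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x, with R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical calibration points: micrograms of standard injected vs. peak area.
std_ug = [0.0, 4.5, 9.0, 13.5, 18.0]
std_area = [0.0, 900.0, 1800.0, 2700.0, 3600.0]
slope, intercept, r2 = fit_line(std_ug, std_area)


def to_mg_per_kg(peak_area, extract_ml=5.0, injection_ul=2.0, leaf_g=1.0):
    """Convert a peak area to mg/kg dry leaf (assumptions stated in the lead-in)."""
    ug_on_column = (peak_area - intercept) / slope
    # Scale from the injected aliquot to the whole reconstituted extract.
    ug_in_extract = ug_on_column * (extract_ml * 1000.0 / injection_ul)
    # Micrograms per gram are numerically equal to mg/kg.
    return ug_in_extract / leaf_g


print(round(to_mg_per_kg(2000.0)))  # 25000
```

Under these assumptions, 10 µg on column corresponds to 25,000 mg/kg, which is the same order of magnitude as the oleuropein contents reported in the Results.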
2.4. Data Analysis
The mean and standard deviation (SD) of triplicate measurements from all experiments were computed using custom routines developed in Microsoft Excel (version 365). A three-way ANOVA was carried out for each parameter to assess the influence of cultivar, year, season, and their interactions. To determine statistically significant differences among sample data, post hoc analyses were performed using Tukey’s multiple comparison test. All ANOVA and Tukey tests were conducted using R software (version 4.3.1, The R Foundation for Statistical Computing, Vienna, Austria).
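To make the structure of the three-way ANOVA concrete, the sums of squares and F ratios of a balanced factorial design can be computed directly from cell means. The sketch below is a pure-Python illustration, not the R code used in the study (whose exact calls are not given); it assumes a balanced design, omits p-values and the Tukey post hoc step, and runs on invented toy data in which only the cultivar shifts the response.

```python
from itertools import combinations, product
from math import prod
from statistics import fmean


def balanced_anova(rows, factors, response):
    """SS, df and F for every main effect and interaction of a balanced design."""
    levels = {f: sorted({r[f] for r in rows}) for f in factors}
    n_total = len(rows)

    def cell_mean(subset, cell):
        # Mean response over all rows matching the given levels of `subset`
        # (an empty subset gives the grand mean).
        return fmean(r[response] for r in rows
                     if all(r[f] == v for f, v in zip(subset, cell)))

    # Residual term: deviation of each observation from its full cell mean.
    full_means = {cell: cell_mean(factors, cell)
                  for cell in product(*(levels[f] for f in factors))}
    ss_error = sum((r[response] - full_means[tuple(r[f] for f in factors)]) ** 2
                   for r in rows)
    df_error = n_total - len(full_means)
    ms_error = ss_error / df_error

    table = {}
    for k in range(1, len(factors) + 1):
        for subset in combinations(factors, k):
            n_cells = prod(len(levels[f]) for f in subset)
            ss = 0.0
            for cell in product(*(levels[f] for f in subset)):
                chosen = dict(zip(subset, cell))
                # Inclusion-exclusion over lower-order means isolates this term.
                effect = 0.0
                for j in range(k + 1):
                    for sub in combinations(subset, j):
                        sign = (-1) ** (k - j)
                        effect += sign * cell_mean(sub, [chosen[f] for f in sub])
                ss += (n_total / n_cells) * effect ** 2
            df = prod(len(levels[f]) - 1 for f in subset)
            table[" x ".join(subset)] = {"SS": ss, "df": df, "F": (ss / df) / ms_error}
    table["error"] = {"SS": ss_error, "df": df_error}
    return table


# Hypothetical toy data: only "cultivar" shifts the response; two replicates per cell.
rows = [
    {"cultivar": c, "year": y, "season": s, "value": 10.0 * c + noise}
    for c, y, s in product([0, 1], [1, 2], ["spring", "autumn"])
    for noise in (-1.0, 1.0)
]
result = balanced_anova(rows, ["cultivar", "year", "season"], "value")
print(result["cultivar"]["F"])  # 200.0
```

In practice the F ratios would be compared with the F distribution to obtain p-values, a step that a statistics package provides and that is left out here to keep the example dependency-free.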
The capability of the chemical variables to correctly classify samples according to either cultivar, year, or season was assessed using two supervised classification methods: GA-LDA and Random Forest (RF). Specifically, each of the two methods was applied three times, focusing separately on one of the three sources of variation: cultivar (classes Frantoio, Leccino, Leccio del Corno, and Moraiolo), year (classes year 1 and year 2), and season (classes spring, summer, autumn, and winter).
GA-LDA: Linear Discriminant Analysis (LDA), combined with a Genetic Algorithm (GA) for variable selection [
40], was employed to build a discriminant function capable of maximizing class separation based on a selected subset of variables. The GA, inspired by the principles of natural selection, is particularly effective for handling datasets with numerous and heterogeneous variables across multiple samples, as in the case of this study [
41]. It begins with a randomly generated population of variable sets (individuals), which are evaluated and ranked. Through successive iterations, the algorithm evolves the population by selecting high-performing individuals based on a fitness function—in this case, the LDA—and applying genetic operators such as crossover and mutation to produce new combinations. This evolutionary process continues over several generations to improve solution quality. LDA served as the fitness function, assessing each variable combination based on the Class Error Average (i.e., the mean percentage of misclassified samples across all classes). This metric was derived from a confusion matrix obtained through 5-fold cross-validation. Overall classification accuracy was also calculated. The parameters used in the GA were as follows:
Number of Variables per Individual: 8.
Population Size: 100.
Number of Generations: 25.
Number of Elite Individuals: 50.
Crossover Probability: 1 (applied to all elite individuals).
Mutation Probability: Random.
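The evolutionary loop described above can be sketched with the listed parameters. This is an illustration only and not the authors' implementation: the fitness used here is a toy stand-in for the cross-validated LDA Class Error Average, the pool of 26 candidate variables (25 compounds plus total phenolic content) is an assumption, the crossover and mutation operators are simple illustrative choices, and "random" mutation probability is interpreted as a probability redrawn for every child.

```python
import random

N_VARIABLES = 26          # assumption: 25 compounds plus total phenolic content
VARS_PER_INDIVIDUAL = 8
POPULATION_SIZE = 100
N_GENERATIONS = 25
N_ELITE = 50

# Hypothetical stand-in for the real fitness (cross-validated LDA Class Error
# Average): the fraction of an arbitrary "informative" variable set that is missing.
INFORMATIVE = frozenset(range(8))


def toy_error(individual):
    return len(INFORMATIVE - individual) / len(INFORMATIVE)


def random_individual(rng):
    return frozenset(rng.sample(range(N_VARIABLES), VARS_PER_INDIVIDUAL))


def crossover(a, b, rng):
    # The child draws its variables from the union of both parents.
    return frozenset(rng.sample(sorted(a | b), VARS_PER_INDIVIDUAL))


def mutate(individual, rng):
    # Swap one selected variable for one that is currently unselected.
    unselected = sorted(set(range(N_VARIABLES)) - individual)
    out = set(individual)
    out.remove(rng.choice(sorted(individual)))
    out.add(rng.choice(unselected))
    return frozenset(out)


def run_ga(error_fn, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(POPULATION_SIZE)]
    history = []                       # best error at the start of each generation
    for _ in range(N_GENERATIONS):
        population.sort(key=error_fn)
        history.append(error_fn(population[0]))
        elite = population[:N_ELITE]   # elite individuals survive unchanged
        children = []
        while len(elite) + len(children) < POPULATION_SIZE:
            a, b = rng.sample(elite, 2)            # crossover probability 1
            child = crossover(a, b, rng)
            if rng.random() < rng.random():        # mutation probability redrawn each time
                child = mutate(child, rng)
            children.append(child)
        population = elite + children
    population.sort(key=error_fn)
    history.append(error_fn(population[0]))
    return population[0], history


best, history = run_ga(toy_error)
print(sorted(best), history[-1])
```

Because the elite is carried over unchanged, the best error can never increase from one generation to the next; replacing `toy_error` with a cross-validated classifier error would turn the sketch into an actual variable-selection run.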
RF: The Random Forest algorithm is a machine learning method that constructs multiple decision trees using random subsets of data and variables—a technique known as bagging [
42]. The final classification result is derived by aggregating the predictions of all individual trees, which reduces model variance and improves accuracy. In this study, the forest consisted of 500 trees, and model performance was optimized by minimizing the Out-of-Bag (OOB) error. A 5-fold cross-validation procedure was used to produce the confusion matrix.
The importance of variables was also evaluated. For the GA-LDA method, it was expressed as the frequency (%) with which each variable was selected by the GA in the best ten combinations per run. For the RF method, it was expressed as the number of times (in %) each variable was selected by the individual trees.
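The selection-frequency definition of importance amounts to counting how often each variable appears among the retained combinations. A minimal sketch follows, with a hypothetical helper name and an invented set of combinations used purely for illustration.

```python
from collections import Counter


def selection_frequency(best_combinations):
    """Percentage of combinations in which each variable appears."""
    counts = Counter(v for combo in best_combinations for v in set(combo))
    n = len(best_combinations)
    return {v: 100.0 * c / n for v, c in counts.items()}


# Hypothetical example with four retained combinations of two variables each.
combos = [
    {"oleuropein", "rutin"},
    {"oleuropein", "ligstroside"},
    {"oleuropein", "rutin"},
    {"oleuropein", "oleuroside"},
]
freq = selection_frequency(combos)
print(freq["oleuropein"], freq["rutin"], freq["ligstroside"])  # 100.0 50.0 25.0
```

A variable reported with an importance of 100% is therefore one that is present in every retained combination.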
All data analyses were conducted using R software (version 4.3.1, The R Foundation for Statistical Computing).
3. Results and Discussion
A typical chromatogram of the phenolic profile of olive leaf extracts (OLEs) is reported in Figure 1A, with a zoom of the central part (rich in closely eluting peaks) shown in Figure 1B. The main peaks are numbered according to the assignment summarized in Table 2. Compound identification was carried out using retention times, UV and mass spectra, and literature information [29,30,43,44,45].
Table 2 lists the assignment of 25 phenolic and secoiridoid molecules. They include (i) the two main Olea europaea L. secoiridoids (i.e., oleuropein (confirmed with standard) and ligstroside); (ii) eight oleuropein-related compounds (i.e., hydroxyoleuropein isomers 1 and 2, oleuropein diglucoside, methoxyoleuropein isomers 1 and 2, dehydrooleuropein isomers 1 and 2, and oleuroside); (iii) three hydroxytyrosol-related molecules (i.e., hydroxytyrosol–glucoside isomers 1 and 2, and hydroxytyrosol (confirmed with standard)); (iv) three derivatives of phenolic acids (with m/z of 379, 489, and 519, respectively); and (v) nine flavonoids (i.e., luteolin diglucoside, rutin (confirmed with standard), luteolin rutinoside, luteolin-7-O-glucoside (confirmed with standard), flavonoid with m/z 505, apigenin rutinoside, luteolin-4-O-glucoside, flavonoid with m/z 553, and luteolin glucoside isomer). To the best of the authors’ knowledge, this is the first article reporting the presence of dehydrooleuropein isomers (peaks 23 and 24) in olive leaves. The presence of the corresponding aglycone was previously reported in extra virgin olive oil [46,47], but the position of the additional unsaturation was not specified. Given the complex and double-bond-rich structure of oleuropein, a possible position for the new double bond could be the carbon chain of the phenylethyl portion of the molecule. This hypothesis is supported by the presence of a fragment with m/z 403 in the mass spectra of the two isomers, corresponding to an ionized elenolic acid methyl ester. The loss of 403 units from an initial molecular weight of 538 would in fact result in a fragment of 135 mass units, corresponding to a dehydrohydroxytyrosyl moiety.
All data of both single phenolic compounds and total phenolic content in all olive leaf samples analyzed are reported in
Table 3, as means of triplicates and as mg/kg of leaf dry weight. An average total phenolic content of 30,353.9 mg/kg was observed, with minimum and maximum values of 16,674.0 and 50,594.3 mg/kg for the MorY1Sp and LdCY2Au samples, respectively. As expected [
32,
35,
48], oleuropein was the most abundant phenolic compound, with a minimum average value of 4570.0 mg/kg and a maximum average value of 27,547.7 mg/kg for the MorY1Sp and LdCY2Au samples, respectively. Ligstroside, the other typical secoiridoid of
Olea europaea L. [
49], was present in contents much lower than oleuropein, ranging from 117.0 mg/kg to 1931.0 mg/kg, with a mean value of 820.8 mg/kg. After oleuropein, the compounds with the highest concentrations were the flavonoids luteolin-7-
O-glucoside, with an average content of 3005.1 mg/kg and values ranging from 1735.7 mg/kg to 6466.0 mg/kg, and luteolin-4-
O-glucoside, with an average content of 2941.8 mg/kg and values ranging from 1210.7 mg/kg to 6309.3 mg/kg. These were followed by hydroxytyrosol glucoside isomer 1, with an average content of 1923.8 mg/kg and values ranging from 196.3 mg/kg to 6777.7 mg/kg, hydroxyoleuropein isomer 2, with an average content of 1323.8 mg/kg and values ranging from 573.3 mg/kg to 2635.0 mg/kg, and hydroxyoleuropein isomer 1, with an average content of 1135.5 mg/kg and values ranging from 121.7 mg/kg to 2021.3 mg/kg. All the other molecules were present with average values lower than 1000 mg/kg, including hydroxytyrosol with an average content of 261.8 mg/kg and values ranging from 118.0 mg/kg to 502.7 mg/kg.
The values hereby presented are in general agreement with the literature data reported for the same cultivars, with the additional feature of their consistency over a two-year timespan: Blasi et al. [
50] analyzed
Moraiolo,
Leccino, and
Frantoio olive leaves, gathering samples in December, March, June, and September. Total phenolic contents (TPCs) were analyzed with the Folin–Ciocalteu method and expressed as mg/g of gallic acid equivalents. The values reported in the study were slightly higher (up to 65 mg/g, i.e., 65,000 mg/kg) than those presented here, probably due to the different analysis method [
51]; however, the seasonal pattern is similar, with TPC values increasing from winter to summer, and then decreasing in autumn. Hydroxytyrosol values for the three varieties were found to be in a 1000–6000 mg/kg range, mostly depending on the harvesting season and with no significant difference among cultivars. In agreement with the present study, the highest values were reached in the cold season. Reported oleuropein levels for the three cultivars were very variable, ranging from about 2000 to 60,000 mg/kg and presenting different seasonal patterns for the different cultivars. While TPC values (evaluated with the Folin–Ciocalteu method) and oleuropein concentrations for the three cultivars presented by Borghini et al. [
35] were comparable to those of the present study, reported hydroxytyrosol–glucoside levels were much higher, with samples reaching 26,600 mg/kg for
Moraiolo and 24,700 mg/kg for
Leccino. No seasonal pattern was identified by the authors as their work was focused on geographical variations.
3.1. Variation in Phenolic Compound Profiles Across Cultivars, Years, and Seasons
The effect of each factor (i.e., cultivar, year, and season) and their two- and three-way combinations on each variable (i.e., single phenolic compounds and total phenolic content) was studied using 3-way ANOVA.
3.1.1. Total Phenols, Secoiridoids, and Derivatives
Figure 2 shows the evolution of oleuropein and hydroxytyrosol glucoside isomer 1 content over seasons across two years of olive leaves of the four cultivars included in the study, while those of hydroxytyrosol glucoside isomer 2, hydroxytyrosol, ligstroside, total phenolic content, and dehydrooleuropein isomer 1 and 2 are reported in
Figure S1 of the Supplementary Materials. Finally,
Table 4 and
Table S1 (of Supplementary Materials) summarize the significance of the effect of each factor and their combination on secoiridoids and their derivatives as evaluated using 3-way ANOVA.
A significant effect of all factors and their combinations on all variables occurred, with only a few exceptions, as follows: year for hydroxytyrosol glucoside isomer 2 and methoxyoleuropein isomer 1, and cultivar × year for hydroxytyrosol isomer 1, oleuropein, and total phenols. This means that, for the latter variables, the effect of the year does not depend on the cultivar. Concerning the differences due to each single factor, the significance letters in the top-right box of Figure 2 show that oleuropein and hydroxytyrosol glucoside differ significantly among almost all levels of each factor, with only a few exceptions: oleuropein content does not differ between Moraiolo and Leccino, while hydroxytyrosol glucoside does not differ between Moraiolo and Frantoio or between summer and autumn. The most interesting behavior is that of oleuropein and hydroxytyrosol glucoside isomer 1, where a recurring trend can be immediately observed across all four cultivars throughout the seasons and years (Figure 2). Conversely, in the case of hydroxytyrosol and, especially, hydroxytyrosol glucoside isomer 2, the variations do not consistently follow the same trend observed for hydroxytyrosol glucoside isomer 1, except for certain combinations of cultivar, year, and season (Figure S1). Finally, the total phenolic content exhibits few significant variations and a less homogeneous pattern compared to oleuropein, while ligstroside shows a general increase from spring to summer, followed by a decreasing trend in the succeeding seasons.
The results observed for oleuropein and hydroxytyrosol glucoside isomer 1, confirmed across four cultivars and over two growing seasons, suggest that their concentrations undergo regular seasonal variations, likely due to changes in the activity of specific enzymes which are in turn influenced by the plant’s phenological stages (i.e., spring vegetative growth, summer heat and water stress, autumn regrowth, and winter dormancy), but also by other factors, such as genotype, geographical location, stress, and ultraviolet radiation [
31,
38,
52]. Moreover, the seasonal variations in these two compounds (
Figure 2) indicate an inverse relationship: where the concentration of one increases, the other decreases, and vice versa. Specifically, starting from spring, oleuropein levels increase, reaching their highest values in summer and autumn, before decreasing again in winter to levels comparable to those observed in spring. In contrast, and even more markedly, hydroxytyrosol glucoside isomer 1 shows the opposite pattern: its concentrations peak in winter and spring, then drop by more than an order of magnitude in summer and autumn. It is also noteworthy that oleuropein levels in spring and winter are never significantly different (with the sole exception of cultivar
Leccino in the second year), just as the levels of hydroxytyrosol glucoside isomer 1 in summer and autumn do not show significant differences (except for cultivar
Leccio del Corno in the first year). These data suggest that hydroxytyrosol glucoside isomer 1 may act as a direct precursor of oleuropein, which the plant would synthesize through enzymatic pathways during the warm seasons, likely in response to water deficit and heat stress [
53,
54]. These two molecules can be considered the main phenolic compounds correlated with the seasonal variability in olive leaf of the four cultivars studied in this article.
It is also interesting to note the behavior of the two isomers of dehydrooleuropein. Their variation is shown in
Figure S1 of Supplementary Material, while
Table S1 summarizes the significance of the effect of each factor and their combination as evaluated using 3-way ANOVA, showing a significant effect of all factors and interactions. The graphs in
Figure S1 show a behavior very similar to that of oleuropein, with an increase passing from spring to summer and autumn, followed by a decrease in winter and spring. This similar behavior supports the identification of these molecules as dehydrooleuropein, structurally similar to oleuropein.
3.1.2. Flavonoids
After oleuropein, flavonoids are the most represented class in the phenolic profile of the leaves of the analyzed cultivars [
55,
56]. The evolution of their content over seasons across two years is reported in
Figure S2 of the Supplementary Material, while
Table 5 summarizes the significance of the effect of each factor and of their combinations as evaluated using 3-way ANOVA. Briefly, no single recurrent trend was shared by the five luteolin-related flavonoids: the contents of luteolin glucoside isomer and luteolin diglucoside were generally the lowest in spring (with only a few exceptions), whereas those of luteolin-7-O-glucoside, luteolin-4-O-glucoside, and luteolin rutinoside were often the highest in that season, even if the differences among samples were not particularly marked. The levels of luteolin-7-
O-glucoside and luteolin rutinoside were the highest and lowest for
Leccio del Corno and
Moraiolo, respectively, while luteolin diglucoside also showed high levels in Leccio del Corno but the lowest ones in
Frantoio. No recurrent significant trends were observed for rutin and flavonoid
m/
z 505, while flavonoid
m/
z 553 always increased from spring onward and then decreased from autumn to winter. However, among flavonoids, the most evident differences were observed for apigenin rutinoside, which showed the highest levels in
Leccino followed by
Leccio del Corno, with lower levels for
Frantoio and
Moraiolo. For all cultivars, this molecule showed very low levels in spring, approximately an order of magnitude lower than in the other seasons.
The data presented here on flavonoids in Leccino olive leaves could be compared with other studies, as this variety, which originates from Tuscany, has spread to southern Italy and to several countries like Croatia and Spain, also due to the cultivar’s resistance to frost and to pathogens like
Xylella fastidiosa [
57,
58]. In agreement with this work, Peskovic et al. [
38] reported concentrations of apigenin glucoside significantly higher than in other cultivars and up to 1200 mg/kg, with values very similar to those shown in
Figure S2 of the Supplementary Materials; similarly, Borghini and coworkers [
35] reported for
Leccino apigenin values higher than for other cultivars, with concentrations up to 8000 mg/kg. In both studies the molecule was the second most abundant flavonoid compound in
Leccino leaves after luteolin-7-
O-glucoside. Difonzo and coworkers [
27] reported much lower levels of apigenin glucoside in the cultivar, with values up to 61 mg/kg; in contrast with the present findings,
Leccino was the variety with the lowest concentration of this molecule among the considered cultivars.
3.2. Chemometric Models for Samples Differentiation
To assess the ability of the analyzed phenolic compounds to classify leaf samples according to either cultivar, season, or harvest year, models were developed using two methods based on different approaches: (i) Genetic Algorithm–Linear Discriminant Analysis (GA-LDA); and (ii) Random Forest (RF) [
41,
42]. The results obtained are summarized in the confusion matrices shown in
Table 6, and the models’ performance was evaluated in terms of accuracy (i.e., the percentage of correctly classified samples relative to the total number of samples) and Class Error Average (CEA, i.e., the average of error percentages within each individual class). Furthermore,
Table 7 summarizes the phenolic compounds selected as the most important variables for their significance in discriminating samples according to each single factor (i.e., cultivar, year, and season) after application of the GA-LDA and RF methods.
The data in
Table 6 confirm the excellent performance of the phenolic compounds in olive leaves in discriminating samples based on cultivar, harvest year, and harvesting season. In particular, with regard to cultivar, the Random Forest method yielded an accuracy of 98.96% and a Class Error Average (CEA) of 1.04%, while the GA-LDA method achieved an accuracy of 94.79% and a CEA of 5.21%. As for the season, both methods (RF and GA-LDA) provided an accuracy of 100% and, consequently, a CEA of 0%. Lastly, in the case of harvest year, the Random Forest method yielded an accuracy of 97.92% and a CEA of 2.08%, whereas the GA-LDA method achieved an accuracy of 98.96% and a CEA of 1.04%. It is also noteworthy that in no case did the model misclassify more than one sample per class, with the sole exception of the
Leccino cultivar, for which 4 out of 24 samples were misclassified by the GA-LDA model, still achieving good performance (83.3% accuracy). These performances, confirmed at very high percentages using two methods based on very different approaches, reinforce the conclusion that the phenolic profiles of olive leaves vary significantly according to the factors considered in this study: cultivar, season, and harvest year. This evidence is even more significant considering that every olive leaf sample was collected in the same area and on the same day for each sampling point, thereby eliminating variability related to geographic origin and the differences related to pedoclimatic conditions, which may have an impact if samples are collected on different days.
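The two performance figures quoted above follow directly from a confusion matrix. The sketch below uses a hypothetical matrix whose row totals and error counts match the GA-LDA cultivar model as described in the text (4 of 24 Leccino samples and one sample of another class misclassified); which class carries the single error and where the misclassified samples are assigned are arbitrary choices made here, since the actual matrix is in Table 6.

```python
def accuracy_and_cea(matrix):
    """Return (accuracy %, Class Error Average %) for a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    class_errors = [100.0 * (sum(row) - row[i]) / sum(row)
                    for i, row in enumerate(matrix)]
    return 100.0 * correct / total, sum(class_errors) / len(class_errors)


# Hypothetical confusion matrix (row order: Frantoio, Leccino, Leccio del Corno, Moraiolo).
ga_lda_cultivar = [
    [24, 0, 0, 0],
    [2, 20, 1, 1],   # 4 of 24 Leccino samples misclassified (placement arbitrary)
    [0, 0, 24, 0],
    [1, 0, 0, 23],   # a single misclassified sample (class chosen arbitrarily)
]
acc, cea = accuracy_and_cea(ga_lda_cultivar)
print(round(acc, 2), round(cea, 2))  # 94.79 5.21
```

With equal class sizes, as in this balanced design, accuracy and CEA always sum to 100%; the two metrics diverge only when classes contain different numbers of samples.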
Finally,
Table 7 presents the most important variables in discriminating among the various sample categories, considering the cultivar, the harvesting season, and the harvest year for both methods used. In the case of the GA-LDA method, the total content of phenolic compounds emerged as one of the most significant variables for all three factors (cultivar, year, and harvesting season). Additionally, hydroxytyrosol glucoside isomer 1 (for harvesting season and year), luteolin rutinoside (for cultivar and harvesting season), and oleuroside (for cultivar and year) were found to be significant for two factors. Considering the individual factors, the variables with an importance of 100% were luteolin rutinoside and luteolin diglucoside for the cultivar, hydroxytyrosol glucoside isomer 1 for the harvesting season, and methoxyoleuropein isomer 2, oleuroside, and phenolic acid derivative m/z 379 for the harvest year. In the case of the RF method, no variable was found to be among the most significant across all three factors, while apigenin rutinoside was significant for two factors (cultivar and harvesting season). Considering the individual factors, the variables with an importance of 100% were luteolin-7-O-glucoside for the cultivar, apigenin rutinoside for the harvesting season, and the flavonoid with m/z 505 for the harvest year.
4. Conclusions
This article presents HPLC-DAD-MS data on phenolic compounds in olive leaf samples of four major Tuscan cultivars of Olea europaea L. The employed sampling approach (for each sampling period, all samples collected on the same day from the same orchard) made it possible to isolate the influence of cultivar, harvesting season, and harvesting year on the concentrations of the analyzed phenolic and secoiridoid derivatives. Oleuropein and total phenolic content reached the highest levels in autumn for the Leccio del Corno variety and showed a recurring seasonal pattern in all varieties, with higher concentrations in summer and autumn and lower in spring and winter, while hydroxytyrosol glucoside exhibited an opposite seasonal pattern. Although no specific pattern could be identified for the nine quantified flavonoids, 3-way ANOVA highlighted the significance of these compounds in differentiating samples, especially in terms of cultivar and harvesting season. Application of GA-LDA and RF confirmed that olive leaf samples can be discriminated in terms of cultivar, season, and year of harvest through their phenolic profiles, and allowed the identification of the most significant parameters that differentiate the samples. The data reported herein will be useful for the valorization of olive by-products, by allowing the selection of the most suitable cultivars for specific goals, such as producing olive leaf extracts rich in phenolic derivatives like oleuropein or hydroxytyrosol. Future research should focus on characterizing leaf samples of other cultivars in order to broaden the range of cultivars with known phenolic concentrations.