1. Introduction
Olive (
Olea europaea L.) fruits undergo profound biochemical and physiological changes during ripening, directly influencing both the technological and nutraceutical quality of fruits and the resulting oil. Among the most relevant biochemical constituents, chlorophylls and polyphenols play a key role in determining oil color, oxidative stability, and health-promoting properties [
1]. Chlorophylls, mainly chlorophyll
a and
b, prevail in unripe drupes, conferring the characteristic green color and sustaining photosynthetic activity. As ripening progresses, chlorophyll degradation accompanies the conversion of chloroplasts into chromoplasts, while carotenoids and anthocyanins accumulate, driving the color shift from green to purple or black [
2,
3]. The two chlorophylls typically degrade in parallel, maintaining an approximately constant a/b ratio (~3:1), a parameter that reflects the physiological status of the photosynthetic apparatus [
4]. Chlorophylls also influence the oxidative stability of olive oil [
5]. Polyphenols, in turn, are a heterogeneous class of secondary metabolites responsible for the antioxidant activity, sensory properties, and stability of olive oil. Their biological importance is well recognized by the European Union, which authorizes health claims linking olive oil polyphenols to the protection of blood lipids from oxidative stress [
6]. Beyond their nutritional value, polyphenols act as primary antioxidants during storage, delaying oxidation processes and preserving oil freshness. Their concentration strongly affects sensory attributes, shelf life, and the nutritional value of olive oil, and its evolution throughout drupe maturation has been widely investigated [
7]. Indeed, polyphenol concentration is typically high in early fruit development and declines markedly during ripening, with the extent of reduction varying among cultivars and depending on environmental factors [
8]. The quantification of both metabolites is conventionally performed through destructive analytical assays based on extracts obtained from fruit or oil samples. Conventional destructive assays require 70–180 min per sample (HPLC-DAD: ~80 min; Folin–Ciocalteu: ~180 min including 2-h incubation), consuming 10–20 fruits per cultivar-stage.
To determine optimal harvest timing that balances yield and quality, several ripening indices have been proposed, often relying on simple, easily measurable parameters such as skin color [
9]. However, these indices are affected by cultivar-specific characteristics and environmental variability, limiting their general applicability. In contrast, spectroscopic techniques provide a non-destructive, rapid, and sensitive alternative to traditional chemical analyses. Hyperspectral imaging enables simultaneous monitoring of multiple parameters and repeated analysis of the same samples, allowing the development of predictive models for identifying cultivars, ripening stages, or key fruit traits such as moisture, dry matter, oil content, and total phenolics [
10,
11,
12,
13]. Hyperspectral measurements acquire spectra in <10 s, enabling high-throughput phenotyping and longitudinal tracking of identical fruits across ripening progression.
Despite extensive biochemical characterization of olive fruit maturation, several critical knowledge gaps remain, and most spectroscopic studies have focused on leaves or oils, with limited research addressing in situ quantification of secondary metabolites in intact fruits during ripening [
14].
Vegetation indices have been widely applied in agricultural remote sensing for crop monitoring and quality assessment [
15,
16], but their systematic evaluation for predicting biochemical dynamics in olive fruits remains largely unexplored [
17,
18]. This knowledge gap reflects several limitations in previous olive fruit spectroscopy studies: narrow genetic scope (typically 1–5 cultivars) [
14,
17], snapshot sampling at harvest rather than complete developmental trajectories, and evaluation of 5–18 pre-selected indices without systematic screening of alternative formulations [
10]. The majority of available VIs—originally developed for canopy-level vegetation monitoring, stress detection, or leaf pigment estimation—have not been tested for their suitability in tracking ripening-associated biochemical changes in olive fruits, leaving uncertainty about which spectral formulations and wavelength combinations are optimal for this specific application [
19]. Furthermore, while cultivar-specific variation in chlorophyll and polyphenol dynamics has been documented, the relative contribution of conserved physiological mechanisms versus genotype-dependent regulation remains poorly understood, limiting the development of robust, germplasm-wide predictive models. Additionally, conventional ripening indices based on visual parameters are affected by cultivar-specific characteristics and environmental variability, reducing their reliability for harvest optimization across diverse production systems.
The present study represents the first phase of a systematic research program for vegetation index performance evaluation in olive fruit biochemical phenotyping. This exploratory screening prioritizes genetic diversity over within-cultivar replication to establish species-level relationships between spectral signatures and biochemical dynamics across diverse germplasms. A library of 87 vegetation indices was systematically assessed against biochemical data collected from thirty-one cultivars at four ripening stages, representing the most extensive germplasm-wide VI performance comparison conducted to date for olive fruit quality assessment. This foundational phase pursued three primary objectives: (i) identify vegetation indices exhibiting consistent correlations with chlorophylls and polyphenol content across diverse genotypes, determining optimal spectral formulations and wavelength combinations that capture species-level biochemical patterns; (ii) quantify inter-cultivar variability in VI–chemical relationships to distinguish conserved physiological processes from genotype-dependent metabolic strategies, thereby assessing the feasibility of universal versus cultivar-specific calibration approaches; and (iii) characterize the range and directionality of cultivar-level patterns in chlorophyll degradation and polyphenol accumulation dynamics across the germplasm panel.
These germplasm-wide results will guide subsequent validation studies with higher within-cultivar replication. Such studies will develop cultivar-specific predictive models for operational deployment in commercial production. By establishing general VI–chemical relationships and quantifying cultivar-specific variability boundaries, this work provides the foundation for targeted selection of cultivars and indices for intensive calibration efforts aimed at optimizing harvest timing in commercial production environments.
2. Materials and Methods
2.1. Plant Material and Olive Sampling
The study was conducted at the National Research Council’s “Santa Paolina” Experimental Farm (Follonica, Central Italy; 42°56′39″N, 10°46′16″E, 38 m a.s.l.), which maintains a reference germplasm collection of approximately 1600 trees representing 916 olive accessions. Thirty-one cultivars (
Table 1) were selected to maximize genetic and phenotypic diversity across 12 geographic origins (Spain, Turkey, Croatia, France, 8 Italian regions). The panel includes major commercial cultivars and minor local varieties, capturing the full spectrum of Mediterranean olive biochemical and morphological diversity.
Plants were grown within the same orchard block under uniform agronomic conditions, including consistent planting density, rainfed cultivation, synchronized pruning, and homogeneous soil characteristics. This single-site approach deliberately eliminates genotype × environment confounding, enabling unambiguous attribution of spectral–chemical variation to genetic effects across cultivar diversity—the primary objective of germplasm screening studies.
Fruit sampling was performed between early October and late November 2024, collecting drupes from different canopy positions of each tree and across successive ripening stages. Four maturity levels were identified on the basis of visual evaluation of peel coloration (M1: 100% green; M2: small reddish spots; M3: turning color; M4: 100% purple or black), following the classification reported by Alamprese et al. [
20]. For each ripening stage, ten uniform and defect-free drupes were selected for spectral acquisition. Reflectance measurements were acquired at two opposing positions on each fruit (apical and stylar sides) to account for spatial variation in surface coloration, yielding a total of 2480 spectra (31 cultivars × 4 ripening stages × 10 fruits × 2 positions). To minimize post-harvest biochemical changes, all spectral measurements were conducted immediately after fruit detachment (<2 h from harvest to measurement).
For each cultivar-stage combination, spectral data were averaged wavelength-by-wavelength across all replicates (10 fruits × 2 positions) to obtain 124 representative reflectance profiles (31 cultivars × 4 ripening stages).
2.2. Hyperspectral Measurement Processing
Spectral measurements were conducted in a controlled laboratory environment to minimize external light, atmospheric, and temperature variability. Spectral reflectance data were collected using an HR2 spectrometer (Ocean Optics, Orlando, FL, USA) covering the 380–1080 nm range with a spectral resolution of 0.46 nm. The setup included a 45° diffuse reflectance probe (DR-Probe, Ocean Optics) integrated with a tungsten–halogen light source, connected via a 6 μm core optical fiber. A 40 mm stand-off spacer ensured a constant measurement distance and minimized ambient light interference. Custom-designed masks were used to match the size of the olives and to confine the measurement area exclusively to the fruit surface. Each mask was coated with ultra-matte black acrylic paint exhibiting up to 98% visible light absorption to prevent reflectance contamination. Instrumental drift was controlled by applying a dark current correction after each cultivar set. Additionally, a white reference spectrum from a certified reflective standard was recorded every five samples to normalize the data and express the radiance as percent reflectance, compensating for light source and sensor variability.
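The dark-current and white-reference corrections described above amount to a band-by-band normalization. The following is a minimal sketch, assuming the usual formula R% = 100 × (sample − dark) / (white − dark); the function name and the example counts are hypothetical and do not come from the acquisition software.

```python
def percent_reflectance(sample, dark, white):
    """Convert raw detector counts to percent reflectance, band by band.

    Assumes all three spectra share the same wavelength grid.
    """
    if not (len(sample) == len(dark) == len(white)):
        raise ValueError("spectra must share the same wavelength grid")
    result = []
    for s, d, w in zip(sample, dark, white):
        denom = w - d
        # Guard against bands where the white reference equals the dark signal.
        result.append(float("nan") if denom == 0 else 100.0 * (s - d) / denom)
    return result


# Hypothetical raw counts for three bands.
raw = [520.0, 1510.0, 3020.0]
dark = [20.0, 10.0, 20.0]
white = [1020.0, 3010.0, 6020.0]
reflectance = percent_reflectance(raw, dark, white)
print(reflectance)
```

Recording the white reference every five samples, as done here, keeps the denominator close in time to each sample spectrum, which is what compensates for lamp and sensor drift.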
2.3. Chlorophyll a and b Determination by HPLC-DAD
Fruit chlorophyll
a and
b contents were extracted following Arar [
21] with some modifications. Briefly, 0.5 g of homogenized fresh fruit was extracted with 20 mL acetone/water (80:20,
v/
v) and sonicated for 20 min. The extract was then centrifuged at 2500 rpm for 10 min at 4 °C, and the supernatant was recovered. A 1 mL aliquot of the supernatant was filtered through a 0.22 µm polytetrafluoroethylene (PTFE) syringe filter (Millipore Ltd., Bedford, MA, USA) and analyzed by HPLC. Chlorophylls were analyzed on a Shimadzu LC-30AD series chromatographic system (Shimadzu, Kyoto, Japan) equipped with an LC-20AT binary pump, an SPD-20A UV–vis detector, and a CTO-20A column oven. Data processing was performed with LC Solution software (version 5.89; Shimadzu Corporation, Kyoto, Japan). Chromatographic separation was carried out on a C18 column (Zorbax
® Eclipse Plus, 150 × 4.6 mm, 5 µm particle size) maintained at 30 °C. The injection volume was 50 µL. The elution gradient followed the method established by [
22], with slight modifications, using the following mobile phases: A, water/1 M ammonium acetate in water/methanol (1:1:8,
v/
v/
v) and B, methanol/acetone (1:1,
v/
v). The mobile phase was delivered at a flow rate of 1.0 mL min
−1 and the elution was carried out using a linear gradient as follows: 0.0–6.0 min, A decreased from 95% to 50%; 6.0–12.0 min, A decreased from 50% to 0%; 12.0–16.0 min, 0% A and 100% B; 16.0–20.0 min, A increased from 0% to 95%; 20.0–25.0 min, the system was held at 95% A and 5% B for column re-equilibration. PDA spectra were acquired over 350–800 nm; chromatograms were extracted at 432 nm and 460 nm for quantification of chlorophyll
a and chlorophyll
b, respectively. Calibration curves were generated for pigment quantification by plotting injected amount against integrated peak area. Calibration functions were obtained by linear least-squares regression over a concentration interval selected to cover the levels measured in the samples. For each chlorophyll reference standard, the limits of detection (LOD) and quantification (LOQ) were also established.
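The calibration step can be illustrated with an ordinary least-squares line relating peak area to injected amount, then inverted to quantify an unknown. This is a minimal sketch: the standard amounts and peak areas are invented for illustration, and the LOD/LOQ computation is omitted because the criterion used is not stated in the text.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical chlorophyll standards: injected amount vs. integrated peak area.
amounts = [0.5, 1.0, 2.0, 4.0]
areas = [105.0, 205.0, 405.0, 805.0]
slope, intercept = fit_line(amounts, areas)


def amount_from_area(area):
    """Invert the calibration line to estimate the injected amount."""
    return (area - intercept) / slope


print(amount_from_area(305.0))
```

Inversion is only meaningful inside the calibrated interval, which is why the concentration range is chosen to cover the levels found in the samples.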
2.4. Total Polyphenols Determination by UV–VIS Spectrophotometry
Total polyphenols of olive drupe were quantified using a solid–liquid extraction followed by colorimetric determination with a modified Folin–Ciocalteu method [
23]. Approximately 1 g of homogenized sample was weighed and mixed with 10 mL of extraction solvent (acetone/water 80:20,
v/
v, containing 0.1% HCl). The mixture was sonicated for 15 min and subsequently shaken for 30 min. Samples were then centrifuged at 3000 rpm for 5 min, and the supernatant was recovered and transferred into a 25 mL volumetric flask, brought to volume with ultrapure water, and filtered through a 0.45 µm membrane filter. An aliquot of 5 mL of the extract was mixed with 25 mL of ultrapure water, 2.5 mL of Folin–Ciocalteu reagent, and 10 mL of 20% (
w/
v) sodium carbonate solution. The mixture was then brought to 50 mL volume with ultrapure water in a volumetric flask. A reagent blank was prepared by mixing 2.5 mL of Folin–Ciocalteu reagent and 10 mL of 20% (
w/
v) sodium carbonate solution and ultrapure water. All flasks were kept in the dark for 2 h, after which absorbance was measured at 750 nm using a UV–VIS spectrophotometer. Total polyphenol content was expressed as gallic acid equivalents (GAE).
2.5. Vegetation Index Computation
A comprehensive set of 87 vegetation indices (VIs) was selected through systematic literature review, focusing specifically on indices relevant to fruit maturation physiology (
Table S1). The selection strategy prioritized indices sensitive to key biochemical processes during ripening, including chlorophyll degradation, carotenoid dynamics, and structural modifications. To ensure comprehensive coverage of potential spectral responses, multiple indices targeting similar biochemical processes were deliberately included, enabling identification of the most effective formulations for assessing olive maturation. To optimize computational efficiency, only wavelengths specifically required for VI calculation were extracted from the full hyperspectral dataset. A comprehensive wavelength requirement list was compiled by systematically parsing the mathematical formulations of all 87 indices documented in
Table S1: each formula specifies required spectral bands (e.g., NDVI = (R800 − R670)/(R800 + R670) requires 800 and 670 nm), and compiling all requirements yielded 54 unique wavelengths. A custom Visual Basic macro was developed to extract reflectance values at these 54 wavelengths from the complete spectral database (380–1080 nm range, ~1500 spectral bands), creating a reduced matrix (
n samples × 54 wavelengths) for subsequent VI calculation. All 87 vegetation indices were then computed from this reduced dataset according to their respective mathematical formulations and evaluated as candidate predictors for the biochemical target variables.
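The band extraction and index calculation can be sketched as follows. The original workflow used a Visual Basic macro; this Python version is only an illustration, and it assumes that each required wavelength is mapped to the nearest measured band (the matching rule is not stated in the text). The synthetic spectrum and helper names are hypothetical.

```python
from bisect import bisect_left


def nearest_band(wavelengths, target):
    """Index of the measured band closest to `target` (wavelengths ascending)."""
    i = bisect_left(wavelengths, target)
    if i == 0:
        return 0
    if i == len(wavelengths):
        return len(wavelengths) - 1
    before, after = wavelengths[i - 1], wavelengths[i]
    return i if after - target < target - before else i - 1


def reflectance_at(wavelengths, spectrum, targets):
    """Reduce a full spectrum to the bands required by the index formulas."""
    return {t: spectrum[nearest_band(wavelengths, t)] for t in targets}


def ndvi(r):
    """NDVI = (R800 - R670) / (R800 + R670), as given in the text."""
    return (r[800] - r[670]) / (r[800] + r[670])


# Synthetic grid: 380 nm upward in 0.46 nm steps (about 1500 bands).
wavelengths = [380 + 0.46 * k for k in range(1522)]
# Synthetic step spectrum: low reflectance below 700 nm, high above.
spectrum = [0.05 if w < 700 else 0.45 for w in wavelengths]

bands = reflectance_at(wavelengths, spectrum, [670, 800])
value = ndvi(bands)
print(bands, value)
```

Extracting the 54 required bands once and computing all indices from the reduced matrix avoids repeated lookups into the full spectral database.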
2.6. Data Structure and Preprocessing
The analysis was conducted on a dataset of 124 observations, corresponding to 31 olive cultivars evaluated at four distinct ripening stages (M1, M2, M3, M4), with each cultivar-stage combination representing an individual data point. Target variables included chlorophyll a content (Chl_A), chlorophyll b content (Chl_B), total chlorophyll content (Chl_A+B), and total polyphenol content (Polyphenol).
2.7. Selection of Vegetation Indices Correlated with Chemical Parameters
A multi-step statistical framework was developed to identify VIs most strongly and consistently correlated with chemical parameters (chlorophyll a, chlorophyll b, total chlorophyll, and polyphenols) across olive cultivars at different ripening stages (M1–M4).
For each VI, Pearson correlation coefficients were computed between the VI values and each chemical parameter for every cultivar. A correlation was classified as “strong” if |r| ≥ τ, where τ = 0.9. For each vegetation index j, a composite Total Score was calculated as:

$$\text{Total Score}_j = \sum_{i=1}^{124} \mathbb{1}\left(|r_{ij}| \ge \tau\right)$$
where i indexes the 124 cultivar-parameter combinations (31 cultivars × 4 parameters: Chl_A, Chl_B, Chl_A+B, Polyphenols), r_ij is the correlation coefficient, and 𝟙(·) is an indicator function. Higher Total Scores indicate consistent VI performance across diverse germplasm.
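The Total Score is a count of strong correlations, which can be sketched for a single VI as below. The data layout (one series of four stage values per cultivar and parameter) and all numbers are hypothetical; only the threshold τ = 0.9 and the counting rule come from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def total_score(vi_by_cultivar, chem_by_cultivar, tau=0.9):
    """Count cultivar-parameter combinations with |r| >= tau for one VI."""
    score = 0
    for cultivar, vi_values in vi_by_cultivar.items():
        for chem_values in chem_by_cultivar[cultivar].values():
            if abs(pearson(vi_values, chem_values)) >= tau:
                score += 1
    return score


# Hypothetical VI values at stages M1-M4 for two cultivars.
vi = {"A": [0.9, 0.7, 0.4, 0.2], "B": [0.8, 0.75, 0.5, 0.3]}
chem = {
    "A": {"Chl_A": [300, 220, 150, 90], "Polyphenol": [4000, 5200, 3900, 5100]},
    "B": {"Chl_A": [250, 240, 160, 100], "Polyphenol": [5000, 4800, 4300, 3900]},
}
score = total_score(vi, chem)
print(score)
```

With 31 cultivars and four parameters, the same loop yields a score between 0 and 124 for each index.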
Consistency across cultivars was quantified via the coefficient of variation:

$$\text{CV}_j = \frac{\sigma(|r|)}{\mu(|r|)} \times 100\%$$
where σ(|r|) and μ(|r|) are the standard deviation and mean of absolute correlations across cultivar-specific analyses for VI j.
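As a small numerical illustration of this consistency measure, the sketch below computes the coefficient of variation of absolute correlations. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and the correlation values are invented.

```python
from statistics import mean, stdev


def cv_percent(abs_correlations):
    """Coefficient of variation (%) of absolute correlation values."""
    return 100.0 * stdev(abs_correlations) / mean(abs_correlations)


# Hypothetical cultivar-specific correlations for one VI.
r_values = [-0.99, 0.95, -0.60, 0.80]
cv = cv_percent([abs(r) for r in r_values])
print(round(cv, 1))
```

Because absolute values are used, a VI whose correlation changes sign between cultivars is not penalized by this metric; it only measures how much the strength of the relationship varies.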
Statistical significance was assessed via permutation testing (1000 iterations). In each iteration, chemical values were randomly shuffled, correlations recomputed, and the maximum Total Score across all 87 VIs recorded. p-values represent the proportion of permuted maxima ≥ observed Total Score_j. Significance levels: *** (p < 0.001), ** (p < 0.01), * (p < 0.05), ns (p ≥ 0.05).
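The max-statistic permutation scheme can be sketched as follows. This is an illustration under stated assumptions: chemical values are shuffled across stages within each cultivar (the shuffling unit is not specified in the text), a single chemical parameter is used for brevity, and all data and names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def total_score(vi_series, chem_series, tau=0.9):
    """Number of cultivars with |r| >= tau for one VI."""
    return sum(
        1 for c in vi_series if abs(pearson(vi_series[c], chem_series[c])) >= tau
    )


def max_statistic_permutation(vis, chem, n_iter=1000, tau=0.9, seed=0):
    """p-value per VI: share of permuted maxima >= the observed Total Score."""
    rng = random.Random(seed)
    observed = {name: total_score(s, chem, tau) for name, s in vis.items()}
    maxima = []
    for _ in range(n_iter):
        # Assumption: shuffle stage order independently within each cultivar.
        shuffled = {c: rng.sample(v, len(v)) for c, v in chem.items()}
        maxima.append(max(total_score(s, shuffled, tau) for s in vis.values()))
    p_values = {
        name: sum(m >= obs for m in maxima) / n_iter
        for name, obs in observed.items()
    }
    return observed, p_values


# Hypothetical data: six cultivars, four stages, one chemical parameter.
cultivars = ["c1", "c2", "c3", "c4", "c5", "c6"]
chem = {c: [300.0 + i - 60.0 * k for k in range(4)] for i, c in enumerate(cultivars)}
vis = {
    "tracking_vi": {c: [0.1 + 0.002 * v for v in chem[c]] for c in cultivars},
    "unrelated_vi": {c: [0.5, 0.2, 0.6, 0.3] for c in cultivars},
}
observed, p_values = max_statistic_permutation(vis, chem)
print(observed, p_values)
```

Comparing each observed score with the permuted maximum over all indices, rather than with its own permuted distribution, controls for the fact that 87 indices are screened at once.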
Final VI ranking integrated three criteria:
where
prioritizes low inter-cultivar variability, and
prioritizes statistical robustness.
VIs were ranked by descending Weighted Score.
The elbow method was employed to determine the optimal subset size of VIs most strongly and consistently correlated with chemical parameters, thereby reducing noise from weakly correlated indices in subsequent multivariate analysis. It identifies the point of diminishing returns where adding more VIs yields minimal improvement in correlation strength. Three algorithms were applied to detect the inflection point in the Weighted Score curve: (i) maximum curvature (maximum absolute value of the second derivative of the normalized scores); (ii) the Kneedle algorithm (maximum perpendicular distance from the score curve to the line connecting the first and last points); and (iii) a threshold method (first position where the Weighted Score fell below 70% of the maximum). The median of these three estimates provided a consensus elbow position. Compositional stability was assessed using Jaccard similarity coefficients between top-N VIs and incrementally larger sets (top-N + k, k = 1 ... 5). The optimal N was identified where the rate of change in mean Jaccard similarity fell below 1%, indicating that additional VIs no longer substantially altered subset composition. The final optimal number was determined as the median of the values from both methods.
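The three elbow estimators and their median can be sketched as below. This is a simplified illustration: the Kneedle step is implemented as described in the text (maximum perpendicular distance to the chord), not as the full published algorithm, positions are reported as 1-based ranks, and the score curve is hypothetical. The Jaccard stability step is not shown.

```python
from statistics import median


def normalize(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def elbow_curvature(scores):
    """Rank with the largest absolute second difference of normalized scores."""
    y = normalize(scores)
    d2 = [abs(y[i - 1] - 2 * y[i] + y[i + 1]) for i in range(1, len(y) - 1)]
    return d2.index(max(d2)) + 2  # interior offset plus 1-based rank


def elbow_chord_distance(scores):
    """Rank farthest (perpendicular) from the line joining first and last points."""
    y = normalize(scores)
    n = len(y)
    x = [i / (n - 1) for i in range(n)]
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    norm = (dx * dx + dy * dy) ** 0.5
    dist = [abs(dy * (x[i] - x[0]) - dx * (y[i] - y[0])) / norm for i in range(n)]
    return dist.index(max(dist)) + 1


def elbow_threshold(scores, fraction=0.7):
    """First rank whose score falls below `fraction` of the maximum."""
    limit = fraction * max(scores)
    for rank, s in enumerate(scores, start=1):
        if s < limit:
            return rank
    return len(scores)


# Hypothetical Weighted Scores, sorted in descending order.
scores = [72.8, 70.0, 68.0, 66.0, 65.0, 40.0, 38.0, 37.0, 36.0, 35.0]
estimates = [
    elbow_curvature(scores),
    elbow_chord_distance(scores),
    elbow_threshold(scores),
]
consensus = median(estimates)
print(estimates, consensus)
```

Taking the median limits the influence of any single estimator, which matters because second differences are sensitive to local noise in the score curve.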
2.8. Principal Component Analysis and Cultivar Clustering
VI values and chemical parameters were averaged across the four ripening stages (M1–M4) for each cultivar, yielding cultivar-level mean profiles. Principal Component Analysis (PCA) was performed on the matrix of cultivar-level VI means (n cultivars × m selected VIs) using standardized variables (mean = 0, variance = 1). The first two principal components (PC1 and PC2) were extracted, and their contribution to total variance was reported. Variable contributions (loadings) and quality of representation (cos2) were computed for each VI to identify the most influential indices. To validate the PCA-based grouping, k-means clustering was performed directly on the standardized chemical parameter matrix. The optimal number of clusters (k) was determined using the elbow method (within-cluster sum of squares) and silhouette analysis. Cluster assignments were compared with the PCA-based grouping to assess consistency between VI-predicted and chemistry-based cultivar classifications.
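The clustering validation step can be illustrated with a compact, dependency-free sketch of standardization, k-means, and the mean silhouette score. The analysis in the paper was run in R; this version is only a sketch, with a deterministic initialization (first k points) chosen for reproducibility and invented cultivar-level means. The PCA step is not shown.

```python
from math import dist
from statistics import mean, stdev


def standardize(rows):
    """Scale each column to mean 0 and unit (sample) variance."""
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    sd = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; initial centroids are the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette width over all points."""
    widths = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            widths.append(0.0)
            continue
        a = mean(own)
        b = min(mean(d) for d in by_cluster.values())
        widths.append((b - a) / max(a, b))
    return mean(widths)


# Hypothetical cultivar means: (total chlorophyll, total polyphenols), mg/kg.
raw = [(90, 7400), (120, 4000), (800, 5800), (95, 7300),
       (130, 4100), (780, 5700), (125, 3900)]
points = standardize(raw)
labels, centroids = kmeans(points, k=3)
score = silhouette(points, labels)
print(labels, round(score, 3))
```

Standardizing first matters here because polyphenol values are an order of magnitude larger than chlorophyll values and would otherwise dominate the Euclidean distances.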
ANOVA was performed to test the effect of ripening stage on each chemical parameter across all cultivars. Normality of residuals was assessed using the Shapiro–Wilk test, and homoscedasticity was evaluated using Levene’s test. When parametric assumptions were violated (p < 0.05), the non-parametric Kruskal–Wallis test was applied. Post hoc multiple comparisons were conducted using Tukey’s HSD test for parametric analyses or Dunn’s test with Bonferroni correction for non-parametric analyses.
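For readers who want to check the non-parametric branch by hand, the sketch below computes the Kruskal–Wallis H statistic with tie correction and the corresponding p-value for three degrees of freedom (four ripening stages). It uses the closed-form chi-square survival function valid only for df = 3; the group data are hypothetical, and the paper's own analysis was run in R.

```python
from math import erfc, exp, pi, sqrt


def average_ranks(values):
    """1-based ranks, with ties receiving the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H statistic."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


# Hypothetical, fully separated stage groups.
stages = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
h = kruskal_wallis(stages)
p = chi2_sf_df3(h)
print(round(h, 4), round(p, 4))
```

As a usage check, `chi2_sf_df3(0.25)` evaluates to about 0.969, which matches the p-value reported for the polyphenol comparison across the four stages.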
All the above-mentioned analyses were performed using R (version 4.4.2) using the following packages: readxl for data import, dplyr for data manipulation, ggplot2 and ggrepel for visualization, car and agricolae for ANOVA, dunn.test for non-parametric cluster post hoc tests, factoextra and NbClust for PCA and clustering, and corrplot for correlation matrix visualization. Custom functions were developed for robustness analysis and optimal VI selection. Statistical significance was set at α = 0.05 unless otherwise specified.
3. Results
3.1. Selection of Vegetation Indices
The analysis evaluated 87 vegetation indices across 31 olive cultivars at four ripening stages (M1–M4) against four chemical parameters (chlorophyll a, chlorophyll b, total chlorophyll, and polyphenols). The objective determination of the optimal VI subset size revealed substantial agreement among the detection algorithms. The elbow method, which identifies the inflection point where incremental performance gains diminish, was decomposed into three complementary algorithms: (i) curvature detection via second-derivative analysis, (ii) the Kneedle algorithm measuring perpendicular distance from the score-rank linear trend, and (iii) a 70%-threshold criterion identifying the point where weighted scores fell below 70% of the maximum value. These three independent estimates converged on a median of n = 11 (curvature: 10, Kneedle: 12, threshold: 11), indicating robust detection of the performance plateau.
This multi-method convergence provides objective validation that 11 vegetation indices (representing an 87.4% reduction from the original 87-index set;
Table 1 and
Table S1) capture the core predictive information while maintaining a Total Score of 71 (78% of maximum).
The top-ranked vegetation indices achieved Total Scores ranging from 71 to 91 strong correlations (|r| ≥ 0.9) across cultivar-parameter combinations. MCARI3 and TCARI each achieved 91 instances of strong correlation (out of a total of 124) across 31 cultivars and 4 chemical parameters, representing consistent predictive capacity across diverse germplasm.
However, correlation stability metrics revealed moderate variability across cultivars, with coefficient of variation values ranging from 19.7% to 21.3% for the first eleven indices. This inter-cultivar variability resulted in Weighted Scores (42.2–72.8) substantially lower than the raw Total Scores, indicating that while these VIs exhibit strong correlations in individual cultivars, the magnitude and direction of correlations show moderate inconsistency across the population. This finding suggests cultivar-specific calibration may be necessary for operational deployment of VI-based chemical estimation models.
A detailed analysis of spectral characteristics and mathematical formulations underlying these performance rankings is provided in
Appendix B, which identifies wavelength selection (particularly 550 nm inclusion) and mathematical complexity as primary determinants of VI effectiveness for olive fruit biochemical assessment.
3.2. Principal Component Analysis and Chemical Parameter Relationships
The Principal Component Analysis (PCA) performed on cultivar-level aggregated data (31 cultivars × top-ranked 11 VIs) successfully captured 100% of total variance in the chemical dataset using only two principal components (PC1: 60.8%, PC2: 39.2%), demonstrating the high dimensionality reduction efficiency and the presence of strong underlying chemical patterns among olive cultivars (
Figure 1, left panel).
To characterize VI–chemical relationships, correlation analysis between VI values and the chemical-derived principal component space revealed moderate associations (
Figure 1, right panel). TCARI, CARI2, and MCARI3 exhibited the strongest correlations with PC2 (polyphenol axis; r = 0.23–0.33), while TCI and simple ratio indices showed modest correlations with PC1 (chlorophyll axis; r = 0.13–0.33). These moderate correlations reflect the fundamental design of the selected VIs: optimization for temporal ripening dynamics within cultivars (stage-by-stage |r| ≥ 0.9,
Table 2 and
Table S1) rather than baseline chemical discrimination across cultivars. The VI–chemical correlation patterns align with known spectral sensitivities: TCARI and MCARI3 incorporate weighted spectral differences spanning 550–700 nm that capture both chlorophyll and polyphenol-associated absorbance features, while TCI emphasizes the triangular relationship between green reflectance, red absorption, and red-edge position particularly responsive to chlorophyll content.
K-means clustering applied directly to cultivar-level chemical parameters identified three clusters. Cluster validation confirmed that k = 3 is supported by elbow method (42.5% clustering variance reduction from k = 2) and silhouette analysis (score = 0.493). Most critically, external verification demonstrated that clusters differ significantly in measured chemical composition: total chlorophyll (Kruskal–Wallis χ2 = 6.99, p = 0.0055) and total polyphenols (χ2 = 12.29, p = 0.0023), confirming that the classification reflects genuine biochemical differentiation rather than arbitrary partition of continuous variation.
The first cluster encompasses 5 cultivars positioned in the upper region of positive PC2 values: Farga, Leccio del Corno, Roggianella, Salegna, and Sargano. This cluster exhibits a compact spatial distribution with cultivars extending toward positive PC1 values and PC2 values between +1.0 and +2.0, indicating distinctive chemical signatures characterized by exceptionally high polyphenol content. These cultivars exhibited moderate total chlorophyll and remarkably high polyphenol concentrations throughout the ripening process, with total chlorophyll and polyphenol levels averaging 91.8 and 7385.6 mg kg−1, respectively; the polyphenol mean is 84% higher than that of the typical-range group (Cluster 2, 4016.8 mg kg−1).
A second group contains 22 cultivars distributed across the central and left regions of the plot, including varieties such as Bella di Spagna, Bianchera, Carboncella, Leccino, and Oliva Rossa. This group showed a broader spatial dispersion along both the PC1 (−2.5 to +1.0) and PC2 (−1.5 to +1.0) axes, indicating substantial within-cluster chemical heterogeneity. It was characterized by low-to-moderate chlorophyll levels, with total chlorophyll and polyphenol levels averaging 126.6 and 4016.8 mg kg−1, respectively. This larger group represents what could be termed the “typical” chlorophyll and polyphenol range for most olive cultivars in the study. The third group contains 4 cultivars (Coratina, Intosso, Maurino, Moraiolo) distributed across the right region of the plot with positive PC1 values, exhibiting a broader spatial dispersion along the PC2 axis (−2.5 to +0.5). This cluster was characterized by exceptionally high chlorophyll concentrations, with total chlorophyll and polyphenol levels averaging 790.4 and 5799.8 mg kg−1, respectively, representing a 6.2-fold elevation in chlorophyll content compared to the typical-range group (Cluster 2). The sustained high chlorophyll levels in these cultivars reflect genotype-specific regulation of chlorophyll catabolism during fruit ripening.
Critically, correlation analysis between PC scores and chemical parameters revealed uniformly weak relationships: PC1 correlated with chlorophyll a (r = −0.007), chlorophyll b (r = 0), total chlorophyll (r = −0.004), and polyphenols (r = 0.21); PC2 showed marginally stronger but still weak correlations with chlorophylls (r = 0.24–0.25) and polyphenols (r = 0.17). All correlations remained below the conventional threshold for practical predictability (|r| < 0.3). This discrepancy between VI-derived principal components and measured chemical parameters presents a significant interpretive challenge. While the selected VIs demonstrated strong within-cultivar correlations during ripening progression (|r| ≥ 0.9 at the stage-by-stage level), aggregation to cultivar-level means appears to collapse this variation, resulting in PC axes that capture inter-cultivar spectral differences orthogonal to the measured chemistry.
3.3. Chlorophyll Content
All the cultivars exhibited strong negative correlations between ripening stage and chlorophyll content (
Table 2), consistent with chlorophyll degradation during fruit maturation. Cultivars such as Coratina (Ch_B: r = −0.999,
p < 0.001), Bella di Spagna (Ch_A: r = −0.998,
p < 0.01; Ch_A+B: r = −0.997,
p < 0.01), Dolce d’Andria (Ch_B: r = −0.991,
p < 0.01), XVII-87 (Ch_B: r = −0.996,
p < 0.01), Leccio del Corno (Ch_B: r = −0.993,
p < 0.01), and Raccioppella (Ch_A: r = −0.997,
p < 0.01) demonstrated particularly robust and statistically significant relationships. Conversely, Oblonga showed weaker negative correlations (r = −0.696 to −0.726, ns), suggesting more gradual or variable chlorophyll loss. The detection of statistically significant correlations with limited per-cultivar sample size (
n = 4) indicates that observed patterns exceed detection thresholds given low statistical power. However, non-significant correlations (even with |r| > 0.80) should not be dismissed, as they may represent genuine biological trends undetectable at
n = 4 due to insufficient degrees of freedom.
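The constraint imposed by n = 4 can be made explicit. As a sketch, assuming the reported p-values come from the usual two-sided t-test on the Pearson coefficient (the test is not named in the text), the statistic has n − 2 = 2 degrees of freedom, for which the t distribution has a closed-form cumulative distribution function F_2:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} = \frac{r\sqrt{2}}{\sqrt{1-r^{2}}},
\qquad
F_{2}(t) = \frac{1}{2} + \frac{t}{2\sqrt{t^{2}+2}}

t^{2} + 2 = \frac{2r^{2}}{1-r^{2}} + 2 = \frac{2}{1-r^{2}}
\;\;\Longrightarrow\;\;
\frac{|t|}{\sqrt{t^{2}+2}} = |r|

p = 2\left[1 - F_{2}(|t|)\right] = 1 - \frac{|t|}{\sqrt{t^{2}+2}} = 1 - |r|
```

Under this assumption, significance at p < 0.05 requires |r| > 0.95 and p < 0.01 requires |r| > 0.99, so a correlation of |r| = 0.80 corresponds to p = 0.20 despite describing a steep trend across the four stages.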
The cultivar-specific negative correlations between ripening stage and chlorophyll content are consistent with population-level statistical trends. Non-parametric Kruskal–Wallis tests demonstrated highly significant global effects of ripening stage on all chlorophyll fractions (Ch_A: χ2 = 22.18, p < 0.001; Ch_B: χ2 = 20.95, p < 0.001; Ch_A+B: χ2 = 22.60, p < 0.001). Mean total chlorophyll content decreased progressively across ripening stages: M1 (297.6 ± 334.3 mg kg−1, n = 31), M2 (224.5 ± 283.0 mg kg−1, n = 31), M3 (169.8 ± 238.5 mg kg−1, n = 31), and M4 (134.8 ± 215.7 mg kg−1, n = 31), representing a 54.7% reduction from M1 to M4. Chlorophyll degradation was particularly pronounced between early (M1) and advanced maturation stages (M3–M4), with post hoc Dunn’s tests revealing significant pairwise differences (M1 > M3, p < 0.01; M1 > M4, p < 0.001; M2 > M4, p < 0.05) (Table A1, Appendix A). The concordance between individual cultivar patterns (31 of 31 cultivars exhibited negative relationships with mean |r| = 0.93) and the significant population-level effect indicates that chlorophyll degradation is a conserved biochemical process across olive genotypes, with quantitative rather than qualitative variation among cultivars.
3.4. Total Polyphenols Content
Polyphenol dynamics during ripening revealed contrasting patterns among cultivars (Table 3), reflecting diverse biosynthetic strategies. Approximately half of the cultivars exhibited significant positive correlations with ripening stage, including Bianchera (r = 0.995, p < 0.01), Leccio del Corno (r = 0.991, p < 0.01), Intosso (r = 0.986, p < 0.05), Piangente (r = 0.986, p < 0.05), and Dolce d’Andria (r = 0.982, p < 0.05), indicating progressive polyphenol accumulation throughout maturation. In contrast, other cultivars displayed strong negative correlations, with Rossellino (r = −0.995, p < 0.01), Oliva Rossa (r = −0.988, p < 0.05), Raza (r = −0.985, p < 0.05), Morchiaio (r = −0.981, p < 0.05), and Farga (r = −0.963, p < 0.05) showing significant polyphenol degradation during ripening. Notably, cultivars such as Marzio (r = 0.02, ns) and Raccioppella (r = 0.054, ns) showed minimal variation, suggesting cultivar-specific regulation of phenolic metabolism during fruit development. The contrasting directional patterns observed among individual cultivars, with some exhibiting positive correlations and others negative correlations between ripening and polyphenol content, effectively cancel each other when data are pooled across the entire population. Consequently, neither ANOVA nor Kruskal–Wallis tests detected significant global effects of maturation on polyphenol content (χ² = 0.25, p = 0.969), with population-level means remaining remarkably stable across ripening stages: M1 (4753.5 ± 1982.0 mg kg⁻¹ GAE, n = 31), M2 (4754.8 ± 1909.2 mg kg⁻¹ GAE, n = 31), M3 (4817.2 ± 1760.0 mg kg⁻¹ GAE, n = 31), and M4 (4835.3 ± 1616.0 mg kg⁻¹ GAE, n = 31), representing only a 1.7% increase from M1 to M4. This apparent population-level stability masks biologically meaningful cultivar-specific variation (detailed statistical analysis in Table A1, Appendix A). This absence of species-level effects despite robust cultivar-specific trends underscores the necessity of cultivar-level phenotyping in agricultural research, as aggregated analyses can obscure the metabolic diversity inherent to different genotypes and lead to erroneous conclusions about the absence of ripening-related biochemical changes.
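The cancellation of opposing cultivar trends under pooling can be illustrated with a minimal Python sketch. The two trajectories below are hypothetical values chosen for illustration, not measurements from this study; `pearson_r` is a helper defined here, and the pooled Pearson correlation is used only as a simple stand-in for the population-level tests reported above.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


stages = [1, 2, 3, 4]  # ripening stages M1..M4

# Hypothetical polyphenol contents (mg/kg GAE); illustrative only.
cultivars = {
    "accumulating": [3000, 3900, 4600, 5600],
    "degrading": [5600, 4700, 3800, 3000],
}

# Correlation with ripening stage computed separately for each cultivar.
within_r = {name: pearson_r(stages, values) for name, values in cultivars.items()}

# Same correlation after pooling all cultivar-stage observations.
pooled_x = stages * len(cultivars)
pooled_y = [v for values in cultivars.values() for v in values]
pooled_r = pearson_r(pooled_x, pooled_y)

# Relative change of the pooled stage mean from M1 to M4, in percent.
stage_means = [mean(values[i] for values in cultivars.values())
               for i in range(len(stages))]
pooled_change_pct = 100 * (stage_means[-1] - stage_means[0]) / stage_means[0]

print(within_r, round(pooled_r, 3), round(pooled_change_pct, 1))
```

In this toy case each cultivar shows a near-perfect trend with ripening, yet the pooled correlation and the pooled M1-to-M4 change are both close to zero, which mirrors the pattern described for the germplasm panel.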
4. Discussion
The results of the present research clearly demonstrate that vegetation indices can effectively track biochemical changes during olive fruit ripening, with the top-ranked indices achieving strong correlations with chlorophyll and polyphenol content across multiple cultivars.
The Modified Chlorophyll Absorption Ratio Index (MCARI3) and the Transformed Chlorophyll Absorption in Reflectance Index (TCARI) demonstrated particularly robust performance, accumulating 91 strong correlations across 124 cultivar-stage combinations. These findings align with previous research showing that chlorophyll-sensitive indices, particularly those utilizing wavelengths in the 550–780 nm range, provide accurate estimation of photosynthetic pigment content in agricultural applications [24,25]. The effectiveness of these indices in capturing chlorophyll degradation dynamics is consistent with the well-documented color transition in olive fruits, where progressive chlorophyll loss during ripening is accompanied by anthocyanin accumulation [26]. The significant effect of maturation stage on chlorophyll fractions (χ² = 21.85–22.44, p < 0.001), together with the consistently negative within-cultivar relationships, supports the conclusion that VIs capture this fundamental physiological process across diverse genetic backgrounds.
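For readers unfamiliar with these indices, the sketch below computes the commonly published forms of MCARI (Daughtry et al.) and TCARI (Haboudane et al.) from reflectance at 550, 670, and 700 nm. It is an illustration under stated assumptions: the exact MCARI3 variant and band positions used in this study are not restated in this section, so the standard formulations are used as stand-ins, and the reflectance values are hypothetical.

```python
def mcari(r550, r670, r700):
    """Standard MCARI: [(R700 - R670) - 0.2 (R700 - R550)] * (R700 / R670)."""
    return ((r700 - r670) - 0.2 * (r700 - r550)) * (r700 / r670)


def tcari(r550, r670, r700):
    """Standard TCARI: 3 [(R700 - R670) - 0.2 (R700 - R550) (R700 / R670)]."""
    return 3 * ((r700 - r670) - 0.2 * (r700 - r550) * (r700 / r670))


# Hypothetical fruit reflectance values (fractions of incident light).
sample = {"r550": 0.12, "r670": 0.05, "r700": 0.15}

print(round(mcari(**sample), 3), round(tcari(**sample), 3))
```

Both indices contrast the chlorophyll absorption minimum near 670 nm with reflectance at 550 and 700 nm, which is why they fall within the 550–780 nm range discussed above.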
Quantitative comparison with classic VI studies is constrained by methodological differences. Gitelson et al. [27] reported R² = 0.95–0.98 for simple ratio indices (R_NIR/R_550) in maple and chestnut leaves under controlled laboratory conditions. Our top-performing indices (MCARI3, TCARI) achieved 91 strong correlations (|r| ≥ 0.9) across 124 cultivar-stage combinations, representing comparable predictive strength despite greater biological complexity: 31 genetically diverse cultivars with substantial morphological heterogeneity measured under field conditions. Merzlyak et al. [28] achieved R² = 0.88–0.93 for chlorophyll estimation in apple fruits using reflectance ratios, performance levels consistent with our cultivar-averaged results. The moderate inter-cultivar variability (CV = 19.7–21.3%) observed here likely reflects morphological factors (epicarp properties, anthocyanin interference) absent in leaf-based studies, explaining the slightly reduced consistency relative to single-species controlled experiments.
A critical finding of this study is the moderate inter-cultivar variability in VI–chemical relationships, as evidenced by coefficient of variation values ranging from 19.7% to 21.3% for the top-performing indices. This variability resulted in substantial reductions in Weighted Scores compared to raw Total Scores, indicating that while VIs exhibit strong correlations within individual cultivars during ripening progression, the magnitude and direction of these relationships show moderate inconsistency across the germplasm panel. This observed inter-cultivar variability in spectral–chemical relationships is consistent with cultivar-specific physiological processes during fruit maturation. Previous research has demonstrated that the relative rates of chlorophyll and carotenoid degradation differ markedly among olive varieties [26], suggesting that pigment catabolism occurs at rates inherent to each genotype.
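As a reminder of how a coefficient of variation of this magnitude arises, the following sketch computes the CV of correlation strengths across cultivars. It is a hedged illustration: the |r| values are hypothetical, and the assumption that the CV is taken over per-cultivar correlation magnitudes is ours, since the exact quantity entering the Weighted Score is defined elsewhere in the paper.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean, in percent."""
    return 100 * stdev(values) / abs(mean(values))


# Hypothetical |r| values of one vegetation index in five cultivars.
abs_r_by_cultivar = [0.95, 0.90, 0.60, 0.98, 0.70]

cv_percent = coefficient_of_variation(abs_r_by_cultivar)
print(round(cv_percent, 1))
```

With these illustrative numbers the CV is about 20%, showing that a few cultivars with weaker correlations are enough to produce variability of the order reported here even when most cultivars correlate strongly.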
The divergent polyphenol accumulation patterns observed among cultivars represent one of the most significant findings of this study. Approximately half of the cultivars in our germplasm panel exhibited significant positive correlations between ripening stage and polyphenol content, while others showed strong negative correlations, with a smaller subset displaying minimal variation throughout maturation. This marked inter-cultivar heterogeneity reflects the complex, genotype-dependent nature of phenolic compound dynamics during olive fruit ripening. Previous research has documented similarly contrasting patterns: while some studies report increases in total phenols with ripening progression [29,30], others have observed declining trends [31,32,33] or non-linear patterns characterized by initial decreases followed by subsequent increases in advanced maturity stages [33]. These findings underscore that polyphenol accumulation is governed by cultivar-specific regulation of phenylpropanoid pathway genes and differential enzymatic activities, resulting in genotype-specific biochemical signatures that fundamentally impact the universality of spectral–chemical relationships across diverse germplasm.
The absence of significant population-level effects (Kruskal–Wallis χ² = 0.25, p = 0.969) despite robust cultivar-specific trends reveals a critical limitation of aggregated analyses in germplasm studies. The contrasting directional patterns observed among individual cultivars, with some exhibiting positive correlations and others negative correlations between ripening and polyphenol content, effectively cancel each other when data are pooled across the entire population, masking biologically meaningful variation. This phenomenon underscores the necessity of cultivar-level phenotyping in agricultural research, as population-level analyses alone can obscure the metabolic diversity inherent to different genotypes and lead to erroneous conclusions about the absence of ripening-related biochemical changes.
A critical limitation emerged from the aggregation of temporal data to cultivar-level means. While individual vegetation indices demonstrated strong correlations with chemical parameters during ripening progression within each cultivar, the principal components derived from cultivar-averaged VI data showed uniformly weak correlations with measured chemical parameters. This decoupling indicates that the selected VIs effectively capture ripening dynamics within individual cultivars but are less effective for discriminating baseline chemical composition across cultivars when temporal variation is removed through averaging. This phenomenon represents a well-documented challenge in remote sensing applications, where relationships observed at fine temporal scales often fail to translate to coarser aggregation levels. The temporal averaging process obscures the stage-specific VI–chemical relationships that drive the strong within-cultivar correlations, particularly when these relationships exhibit non-linear or cultivar-dependent trajectories.
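The decoupling between within-cultivar and cultivar-averaged relationships can be reproduced with a small synthetic example. The sketch below is an assumption-laden illustration: all values are hypothetical, chlorophyll is in arbitrary units, and a direct correlation of cultivar means is used in place of the principal component analysis applied in this study.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Shared ripening trajectory (M1..M4) around each cultivar's baseline.
vi_offsets = [0.3, 0.1, -0.1, -0.3]
chl_offsets = [30.0, 10.0, -10.0, -30.0]

# Hypothetical cultivar baselines: (mean VI, mean chlorophyll).
baselines = {"cv1": (0.4, 50.0), "cv2": (0.5, 60.0),
             "cv3": (0.6, 60.0), "cv4": (0.7, 50.0)}

series = {
    name: ([vi0 + d for d in vi_offsets], [chl0 + d for d in chl_offsets])
    for name, (vi0, chl0) in baselines.items()
}

# Stage-by-stage correlation inside each cultivar.
within_r = {name: pearson_r(vi, chl) for name, (vi, chl) in series.items()}

# Correlation after collapsing each cultivar to its mean over stages.
mean_vi = [mean(vi) for vi, _ in series.values()]
mean_chl = [mean(chl) for _, chl in series.values()]
between_r = pearson_r(mean_vi, mean_chl)

print({k: round(v, 3) for k, v in within_r.items()}, round(between_r, 3))
```

Here every cultivar tracks ripening almost perfectly, while the cultivar means carry essentially no VI–chemical relationship, because baseline differences among cultivars are unrelated to the shared temporal signal that averaging removes.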
Biochemical parameters can be classified by their spectral properties. Optically active compounds, including chlorophylls (absorption at 430, 460, 640, 660 nm), carotenoids (400–500 nm), and anthocyanins (520–560 nm), directly interact with light through electronic transitions. Non-optically active compounds, such as sugars, organic acids, and phenolic compounds with UV-range absorption (<380 nm), lack direct VIS-NIR spectral signatures and are detectable only indirectly: either through correlation with co-varying optically active markers during physiological processes, or via structural effects on tissue reflectance properties. Recent olive fruit spectroscopy studies corroborate these multifactorial interferences, with NIR prediction models for chlorophyll content showing systematic deviations attributable to concurrent pigment accumulation and morphological heterogeneity [14,17]. Consequently, effective cultivar-level chemical discrimination may require either preservation of temporal dynamics through trajectory-based statistical frameworks or expansion of analytical frameworks to encompass targeted biochemical and morphological profiling, strategies that are elaborated in the future research priorities section below.
The observed weak cultivar-level correlations reflect the multifactorial nature of fruit spectral signatures, which integrate contributions beyond the targeted chlorophylls and polyphenols. Morphological factors play a substantial role: epicarp thickness directly affects light penetration depth and the relative contribution of skin versus mesocarp pigments to measured reflectance, while surface wax composition and cuticle ultrastructure modulate reflectance properties independently of pigment content, introducing cultivar-specific spectral baselines [34]. Unmeasured biochemical components further complicate spectral–chemical relationships. Anthocyanins exhibit cultivar-specific accumulation kinetics during ripening [2], with strong absorption at 520–560 nm creating direct interference in the spectral region where many chlorophyll-sensitive VIs operate [35,36]. Specific carotenoid fractions and phenolic subclasses (including secoiridoids, flavonoids, and lignans) degrade at cultivar-dependent rates [7] independently of total chlorophyll or polyphenol trends, decoupling bulk measurements from spectral properties. Physical properties introduce additional complexity: fruit size [37] affects measurement geometry, while water content variations influence NIR absorption characteristics [13,38].
Future work should consider several complementary strategies to address the limitations identified in this study and enhance the operational utility of VI-based phenotyping systems for olive fruit quality assessment. First, preservation of temporal structure in predictive models is essential, potentially through trajectory-based or time-series approaches that maintain stage-specific VI–chemical relationships rather than collapsing variation through cultivar-level averaging. Second, expansion of the chemical analytical panel to include anthocyanins, specific carotenoid fractions, and detailed phenolic class distributions would provide a more comprehensive biochemical characterization capable of explaining the multifactorial nature of fruit spectral signatures. Third, integration of advanced machine learning approaches such as convolutional neural networks [39,40,41] or ensemble methods may capture complex non-linear relationships between spectral features and chemical composition that conventional vegetation indices cannot approximate.
Finally, development of cultivar-specific or cultivar-adaptive models using techniques such as transfer learning or hierarchical modeling could account for genetic variation while maintaining predictive accuracy across diverse germplasm, enabling broader deployment of spectral-based quality monitoring systems in commercial olive production.
5. Conclusions
This study successfully identified vegetation indices capable of tracking biochemical changes during olive fruit ripening, with important implications for both production systems and breeding programs. The strong within-cultivar temporal correlations demonstrate that VI-based systems can effectively monitor ripening progression and optimize harvest timing, though moderate inter-cultivar variability in VI–chemical relationships necessitates cultivar-specific calibrations for operational use across diverse germplasm.
For breeding applications, the contrasting behaviors of chlorophyll and polyphenol dynamics call for distinct strategic approaches. Chlorophyll content represents a relatively predictable trait with conserved degradation patterns across genotypes, supporting development of universal VI-based tracking models that require validation across additional environmental conditions and harvest seasons. Conversely, polyphenol profiles exhibited marked cultivar-specific diversity, with cultivars displaying increasing, decreasing, or essentially stable polyphenol content throughout maturation.
Future research priorities emerge directly from the temporal aggregation artifacts identified in this study, where strong stage-by-stage correlations collapsed to weak cultivar-level relationships when averaged across ripening stages. VI-based phenotyping systems must preserve ripening trajectory information through time-series approaches or repeated measurements at key developmental stages, enabling extraction of temporal derivatives and trajectory-based features that quantify ripening dynamics rather than instantaneous states.
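One simple way to retain trajectory information is to summarize each cultivar's VI series by its slope over ripening stages and by its stage-to-stage differences. The sketch below shows one possible implementation under stated assumptions: the function names `stage_slope` and `stage_steps` are introduced here for illustration, stages are coded 1 to 4 for M1 to M4, and the VI values are hypothetical.

```python
def stage_slope(values, stages=(1, 2, 3, 4)):
    """Ordinary least-squares slope of a trait against ripening stage."""
    mx = sum(stages) / len(stages)
    my = sum(values) / len(values)
    sxy = sum((s - mx) * (v - my) for s, v in zip(stages, values))
    sxx = sum((s - mx) ** 2 for s in stages)
    return sxy / sxx


def stage_steps(values):
    """First differences between consecutive ripening stages."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical VI readings of one cultivar at M1..M4.
vi_trajectory = [0.82, 0.61, 0.35, 0.20]

features = {
    "slope": stage_slope(vi_trajectory),    # overall rate of change per stage
    "steps": stage_steps(vi_trajectory),    # M1->M2, M2->M3, M3->M4 changes
}
print(features)
```

Features of this kind describe how fast and when a cultivar changes rather than its average state, so they remain informative where cultivar-level means do not.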
While this study deliberately employed single-site germplasm screening to isolate genetic effects on spectral–chemical relationships, environmental validation across pedoclimatic gradients is essential before broad deployment in diverse Mediterranean olive-growing regions. Expansion of biochemical panels to include anthocyanins, carotenoids, and detailed phenolic class distributions may further reduce prediction uncertainty and improve cultivar-level discrimination, thereby bridging the current gap between temporal tracking capacity and cross-cultivar chemical phenotyping.