A Biologically Informed Wavelength Extraction (BIWE) Method for Hyperspectral Classification of Olive Cultivars and Ripening Stages

Distefano, Miriam; Avola, Giovanni; Cantini, Claudio; Gioli, Beniamino; Cavaliere, Alice; Riggi, Ezio

doi:10.3390/rs17193277


by Miriam Distefano ¹, Giovanni Avola ¹,*, Claudio Cantini ², Beniamino Gioli ³, Alice Cavaliere ⁴ and Ezio Riggi ¹

¹ National Research Council of Italy (CNR)—Institute of BioEconomy (IBE), Via Paolo Gaifami 18, 95126 Catania, Italy
² National Research Council of Italy (CNR)—Institute of BioEconomy (IBE), 58022 Follonica, Italy
³ National Research Council of Italy (CNR)—Institute of BioEconomy (IBE), Via Madonna del Piano 10, 50019 Sesto Fiorentino, Italy
⁴ Institute of Polar Sciences, National Research Council, 40129 Bologna, Italy
* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(19), 3277; https://doi.org/10.3390/rs17193277

Submission received: 31 July 2025 / Revised: 18 September 2025 / Accepted: 21 September 2025 / Published: 24 September 2025

(This article belongs to the Special Issue Advances in Hyperspectral Data Analysis for Vegetation and Soil Monitoring)


Highlights

What are the main findings?

Hyperspectral analysis across 29 olive cultivars, at different fruit ripening stages, revealed characteristic maturation-related reflectance patterns, including a distinctive peak at ~550 nm during early ripening that shifts to the 700–780 nm range in intermediate and advanced stages, with a plateau phase at 800–950 nm across all samples.
A hyperspectral data analysis method, Biologically Informed Wavelength Extraction (BIWE), which combines raw spectral data, their derivatives, and vegetation indices, was developed, calibrated, validated, and benchmarked against methodologies widely applied in spectral data analysis (Random Forest, Recursive Feature Elimination with Support Vector Machine, and Principal Component Analysis).

What is the implication of the main findings?

BIWE introduces a novel approach to parsimonious hyperspectral feature selection by integrating multi-scale spectral analysis with biologically informed scoring. It achieves competitive classification accuracy while reducing the required wavelength set to 25 bands, far fewer than conventional methodologies need. In doing so, BIWE moves beyond the black-box, purely statistical approach of conventional methodologies.
The substantial reduction in the number of required bands directly affects the technological requirements for sensor design and enables practical real-time classification of olive cultivars and ripening stages.

Abstract

Reliable tools for cultivar discrimination and ripening stage evaluation are critical for optimizing harvest timing and supporting a milling process focused on olive oil quality. This research examines the spectral properties of olive drupes throughout different maturation stages, ranging from green to full purple-black pigmentation, across 29 distinct cultivars. High-resolution spectrometric analysis was conducted within the 380–1080 nm wavelength range. Multiple analytical approaches were employed to optimize wavelength selection from hyperspectral reflectance data and obtain discriminating tools for olive classification. A Biologically Informed Wavelength Extraction (BIWE) method was developed for cultivar and ripening stage identification, based on the biologically informed selection of single wavelengths and Vegetation Indices (VIs). The methodology integrated multi-scale spectral analysis with biochemically weighted scoring and a multi-criteria evaluation framework, employing a two-iteration refinement process to identify optimal spectral features with high discriminatory power and biological relevance. Analysis revealed spectral variations associated with maturation. A characteristic reflectance peak at approximately 550 nm, observed during early ripening stages, gave way to distinct spectral behavior within the 700–780 nm range in intermediate and advanced ripening stages, with all samples reaching a plateau between 800 and 950 nm. The BIWE method proved highly efficient in olive classification, using only 25 single wavelengths compared to the 114 required by Principal Component Analysis (PCA) and the 131 required by Recursive Feature Elimination (RFE), representing 4.6-fold and 5.2-fold reductions, respectively. Despite this reduction, BIWE’s overall accuracy (0.5634) remained competitive compared to the RFE (−10%) and PCA (−8%) alternatives, which require the acquisition of larger wavelength datasets. The integration of biochemically relevant VIs enhanced accuracy across all methodologies, with BIWE showing a notable improvement (+19.2%). BIWE thus identified olives effectively with a reduced set of wavelengths and VIs, which eases the technological requirements (spectrometer offset and real-time classification applications) of a tool for discriminating olive cultivars and ripening stages.

Keywords: olive; biologically informed wavelength extraction; spectral reflectance; VIS/NIR; ripening stage; random forest; recursive feature elimination; PCA; LOO-CV

1. Introduction

Hyperspectral data analysis offers powerful capabilities for non-destructive trait retrieval and phenotyping in agriculture because of its rich spectral information content [1]. The technology provides fine-scale spectral signatures that capture subtle biochemical and biophysical variations in plant material, enabling detailed characterization and early detection of changes that are imperceptible to the human eye or to conventional multispectral imaging. Its potential spans various applications in precision agriculture, from disease detection and nutrient management to yield prediction and cultivar discrimination [2,3,4,5,6].

However, the high dimensionality inherent in hyperspectral datasets presents significant processing challenges. These include, but are not limited to, the substantial collinearity among adjacent spectral bands and the heterogeneous informational content, which can profoundly limit the efficacy of predictive models [7,8,9,10,11]. Under the end-to-end feature learning paradigm of deep learning models, increasing the number of hyperspectral bands might be expected to enhance classification accuracy. In model-based analyses, however, this is often not the case because of the ‘Hughes phenomenon’ [12,13], whereby the efficiency and effectiveness of algorithms can deteriorate as data dimensionality increases. Crucially, hyperspectral datasets often contain numerous bands that are dominated by noise or carry information irrelevant to the target variable. Even with sophisticated feature selection techniques, these non-relevant features can dilute the importance of truly informative signals, potentially leading to reduced classification accuracy [14]. Beyond statistical considerations, the substantial computational demands of processing vast hyperspectral datasets further necessitate efficient data subset selection methods. To address these issues, various feature reduction techniques, encompassing both feature extraction and feature selection methods, have been effectively employed in the classification of hyperspectral data [15,16]. In this context, several studies on VIS/NIR [17] reveal a broad spectrum of methodologies and applications concerning waveband selection, encompassing diverse plant types and ecological systems. Pre-processing steps commonly involve smoothing, normalization, or derivative transformations, which aim to reduce noise and enhance feature interpretability. 
Feature selection methodologies encompass a diverse spectrum of approaches, ranging from unsupervised dimensionality reduction techniques such as Principal Component Analysis [18] to ensemble-based methods including Random Forest variable importance ranking [19], wrapper approaches like Recursive Feature Elimination with Support Vector Machines [20], and univariate statistical methods including the Two-Sample t-test and Chi-Square test for independence [21].

Unsupervised methods like PCA enable dimensionality reduction and pattern discovery without labeled data, facilitating exploratory analysis and computational efficiency, yet they require post hoc interpretation and lack direct predictive capabilities. Supervised methods achieve high accuracy and quantify feature importance when trained on labeled datasets but suffer from dependency on expensive, representative training data and potential generalization limitations to novel conditions. These fundamental trade-offs underscore a critical challenge that extends beyond accuracy: the inherent ‘black-box’ nature of many high-performing models, coupled with the understanding that the top-ranked features identified by these models may not always represent the truly informative or causally relevant features for a given classification task [22,23,24].

This discrepancy can stem from several factors: multicollinearity, where models may arbitrarily prioritize one of several equally informative correlated features, leading to unstable importance metrics [25]; the presence of complex interaction effects, where individual feature importance metrics fail to capture synergistic relationships critical for prediction [26]; model dependency, as feature rankings are often specific to the algorithm used [27,28]; sample size limitations relative to feature dimensionality, particularly problematic in high-dimensional spectroscopic data where insufficient samples can lead to unstable feature selections [29]; noise and measurement artifacts that can elevate the apparent importance of irrelevant features [30]; and the risk of overfitting, where features deemed important on training data do not generalize well to unseen samples, indicating a failure to learn true underlying patterns [31]. This underscores the need for robust interpretability, allowing researchers to understand not just what a model predicts, but why, and to ensure its utility beyond mere performance metrics.

The relationship between discrete spectral intervals or individual bands and quantifiable leaf structural and chemical properties is well documented, creating a framework that has traditionally informed feature selection strategies for species identification and plant type or tissue discrimination [32,33,34,35,36,37]. Beyond enhancing model performance and computational efficiency in regression analyses, the selection of targeted spectral regions facilitates elucidation of the fundamental relationships between spectral signatures and leaf or canopy optical properties, while simultaneously reducing interference from confounding secondary spectral responses [38].

Recent advances in olive (Olea europaea L.) cultivar discrimination using hyperspectral imaging have shown promising results, either focusing on spectral analysis of leaves or canopy [4], on extra virgin olive oils produced from distinct cultivars and different maturation stages [39], or on fruit characteristics [40,41]. Spectroscopic approaches have demonstrated exceptional performance in olive-related applications, with Laser-Induced Breakdown Spectroscopy (LIBS) and absorption spectroscopy achieving up to 100% classification accuracy for geographical origin discrimination of olive oils using machine learning algorithms [42], and UV-Vis spectroscopy accomplishing 98.7% accuracy for variety classification, 89.0% for sensorial quality assessment, and 98.4% for origin classification using chemometric data processing methods [43]. However, to the best of our knowledge, the combined use of hyperspectral data to discriminate cultivars and ripening stages has not been attempted for olive drupes. Building on this foundation, the application of waveband-based feature selection strategies has gained relevance in olive, where spectral variability reflects both genotypic and phenological differences [39]. In this view, accurate olive cultivar classification represents substantial economic and scientific value within the olive oil industry. Concurrently, accurate characterization of fruit maturation stages is essential for optimizing harvest timing, with direct implications for oil yield, quality parameters [44], harvest efficiency, and post-harvest processing.

The ability of hyperspectral data analysis to reveal “invisible” biochemical and structural traits is particularly valuable for olive cultivars, which often exhibit minimal visual differences but possess distinct chemical profiles critical for accurate identification [36,37]. However, the application of such techniques is complicated by olive-specific challenges: subtle inter-cultivar spectral variations may be confounded by phenological changes or masked by within-cultivar heterogeneity [4]; moreover, substantial intra-cultivar variability—driven by factors such as sun exposure, canopy position, and the continuous nature of ripening—can challenge the disentangling of discrete maturation stages. Thus, the development of interpretable and resilient models capable of resolving these nuanced spectral differences and accurately tracking fruit development is essential to enhance olive production and ensure product quality and authenticity [30].

Given these challenges and the pressing need for robust, interpretable models suited to practical applications in olive production, this study introduces a Discriminatory Index algorithm that integrates multi-scale spectral feature detection with biologically informed scoring to identify optimal wavelengths for drupe classification. The proposed methodology incorporates a multi-criteria evaluation framework (assessing discriminatory power, phenological consistency, and detection reliability across maturation stages) to select a parsimonious set of spectrally and biologically meaningful wavelengths. Its performance was benchmarked against established statistical methods, including Principal Component Analysis (PCA), Recursive Feature Elimination with Support Vector Machines (RFE-SVM), and Random Forest (RF).

2. Materials and Methods

2.1. Plant Material and Olive Sampling

Olive drupes were collected from the experimental farm “Santa Paolina” of the Italian National Research Council (CNR), located in Follonica, Central Italy (42°56′39″N, 10°46′16″E, 38 m above sea level). Twenty-nine olive cultivars were selected for this work (Table 1, Figure S1).

While cultivated at a single location to control environmental variables, the 29 cultivars represent genetic diversity from major olive-producing regions, including Spain, Turkey, Croatia, France, and diverse Italian regions. Drupes were collected from different positions in a single tree at different fruit ripening stages, from early October to the end of November 2024. Olive sampling included four representative ripening stages (M1-M2-M3-M4), simultaneously present on the tree, and each olive was evaluated by a non-destructive visual assessment of the ripening stage, considering only the skin color as reported in Table 2, according to Alamprese et al. [44]. Ten olives, homogeneous and with no visible defects, for each ripening stage were selected for the reflectance measurement. To minimize the effects of uneven fruit turning progression, the light reflectance was measured at two different positions on each drupe: at the apex and at the stylar end, for a total of 2320 (29 cultivars × 4 ripening stages × 10 drupes × 2 points) spectra acquired.

2.2. Instrumentation

Spectral reflectance profiles were acquired using an HR2 spectrometer (Ocean Optics, Orlando, FL, USA) operating in the 380–1080 nm wavelength range, with a spectral resolution of 0.46 nm. The system comprised a 45° diffuse reflectance probe (DR-Probe, Ocean Optics) equipped with an integrated tungsten–halogen light source, a fiber optic cable with a 6 μm core diameter, and a 40 mm stand-off block to maintain a fixed focal distance and provide optical isolation during measurements. In addition, a set of custom-made masks was employed to accommodate different drupe sizes, ensuring that the acquired spectral data exclusively represented the reflectance of the olive drupes. Each mask was coated with a highly pigmented, ultra-matte black acrylic paint capable of absorbing up to 98% of visible light. To mitigate instrumental noise in the spectral measurements, a dark current correction was performed following each cultivar change. A white reference measurement using a surface of known reflectance was acquired every five samples to normalize olive radiance data and convert it into percent reflectance values, thereby mitigating the effects of illumination variability and sensor response fluctuations.
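The dark-current and white-reference corrections described above can be sketched as follows. This is a minimal illustration that assumes the common band-wise formula R = 100 × (sample − dark) / (white − dark); the function name `percent_reflectance`, the optional `white_reflectance` factor, and the example counts are hypothetical and are not taken from the acquisition software used in the study.

```python
def percent_reflectance(sample, white, dark, white_reflectance=1.0):
    """Convert raw detector counts to percent reflectance, band by band.

    Assumed formula (not stated in the text):
        R = 100 * white_reflectance * (sample - dark) / (white - dark)
    """
    out = []
    for s, w, d in zip(sample, white, dark):
        denom = w - d
        # Guard against a band where white and dark readings coincide.
        if denom == 0:
            out.append(float("nan"))
        else:
            out.append(100.0 * white_reflectance * (s - d) / denom)
    return out


# Made-up counts for three bands, for illustration only.
sample = [520.0, 810.0, 1500.0]
white = [2020.0, 2020.0, 2020.0]
dark = [20.0, 20.0, 20.0]
reflectance = percent_reflectance(sample, white, dark)
print(reflectance)
```

Subtracting the dark reading from both numerator and denominator removes the detector offset, while dividing by the white reference cancels lamp and sensor drift between reference acquisitions.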

2.3. Wavelength Selection Methodologies in Hyperspectral Analysis for Olive Classification

Spectral reflectance data from olive drupes (10 samples × 2 positions × 4 ripening stages × 29 cultivars) were analyzed using R (version 4.4.2) to identify key wavelengths associated with fruit maturation stages through an iterative process and benchmarked against three established statistical feature selection methods, which served as reference tests.

2.3.1. Biologically Informed Wavelength Extraction (BIWE)

An iterative algorithm was developed to identify optimal spectral bands for olive cultivar classification through biologically informed wavelength selection. The algorithm operated through a multi-step iterative process with adaptive threshold optimization and multi-scale analysis, and was validated using Leave-One-Cultivar-Out Cross-Validation (LOO-CV) to achieve reliable olive cultivar classification.

Step 1—Spectral Data Preprocessing. For each analyzed olive, 1522 raw reflectance values (380–1080 nm; 0.46 nm spectral resolution) were obtained as the spectral signature. Preprocessing involved Savitzky–Golay filtering for noise reduction, which smoothed the data to 2 nm spectral resolution and yielded 761 wavelengths per signature, followed by computation of the first and second derivatives (Figure 1).
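A minimal sketch of Savitzky–Golay smoothing and differentiation is given here for orientation. The window length and polynomial order used in the study are not reported, so the 5-point quadratic kernels below are an assumption, and the helper name `sg_filter` is hypothetical. Edge points without a full window are simply dropped.

```python
# Classic 5-point, second-order Savitzky-Golay convolution kernels.
SG_SMOOTH = (-3, 12, 17, 12, -3)  # normalise by 35
SG_D1 = (-2, -1, 0, 1, 2)         # normalise by 10 * h
SG_D2 = (2, -1, -2, -1, 2)        # normalise by 7 * h**2


def sg_filter(values, coeffs, norm):
    """Apply a symmetric-window kernel to the interior points of a spectrum."""
    half = len(coeffs) // 2
    return [
        sum(c * values[i + k - half] for k, c in enumerate(coeffs)) / norm
        for i in range(half, len(values) - half)
    ]


step = 2.0  # nm between consecutive points (assumed uniform grid)
wavelengths = [380.0 + step * i for i in range(9)]
# Synthetic quadratic signal: a quadratic kernel must reproduce it exactly.
signal = [0.001 * (w - 380.0) ** 2 for w in wavelengths]

smoothed = sg_filter(signal, SG_SMOOTH, 35.0)
d1 = sg_filter(signal, SG_D1, 10.0 * step)
d2 = sg_filter(signal, SG_D2, 7.0 * step ** 2)
print(smoothed[0], d1[0], d2[0])
```

On the synthetic quadratic the smoothed values equal the input, the first derivative equals 0.002 × (λ − 380), and the second derivative is constant at 0.002, which is a quick sanity check for the kernels.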

Leave-One-Cultivar-Out Cross-Validation for BIWE Feature Validation

Step 2—Threshold Optimization via Grid Search. Wavelength selection requires detection thresholds that achieve an optimal trade-off between detection sensitivity and rejection of instrumental noise artifacts. The algorithm iteratively evaluates 27 threshold combinations (3 × 3 × 3) for raw reflectance (0.0005, 0.001, 0.002), first derivative (0.003, 0.005, 0.008), and second derivative (0.003, 0.005, 0.008). The threshold ranges were determined through a preliminary signal-to-noise analysis of the hyperspectral dataset. The lower bounds (0.0005 for raw reflectance, 0.003 for derivatives) were set at 2× the measured instrumental noise floor (noise = 0.00025 for raw spectra, noise = 0.0015 for derivatives), ensuring detection above random fluctuations. The upper bounds were set at the 95th percentile of measured peak prominences across the dataset, preventing oversensitivity to outliers while maintaining the capability to detect biological features. The intermediate values provide approximately logarithmic spacing for exploring the threshold space. Each combination is assessed with a composite metric that evaluates discriminatory power across the entire dataset, and the highest-performing threshold set is selected for subsequent analysis.
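The 3 × 3 × 3 grid described in Step 2 can be enumerated as sketched here. The threshold values come from the text; the composite performance metric is not specified in enough detail to reproduce, so `toy_score` is a purely hypothetical stand-in and `best_thresholds` is an illustrative helper name.

```python
from itertools import product

RAW_THRESHOLDS = (0.0005, 0.001, 0.002)
D1_THRESHOLDS = (0.003, 0.005, 0.008)
D2_THRESHOLDS = (0.003, 0.005, 0.008)


def best_thresholds(score_fn):
    """Evaluate every (raw, d1, d2) combination and return the top scorer."""
    grid = list(product(RAW_THRESHOLDS, D1_THRESHOLDS, D2_THRESHOLDS))
    best = max(grid, key=lambda combo: score_fn(*combo))
    return best, len(grid)


def toy_score(raw, d1, d2):
    # Hypothetical placeholder for the composite discriminatory metric:
    # it simply prefers the middle value of each range.
    return -abs(raw - 0.001) - abs(d1 - 0.005) - abs(d2 - 0.005)


best, n_combinations = best_thresholds(toy_score)
print(n_combinations, best)
```

In a real run, `score_fn` would execute peak detection and scoring over the whole dataset for each combination, which is why keeping the grid at 27 points matters for run time.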

Step 3—Multi-Scale Peak Detection Framework. On the smoothed raw reflectance data and their derivatives, the algorithm employs the findpeaks function from the R package pracma to systematically identify spectral peaks and valleys across three resolution scales: fine (min_distance = 1 nm), medium (min_distance = 2 nm), and broad (min_distance = 3 nm). For each peak and valley, this multi-scale approach extracts the intensity, wavelength position, width, and scale type, and associates them with the corresponding sample cultivar and ripening stage.
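To illustrate how a minimum-distance constraint changes what is detected at each scale, the sketch below uses a simple greedy local-maximum finder. It is not the pracma findpeaks implementation used in the study; the function names, the tie-breaking rule (tallest peak wins), and the toy spectrum are assumptions, and peak width is omitted for brevity.

```python
def find_peaks(wl, y, min_distance_nm, min_height=float("-inf")):
    """Indices of local maxima at least `min_distance_nm` apart (tallest first)."""
    candidates = [
        i for i in range(1, len(y) - 1)
        if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] >= min_height
    ]
    kept = []
    for i in sorted(candidates, key=lambda idx: y[idx], reverse=True):
        if all(abs(wl[i] - wl[j]) >= min_distance_nm for j in kept):
            kept.append(i)
    return sorted(kept)


SCALES = {"fine": 1.0, "medium": 2.0, "broad": 3.0}  # nm, as in Step 3


def multi_scale_features(wl, y):
    """Collect peaks and valleys (peaks of the negated signal) per scale."""
    rows = []
    negated = [-v for v in y]
    for scale, dist in SCALES.items():
        for kind, series in (("peak", y), ("valley", negated)):
            for i in find_peaks(wl, series, dist):
                rows.append({"scale": scale, "kind": kind,
                             "wavelength": wl[i], "intensity": y[i]})
    return rows


wl = [548.0, 549.0, 550.0, 551.0, 552.0, 553.0, 554.0]
y = [0.10, 0.14, 0.12, 0.15, 0.11, 0.13, 0.10]
features = multi_scale_features(wl, y)
broad_peaks = [r["wavelength"] for r in features
               if r["scale"] == "broad" and r["kind"] == "peak"]
fine_peaks = [r["wavelength"] for r in features
              if r["scale"] == "fine" and r["kind"] == "peak"]
print(fine_peaks, broad_peaks)
```

In this toy spectrum the three maxima are 2 nm apart, so the fine and medium scales keep all of them while the broad scale keeps only the tallest; a feature that survives at every scale is the kind of agreement the multi-scale consensus component rewards.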

Step 4—Biological Relevance Scoring System. A biologically informed scoring system assigns weights based on established olive maturation biochemistry:

Maximum priority (1.00): Red-edge region (680–780 nm)—critical for photosystem degradation detection.
High priority (0.95): Chlorophyll absorption bands (400–500 nm, 640–680 nm), carotenoid regions (500–550 nm), and water absorption bands (940–980 nm)—directly linked to maturation processes.
Moderate priority (0.75): Near-infrared structural regions (780–900 nm)—reflecting cell wall modifications.
Default priority (0.50): All other spectral regions with limited biochemical significance.
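The priority table above maps directly onto a lookup function, sketched here. The listed ranges share boundary wavelengths (for example 680 nm and 780 nm), and the text does not say how ties are resolved, so taking the highest applicable weight is an assumption; `bio_score` is a hypothetical name.

```python
# (lower bound nm, upper bound nm, weight), taken from the Step 4 list.
BIO_REGIONS = [
    (680.0, 780.0, 1.00),  # red edge
    (400.0, 500.0, 0.95),  # chlorophyll absorption
    (640.0, 680.0, 0.95),  # chlorophyll absorption
    (500.0, 550.0, 0.95),  # carotenoids
    (940.0, 980.0, 0.95),  # water absorption
    (780.0, 900.0, 0.75),  # near-infrared structural region
]
DEFAULT_WEIGHT = 0.50


def bio_score(wavelength_nm):
    """Biological relevance weight; boundary ties resolve to the highest weight."""
    matches = [w for lo, hi, w in BIO_REGIONS if lo <= wavelength_nm <= hi]
    return max(matches, default=DEFAULT_WEIGHT)


for wl in (450.0, 600.0, 680.0, 720.0, 850.0, 960.0, 1000.0):
    print(wl, bio_score(wl))
```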

Step 5—Six-Component Discriminatory Analysis. The algorithm integrates six normalized performance metrics through weighted combination:

Cultivar discrimination power (25%): Coefficient of variation across cultivars.
Detection reliability (20%): Frequency of feature detection across samples.
Stage consistency (20%): Temporal stability calculated as 1 − (σ/μ) across M1–M4 ripening stages.
Balanced discrimination (15%): Shannon diversity index ensuring equal representation.
Biological relevance (15%): Integration of physiological scoring system.
Multi-scale consensus (5%): Agreement across fine, medium, and broad detection scales.
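The weighted combination of the six components listed above can be written as a single dot product. The sketch assumes each component has already been normalized to the 0–1 range (the normalization scheme is not described in the text); the dictionary keys and the example values are hypothetical.

```python
WEIGHTS = {
    "cultivar_discrimination": 0.25,
    "detection_reliability": 0.20,
    "stage_consistency": 0.20,
    "balanced_discrimination": 0.15,
    "biological_relevance": 0.15,
    "multi_scale_consensus": 0.05,
}


def discriminatory_score(components):
    """Weighted sum of six components, each assumed to lie in [0, 1]."""
    for name in WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1], got {value}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Made-up component values for one candidate wavelength.
example = {
    "cultivar_discrimination": 0.8,
    "detection_reliability": 0.6,
    "stage_consistency": 0.7,
    "balanced_discrimination": 0.5,
    "biological_relevance": 1.0,
    "multi_scale_consensus": 1.0,
}
score = discriminatory_score(example)
print(round(score, 3))
```

Because the weights sum to 1, the composite stays on the same 0–1 scale as its inputs, which keeps percentile cut-offs in later steps easy to interpret.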

Step 6—Two-Iteration Progressive Refinement.

Iteration 1—Broad Candidate Identification: The algorithm performs comprehensive spectral analysis using medium and broad resolution scales with optimized thresholds. Candidate wavelengths are selected based on quality score performance above the 60th percentile, creating a refined pool of biologically and statistically relevant bands.

Iteration 2—Focused Precision Enhancement: Around each selected candidate from Iteration 1, the algorithm creates focused ±8 nm analysis windows. Detection thresholds are increased by 30% to enhance precision and reduce false positives. This targeted approach refines the initial selection while maintaining computational efficiency.
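The two refinement iterations can be sketched as a percentile cut followed by window construction and threshold tightening. The percentile interpolation rule is not stated in the text, so linear interpolation is assumed here; the helper names, the example quality scores, and the base threshold values are illustrative only.

```python
def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def iteration_one(quality_by_wavelength, q=60):
    """Keep wavelengths whose quality score exceeds the q-th percentile."""
    cut = percentile(quality_by_wavelength.values(), q)
    return sorted(wl for wl, s in quality_by_wavelength.items() if s > cut)


def iteration_two(candidates, thresholds, half_width_nm=8.0, factor=1.30):
    """Build +/- 8 nm windows and raise each detection threshold by 30%."""
    windows = [(wl - half_width_nm, wl + half_width_nm) for wl in candidates]
    tightened = {name: value * factor for name, value in thresholds.items()}
    return windows, tightened


# Made-up quality scores for five candidate wavelengths (nm).
quality = {550.0: 0.2, 672.0: 0.4, 700.0: 0.6, 720.0: 0.8, 760.0: 1.0}
candidates = iteration_one(quality)
windows, tightened = iteration_two(
    candidates, {"raw": 0.001, "d1": 0.005, "d2": 0.005}
)
print(candidates, windows, tightened)
```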

Step 7—Final Selection Criteria. The ultimate wavelength selection requires simultaneous satisfaction of three stringent arbitrary criteria:

Enhanced Quality Score > 50th percentile (ensure selection of above-median performing features, representing the upper half of the discriminatory performance distribution).
Weighted Detection Rate > 0.1 (assumed as the minimum detection frequency necessary for robust cross-validation performance).
Weighted Bio Score > 0.4 (ensuring selection of bands from spectral regions with assigned biological relevance above the default baseline score (Default priority 0.5), effectively limiting wavelengths from biochemically non-informative spectral regions).

This multi-criteria approach ensures that selected wavelengths represent both statistically powerful and biologically meaningful spectral features for olive maturation assessment.
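The three simultaneous criteria of Step 7 amount to a conjunctive filter, sketched here. The field names and the example records are hypothetical, and the 50th percentile is taken as the median of the enhanced quality scores in the candidate pool.

```python
from statistics import median


def final_selection(features, min_detection=0.1, min_bio=0.4):
    """Keep features passing all three Step 7 criteria at once."""
    quality_cut = median(f["enhanced_quality"] for f in features)
    return [
        f for f in features
        if f["enhanced_quality"] > quality_cut
        and f["weighted_detection_rate"] > min_detection
        and f["weighted_bio_score"] > min_bio
    ]


# Made-up candidate records for illustration only.
pool = [
    {"wavelength": 720.0, "enhanced_quality": 0.9,
     "weighted_detection_rate": 0.45, "weighted_bio_score": 1.00},
    {"wavelength": 672.0, "enhanced_quality": 0.7,
     "weighted_detection_rate": 0.05, "weighted_bio_score": 0.95},
    {"wavelength": 850.0, "enhanced_quality": 0.5,
     "weighted_detection_rate": 0.60, "weighted_bio_score": 0.75},
    {"wavelength": 1010.0, "enhanced_quality": 0.3,
     "weighted_detection_rate": 0.30, "weighted_bio_score": 0.50},
]
selected = [f["wavelength"] for f in final_selection(pool)]
print(selected)
```

In this toy pool, 672 nm has a high quality score but is detected too rarely, and 850 nm is detected often but falls below the median quality, so only 720 nm survives all three tests.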

To assess generalizability and mitigate the risk of cultivar-specific overfitting, BIWE-selected features underwent Leave-One-Cultivar-Out Cross-Validation (LOO-CV) using Random Forest classifiers for both cultivar discrimination and ripening stage classification. Each of the 29 cultivars was held out in turn as an independent test case while feature selection was performed on the remaining 28 cultivars. This kept training and validation data completely separate in every fold and eliminated data leakage that could artificially inflate classification performance estimates. For each tested cultivar, spectral feature extraction was performed separately for each ripening stage (M1–M4) using the BIWE algorithm. Following BIWE wavelength selection, the validation framework constructed feature identifiers by combining derivative types with selected wavelengths. Automatic feature matching ensured exact correspondence between BIWE-selected wavelengths and the available spectral data columns. A dimensionality-reduced dataset was then created containing exclusively BIWE-selected features alongside cultivar and ripening stage metadata, representing the core validation of feature selection efficacy.
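The fold structure of a leave-one-cultivar-out scheme can be sketched as below, using the sampling design reported in Section 2.1 (29 cultivars × 4 stages × 10 drupes × 2 positions). Cultivar labels such as `C01` and the helper names are placeholders; only the index bookkeeping is shown, not the feature selection or classifier training performed inside each fold.

```python
from itertools import product


def build_design(n_cultivars=29, stages=("M1", "M2", "M3", "M4"),
                 drupes=10, positions=("position_1", "position_2")):
    """Metadata rows mirroring the reported sampling design."""
    cultivars = [f"C{i:02d}" for i in range(1, n_cultivars + 1)]
    return [
        {"cultivar": c, "stage": s, "drupe": d, "position": p}
        for c, s, d, p in product(cultivars, stages, range(drupes), positions)
    ]


def leave_one_cultivar_out(samples):
    """Yield (held-out cultivar, train indices, test indices) for each fold."""
    for held_out in sorted({s["cultivar"] for s in samples}):
        train = [i for i, s in enumerate(samples) if s["cultivar"] != held_out]
        test = [i for i, s in enumerate(samples) if s["cultivar"] == held_out]
        yield held_out, train, test


design = build_design()
folds = list(leave_one_cultivar_out(design))
print(len(design), len(folds), len(folds[0][1]), len(folds[0][2]))
```

Splitting by cultivar rather than by individual spectrum is what prevents the two measurement points of the same drupe, or drupes of the same tree, from appearing on both sides of a fold.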

For each LOO-CV iteration, Random Forest classifiers were trained using optimized parameters (ntree = 500) selected for robustness with small sample sizes and the ability to handle correlated features. The prediction phase used probabilistic classification to extract class membership probabilities for all target classes, with the final prediction corresponding to the class of maximum probability. Complete probability matrices were stored for uncertainty quantification and posterior analysis. A complete LOO-CV was executed for cultivar discrimination across all 29 cultivars, storing predicted classes and probability matrices (2320 × 29). A parallel LOO-CV analysis for ripening stage classification generated independent prediction vectors and probability matrices (2320 × 4), assessing the temporal discriminatory power of BIWE-selected wavelengths across the complete maturation gradient. LOO-CV results provided stage-specific performance metrics for each cultivar–ripening stage combination. Key metrics included the mean discriminatory score, quality score, stage consistency, feature count, detection rate, and intensity coefficient of variation. The mean discriminatory score was selected as the primary reporting metric because it represents the composite output of the BIWE algorithm, integrating all weighted components into a single interpretable measure of wavelength discriminatory performance.

Finally, a cross-cultivar consistency was quantified for each ripening stage as 1 minus the coefficient of variation in discriminatory scores across all cultivars, providing genetic background-independent reliability measures. A composite biological validation score incorporated pattern consistency (0.25 weight), stability measures (0.20), ranking correlations (0.25), and biological trend alignment (0.30). Validation scores > 0.7 indicated strong biological validity, while scores < 0.3 suggested potential statistical artifacts.
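The consistency measure (1 minus the coefficient of variation) and the composite validation score can be sketched as follows. Whether a sample or population standard deviation was used is not stated, so the population form is assumed; the component key names and the label for scores between 0.3 and 0.7 are also assumptions, since the text only interprets the two extremes.

```python
from statistics import mean, pstdev

VALIDATION_WEIGHTS = {
    "pattern_consistency": 0.25,
    "stability": 0.20,
    "ranking_correlation": 0.25,
    "biological_trend_alignment": 0.30,
}


def cross_cultivar_consistency(scores):
    """1 - CV of discriminatory scores across cultivars (population SD assumed)."""
    return 1.0 - pstdev(scores) / mean(scores)


def validation_score(components):
    """Weighted composite of the four validation components."""
    return sum(VALIDATION_WEIGHTS[k] * components[k] for k in VALIDATION_WEIGHTS)


def interpret(score):
    if score > 0.7:
        return "strong biological validity"
    if score < 0.3:
        return "potential statistical artifact"
    return "intermediate"  # label assumed; not defined in the text


# Made-up inputs for illustration only.
consistency = cross_cultivar_consistency([0.50, 0.55, 0.45, 0.50])
composite = validation_score({
    "pattern_consistency": 0.8,
    "stability": 0.7,
    "ranking_correlation": 0.9,
    "biological_trend_alignment": 0.8,
})
print(round(consistency, 3), round(composite, 3), interpret(composite))
```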

2.3.2. Random Forest

A Random Forest (RF) implementation was developed utilizing Out-of-Bag (OOB) error optimization for both hyperparameter tuning and wavelength selection [45]. The OOB error provides an unbiased estimate of model performance by leveraging the approximately 37% of samples not used in each bootstrap iteration during forest construction, eliminating the need for separate cross-validation procedures. The optimization process employed a systematic grid search across multiple hyperparameter combinations, testing ntree values (100, 300, 500 trees) and mtry values automatically determined based on dataset dimensionality, using established heuristics: √p (default classification), p/3 (conservative), p/2 (moderate), log₂(p) (alternative default), and p/10 (very conservative), where p represents the number of wavelengths. For each parameter combination, models were trained with importance scoring enabled, and OOB error was calculated and validated for numerical stability. The optimal hyperparameter configuration was selected based on the minimum OOB error across all tested combinations.
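Two quantities in the paragraph above lend themselves to a quick numerical check: the roughly 37% out-of-bag share and the mtry candidates derived from the number of wavelengths p. The sketch assumes that non-integer heuristics are rounded down and that p = 761 (the number of wavelengths after preprocessing); both are assumptions, as are the function names.

```python
import math
from itertools import product


def oob_fraction(n_samples):
    """Probability that a given sample is absent from one bootstrap draw."""
    return (1.0 - 1.0 / n_samples) ** n_samples


def mtry_candidates(p):
    """mtry heuristics from the text, floored and clipped to [1, p]."""
    raw = [math.sqrt(p), p / 3, p / 2, math.log2(p), p / 10]
    return sorted({max(1, min(p, int(v))) for v in raw})


NTREE_VALUES = (100, 300, 500)

p = 761  # assumed number of input wavelengths
mtry_values = mtry_candidates(p)
grid = list(product(NTREE_VALUES, mtry_values))
print(round(oob_fraction(2320), 4), mtry_values, len(grid))
```

The out-of-bag fraction converges to 1/e ≈ 0.368 as the sample size grows, which is where the "approximately 37%" figure comes from.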

Spectral wavelength selection was implemented through three complementary strategies: (1) importance ranking, selecting the top N wavelengths based on Mean Decrease Gini scores; (2) cumulative importance, selecting wavelengths contributing to 95% of total variable importance; and (3) OOB incremental selection, employing forward selection with early stopping based on OOB error progression. The final model was trained on the selected wavelength subset with the optimized hyperparameters, providing both enhanced predictive performance and reduced computational complexity.
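The first two strategies, top-N ranking and the 95% cumulative-importance cut, can be sketched on a dictionary of importance scores. The importance values below are invented for illustration and the function names are hypothetical; in practice the scores would be the Mean Decrease Gini values returned by the fitted forest.

```python
def rank_by_importance(importance):
    """Wavelengths sorted from most to least important."""
    return sorted(importance, key=importance.get, reverse=True)


def top_n(importance, n):
    """Strategy 1: keep the n highest-ranked wavelengths."""
    return rank_by_importance(importance)[:n]


def cumulative_importance(importance, share=0.95):
    """Strategy 2: smallest ranked prefix reaching `share` of total importance."""
    total = sum(importance.values())
    selected, running = [], 0.0
    for wl in rank_by_importance(importance):
        selected.append(wl)
        running += importance[wl]
        if running >= share * total:
            break
    return selected


# Made-up Mean Decrease Gini scores keyed by wavelength (nm).
gini = {550: 40, 680: 30, 720: 20, 800: 6, 960: 4}
print(top_n(gini, 2), cumulative_importance(gini))
```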

2.3.3. Recursive Feature Elimination with Support Vector Machine

Recursive Feature Elimination (RFE) was implemented using Support Vector Machines (SVM) with linear kernels as the underlying classification algorithm to identify the most discriminative spectral wavelengths through systematic backward selection [46]. The RFE procedure employed 5-fold cross-validation with stratified sampling to ensure robust wavelength ranking while maintaining computational efficiency. The algorithm iteratively trained SVM models on progressively reduced wavelength subsets, ranking wavelengths by their contribution to the decision boundary as given by the weight coefficients in the linear kernel space. Feature elimination proceeded recursively, removing the least important wavelength at each iteration until the target subset sizes were reached.

The optimization process evaluated multiple target wavelength subset sizes (15, 30, and 50 wavelengths) to identify the configuration yielding the maximum cross-validated overall accuracy. The caretFuncs helper functions (caret package) were used to ensure consistent model training and evaluation across iterations, with automatic hyperparameter optimization for the underlying SVM classifier. The final wavelength subset was selected based on the highest mean cross-validated accuracy across all folds.

2.3.4. Principal Component Analysis

Principal Component Analysis (PCA) was employed for dimensionality reduction and wavelength selection through the transformation of the original spectral space into an orthogonal representation capturing maximum variance [47].

Wavelength selection was implemented through a two-stage process: first, the number of principal components required to explain 95% of the total spectral variance was determined to establish the effective dimensionality of the dataset. Subsequently, original spectral wavelengths were selected based on their absolute loadings (coefficients) in the retained principal components, using a threshold of 0.1 to identify wavelengths contributing meaningfully to the principal component space. For each original wavelength, the maximum absolute loading across all retained components was computed, and wavelengths exceeding the loading threshold were retained for classification.
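The two-stage rule can be sketched on precomputed PCA outputs. The decomposition itself is assumed to have been done elsewhere; the explained-variance ratios and the loading matrix below are toy numbers (not unit-norm eigenvectors), and the function names are hypothetical.

```python
def n_components_for(variance_ratio, target=0.95):
    """Stage 1: smallest number of leading components reaching the target."""
    running = 0.0
    for k, ratio in enumerate(variance_ratio, start=1):
        running += ratio
        if running >= target:
            return k
    return len(variance_ratio)


def select_by_loading(wavelengths, loadings, k, threshold=0.1):
    """Stage 2: keep wavelengths whose max |loading| over k components > threshold."""
    return [
        wl for j, wl in enumerate(wavelengths)
        if max(abs(loadings[c][j]) for c in range(k)) > threshold
    ]


# Toy inputs: 4 wavelengths, 3 components (rows = components).
wavelengths = [550, 600, 680, 1000]
variance_ratio = [0.80, 0.16, 0.03]
loadings = [
    [0.70, 0.05, 0.60, 0.02],
    [0.08, 0.09, -0.50, 0.03],
    [0.01, 0.90, 0.00, 0.95],
]
k = n_components_for(variance_ratio)
selected = select_by_loading(wavelengths, loadings, k)
print(k, selected)
```

Here 600 nm and 1000 nm load heavily only on the third component, which is discarded in stage 1, so they are not selected. Because many correlated bands can exceed a fixed loading threshold together, this rule tends to retain comparatively large wavelength sets.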

2.4. Wavelength Selection Methodologies Comparative Analysis

A comparative evaluation of the above-mentioned wavelength selection methodologies was conducted to assess the performance of different approaches for hyperspectral data classification. Each selection method generated a distinct wavelength subset that was subsequently evaluated using identical classification protocols to ensure methodological consistency (Table S1). To ensure a fair and consistent comparison across all methodologies, a Support Vector Machine with linear kernel (svmLinear) was employed as the classification algorithm for all selection approaches. Model performance was evaluated using 5-fold cross-validation to provide reliable and robust statistical estimates. The evaluation included both bulk analysis (combining all maturation stages M1–M4) and stage-specific analysis to assess method performance across different levels of fruit ripeness. Performance assessment employed a three-metric approach combining predictive accuracy, F1-score, and technological efficiency considerations relevant to practical spectroscopic implementation. Overall accuracy (OA) was defined as the proportion of correctly classified samples (number of correct predictions divided by the total number of samples). This metric provides a direct measure of method reliability, with values ranging from 0 (no correct classifications) to 1 (perfect classification).

The F1-score was calculated as the macro-averaged harmonic mean of precision and recall across all cultivar classes, providing a balanced assessment of classification performance that accounts for both false positives and false negatives. The macro-averaged approach ensures equal weighting of all cultivars regardless of sample distribution, making it particularly suitable for evaluating performance across the 29 genetically diverse cultivars in this study. F1-score values range from 0 to 1, where higher values indicate better balanced precision–recall performance.
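Overall accuracy and the macro-averaged F1-score defined above can be computed from label vectors as sketched here. The per-class F1 uses the equivalent form 2TP / (2TP + FP + FN); treating a class with no true or predicted members as F1 = 0 is an assumption, and the three-class labels are invented for illustration.

```python
def overall_accuracy(y_true, y_pred):
    """Share of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for three hypothetical cultivars.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
oa = overall_accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(round(oa, 4), round(f1, 4))
```

In the toy example the per-class F1 values are 0.5, 0.8, and 2/3, so the macro average differs from the overall accuracy even though every class has the same number of samples.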

Technological efficiency was quantified as the accuracy-to-features ratio (overall accuracy divided by the number of selected wavelengths), providing a normalized metric that balances predictive performance against instrumental complexity. This metric addresses practical deployment constraints, since spectrometer cost, optical complexity, and measurement time all increase with the number of discrete wavelengths required. The tri-metric evaluation therefore assesses wavelength selection methods not only for statistical performance (OA and F1-score) but also for their practical utility in developing cost-effective, streamlined spectroscopic instruments suitable for routine agricultural use.

The comparative analysis was also conducted with the addition of a dataset of Vegetation Indices (VIs) selected through a systematic literature review targeting indices specifically relevant to olive fruit maturation physiology (Table 3). The selection prioritized indices sensitive to key biochemical processes during ripening: chlorophyll degradation (NDVI, PRI, Gitelson indices, LCI variants), carotenoid dynamics (NPCI, SIPI, SRPI), and structural changes (WI). Multiple indices targeting similar biochemical processes were deliberately included to ensure comprehensive coverage of potential spectral responses and to identify the most effective formulations for olive maturation assessment. This VI dataset was integrated with the four wavelength selection methodologies to evaluate potential enhancement effects on classification performance.
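The accuracy-to-features ratio used as the technological efficiency metric is simple enough to reproduce directly. The sketch below applies it to the accuracies and wavelength counts reported in Section 3.3 for the spectral-only comparison; the function and variable names are illustrative assumptions.

```python
def technological_efficiency(overall_accuracy, n_wavelengths):
    """Overall accuracy divided by the number of selected wavelengths."""
    if n_wavelengths <= 0:
        raise ValueError("n_wavelengths must be positive")
    return overall_accuracy / n_wavelengths


# (overall accuracy, number of wavelengths) as reported in Section 3.3.
reported = {
    "RFE-SVM": (0.664, 131),
    "PCA": (0.641, 114),
    "RF": (0.544, 50),
    "BIWE": (0.5634, 25),
}

efficiency = {
    method: technological_efficiency(acc, n) for method, (acc, n) in reported.items()
}

# Rank methods from most to least efficient.
for method, value in sorted(efficiency.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:8s} {value:.5f}")
```

Since the denominator is the raw wavelength count, the ratio rewards small subsets strongly; it is best read alongside OA and F1 rather than as a standalone ranking.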

3. Results

3.1. Spectral Characteristics of Different Maturation Stages

Figure 2 illustrates the spectral reflectance curves (A) and the 1st (B) and 2nd (C) derivative signals of “Dolce d’Andria” olive drupes, shown as a representative example of spectral change over ripening time (M1–M4) and covering a wavelength range from 380 to 1080 nm. The curves reveal significant spectral changes across maturation stages, particularly in the visible (500–700 nm) and near-infrared (700–980 nm) regions. The spectral reflectance curve of M1 (light green) exhibits a distinct peak around 550 nm, which corresponds to the green reflectance region. This feature is associated with chlorophyll reflection, where plant tissues absorb light strongly in the blue (~450 nm) and red (~670 nm) regions but reflect more in the green (~550 nm) region, giving unripe olive drupes their characteristic green appearance. The intensity of this peak is notably higher in M1 than in the later stages (M2, M3, and M4), suggesting a higher chlorophyll concentration in the early maturation phase. The reflectance peak and the valley of the spectrum are located at the wavelengths of minimum (550 nm) and maximum (670 nm) chlorophyll absorption, respectively. The spectral curves reveal a progressive reduction in reflectance at 550 nm as olives ripen from M1 (early stage) to M2 and M3 (intermediate and advanced stages, respectively).

This reduction is indicative of chlorophyll degradation, which leads to a diminished green reflectance and signals the onset of ripening. Simultaneously, a distinct peak emerges around 650 nm in M2 and becomes more pronounced in M3. This feature is attributed to the accumulation of carotenoids and anthocyanins, pigments that gradually replace chlorophyll as the fruit matures. The increased reflectance in this region suggests structural and biochemical changes in the fruit skin, including modifications in pigment composition that influence light absorption and scattering properties. Moreover, M2 and M3 exhibit well-defined peaks between 700 and 750 nm, which are more evident compared to the earlier stage.

The spectral reflectance curve of M4 (full maturation, purple line) exhibits a notable absence of distinct peaks in the 500–650 nm range, indicating a near-complete degradation of chlorophyll pigments. In contrast to earlier maturation stages (M1, M2, and M3), where chlorophyll-related reflectance features were prominent, the lack of spectral variability in this region suggests a dominance of anthocyanins and other pigmentation changes that mask chlorophyll reflection effects. In addition, M4 shows spectral flattening above 820 nm in the near-infrared range, suggesting structural modifications in the epidermal and mesocarp layers, likely due to changes in cell wall composition and water content that occur during full maturation.

3.2. Wavelength Selection Methodologies in Hyperspectral Analysis for Olive Classification

3.2.1. Biologically Informed Wavelength Extraction

Table 4 summarizes the 25 wavelengths selected by the Biologically Informed Wavelength Extraction algorithm, which integrates biological weighting, developmental stage consistency, and robustness detection. The Discriminatory Scores ranged from 1.28 to 3.02, with all features surpassing the quality threshold (Quality Score > 1.0), indicating statistically sound and biologically relevant selections.

The highest-ranked wavelength was 680 nm (D₂ Peak), exhibiting a discriminatory score of 3.02, coupled with strong biological weight (0.855), high detection rate (12.7), and consistent performance across developmental stages. Its position within the red edge region highlights its physiological relevance, likely reflecting dynamic changes in chlorophyll concentration and photosystem structure. Other prominent wavelengths include 695 nm (D₁ Peak) and 705 nm (D₂ Valley), both situated within the red edge domain, reinforcing the importance of this spectral interval for distinguishing between drupes based on photosynthetic pigment content and degradation kinetics. Notably, 950 nm (D₁ Valley) and 970 nm (D₂ Peak) also scored highly across all evaluation metrics, supporting their role as critical near-infrared markers. This region, typically associated with internal leaf structure and water content, may reflect underlying genotype-dependent differences in mesophyll organization or ripening-related cell wall transformations.

3.2.2. Random Forest Feature Selection

Random Forest (RF) feature selection with Out-of-Bag (OOB) error optimization retained 50 wavelengths (Table S1). The wavelengths with the highest Mean Decrease Gini scores were 675 nm (38.84), 670 nm (36.39), and 680 nm (35.43), located in the red and red-edge regions of the spectrum, together with 405 nm (36.23), 410 nm (32.86), and 415 nm (32.07) in the blue-green region. Several wavelengths in the short-wave NIR region (e.g., 980, 975, 990, and 985 nm) also appear in the top 20, indicating that this region contributes to the RF ranking as well.

3.2.3. Recursive Feature Elimination with Support Vector Machine

The RFE process was configured using 5-fold cross-validation, and feature subsets of sizes 15, 30, and 50 were iteratively evaluated to identify the optimal set of predictors that maximized model performance. It selected 131 wavelengths (Table S1). Similarly to the Random Forest results, the RFE-SVM method heavily prioritizes wavelengths within the red and red-edge spectral range (675, 680, 670, 685, 665, 690). A substantial number of wavelengths from the blue-green part of the spectrum (e.g., 440, 445, 435, 430, 450, 425, 405, 410, 455, 415, 460, 465, 470, 475) are also consistently selected. The selection includes a dense cluster of wavelengths in the short-wave NIR region (965, 970, 975, 960, 980, 985, 990, 1000, 995, 1005, 1010, 1015, 1020, 1025, 1055, 1050, 955, 950, 945, 940, 935), and also retained a considerable number of wavelengths between 800 nm and 930 nm (e.g., 850, 865, 845, 860, 870, 900, 825, 820). While less dominant than the red/red-edge or blue regions, several wavelengths in the green-yellow range (e.g., 625, 620, 615, 610, 605) were also selected. The comprehensive list indicates that RFE with SVM effectively identifies a diverse set of informative wavelengths spanning visible and near-infrared regions.

3.2.4. Principal Component Analysis

The PCA-based selection yielded a distinct set of 114 wavelengths (Table S1), based on their significant loadings (absolute loading > 0.1) onto the principal components that collectively explain 95% of the total variance in the raw spectral data. Unlike methods that prioritize individual highly discriminative peaks (such as Random Forest or RFE with SVM), PCA tends to select broader, more continuous segments of the spectrum. Three such segments were observed: a nearly continuous block extending from the visible spectrum into the initial NIR region (405 to 735 nm), a distinct segment in the mid-NIR (775 to 840 nm), and another continuous block in the short-wave NIR (930 to 1055 nm).
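Summarising a long list of selected wavelengths as contiguous ranges, as done above for the PCA subset, can be automated. The sketch below is an illustration under stated assumptions: the helper name `contiguous_blocks` is hypothetical, and the 5 nm spacing is inferred from the wavelengths listed in the text rather than stated by the authors.

```python
def contiguous_blocks(wavelengths, step=5):
    """Group wavelengths into runs whose neighbours are exactly `step` nm apart."""
    blocks = []
    for wl in sorted(wavelengths):
        if blocks and wl - blocks[-1][1] == step:
            blocks[-1][1] = wl        # extend the current run
        else:
            blocks.append([wl, wl])   # start a new run
    return [tuple(block) for block in blocks]


# Short made-up subset, used only to show the grouping behaviour.
selected = [405, 410, 415, 420, 775, 780, 785, 930, 935]
print(contiguous_blocks(selected))
```

Applied to a full subset from Table S1, the same helper would make it easy to compare how fragmented the selections of the different methods are.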

3.3. Wavelength Selection Methodologies Comparative Analysis

The comparative evaluation of feature selection methodologies using svmLinear revealed distinct performance characteristics across the tested approaches (Table 5). RFE achieved the highest accuracy (0.664), but required an extensive feature set (131 wavelengths), resulting in moderate technological efficiency (0.0051). PCA demonstrated competitive accuracy (0.641) with similar efficiency (114 wavelengths, efficiency: 0.00562). BIWE achieved a lower overall accuracy (0.5634), but with only 25 wavelengths, representing an approximately 5-fold reduction in feature dimensionality compared to the best-performing method (131 wavelengths). RF exhibited the poorest performance profile, with the lowest accuracy (0.544) despite requiring 50 wavelengths (technological efficiency: 0.01088).

The integration of vegetation indices consistently enhanced performance across all methodologies, with BIWE demonstrating a notable improvement (0.671), representing an absolute improvement of +0.108 (19.2% relative increase). This biologically informed approach demonstrated efficiency values 4–5 times superior to other studied methodologies, indicating fundamental differences in the feature selection approach. RFE and PCA, despite achieving the highest absolute accuracies, exhibited modest improvements (+0.038 and +0.051, respectively), suggesting that these statistical approaches had already captured much of the discriminatory information available in the spectral domain alone.

Moreover, the superior performance of the BIWE methodology in single maturation stages compared to bulk analysis is a significant finding that supports the biological relevance of the proposed wavelength selection approach. The method achieved substantially higher accuracies in stage-specific classification (M1: 0.6793, M2: 0.6621, M3: 0.6552, M4: 0.6862) than in bulk analysis (0.5634), corresponding to a relative performance improvement of approximately 16–22% when applied to physiologically homogeneous samples.
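The stage-specific gains quoted above can be recomputed from the reported accuracies. The short sketch below separates absolute gains (accuracy points) from relative gains (percentage of the bulk accuracy); the variable and function names are illustrative assumptions.

```python
bulk_accuracy = 0.5634  # BIWE, all stages combined (reported above)
stage_accuracy = {"M1": 0.6793, "M2": 0.6621, "M3": 0.6552, "M4": 0.6862}


def gains(stage_value, reference):
    """Return (absolute gain, relative gain) of a stage over the reference."""
    absolute = stage_value - reference
    relative = absolute / reference
    return absolute, relative


stage_gains = {
    stage: gains(acc, bulk_accuracy) for stage, acc in stage_accuracy.items()
}

for stage, (absolute, relative) in stage_gains.items():
    print(f"{stage}: +{absolute:.4f} absolute, {100 * relative:.1f}% relative")
```

Keeping the two quantities apart avoids ambiguity, since a gain of about 0.1 accuracy points and a gain of about 20% describe the same result on different scales.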

The average F1 scores provide complementary classification quality assessment beyond overall accuracy metrics. While accuracy and F1 demonstrated consistent trends, the cultivar-stage F1 matrix (Table 6) reveals substantial heterogeneity in classification performance, with several accessions demonstrating persistent discrimination challenges across all maturation stages. At the high-performance end, Maurino (F1 range: 0.803–0.948), Leccino (peak F1: 0.925), and XXXVI (peak F1: 0.938) demonstrate robust spectral signatures with consistent discrimination across multiple maturation stages. Conversely, persistently challenging cultivars such as Carboncella (F1 range: 0.181–0.435), Raccioppella (F1 range: 0.341–0.584), and XVII.87 (F1 range: 0.420–0.747) exhibit fundamental spectral similarity with other accessions that precludes reliable classification regardless of physiological state. These differential challenges indicate that universal classification protocols may inadequately address cultivar-specific spectral characteristics, necessitating targeted analytical approaches that account for both exceptional performers and problematic accessions within comprehensive identification frameworks.

Stages M2 and M3 demonstrate convergent optimal discrimination properties across the cultivar spectrum. Both stages exhibit elevated mean F1 performance (M2: 15 cultivars > 0.7; M3: 12 cultivars > 0.7) and reduced inter-cultivar variance compared to early (M1) and late (M4) maturation phases. This convergence suggests that intermediate ripening stages correspond to maximal biochemical differentiation among cultivars, likely reflecting peak metabolic activity periods when cultivar-specific biosynthetic pathways generate distinct spectral signatures.

3.4. LOO-CV Analysis on Drupes Classification with BIWE Method

LOO-CV analysis performed separately for each ripening stage revealed that M2 and M3 generally provided the highest spectral separability among cultivars (Table 7).

Cultivars such as Leccino, Maurino, and Dolce d’Andria reached peak scores during these intermediate stages, with values exceeding 0.74, indicating a high degree of spectral distinctiveness. This pattern suggests that the biochemical and structural differences between cultivars become most pronounced during the mid-ripening phases, likely reflecting cultivar-specific dynamics in pigment accumulation, water content, and tissue composition. Stage M1, while still informative for some cultivars (e.g., Maurino, Leccino), tended to produce slightly lower discriminatory scores on average, possibly due to greater physiological similarity in the early developmental stage. Conversely, stage M4 showed a decline in discriminatory power across many cultivars, consistent with a convergence of ripening-related traits as fruits reach full maturity.

Notably, several cultivars (i.e., Carboncella, Oblonga, and XXXVI) consistently displayed low discriminatory scores across all stages, highlighting limited spectral contrast with other accessions and potential challenges for classification. These findings confirm that M2 and M3 offer the most favorable windows for hyperspectral discrimination and provide support for the strategic selection of ripening stages, thereby enhancing overall accuracy and cultivar traceability.

To reinforce the stage-related trends observed in the discriminatory score analysis, a supplementary validation procedure was implemented to distinguish true biological signals from potential statistical artifacts, generating a synthetic validation metric that integrates discriminatory stability with biological plausibility. A validation score of 0.725 indicates a high degree of alignment between the observed discriminatory patterns and known physiological ripening dynamics in olives. This score reflects the presence of cultivar-specific biochemical or structural traits that become increasingly distinct during intermediate ripening phases. In parallel, the statistical artifact probability was estimated at 0.175, suggesting that less than 18% of the observed variation could be attributed to random fluctuations or overfitting effects. This low value supports the robustness of the findings and confirms that the identified superiority of M2 and M3 is not a consequence of noise or data imbalance.

4. Discussion

The inherent characteristics of hyperspectral datasets present significant analytical challenges, particularly the presence of spectral noise, redundant information, and substantial inter-band correlations across adjacent wavelengths. The past decade has witnessed remarkable advances in hyperspectral feature selection methodologies for cultivar discrimination, driven by the convergence of agricultural domain knowledge with sophisticated machine learning algorithms [39].

The BIWE methodology developed in this study addresses these fundamental challenges by achieving competitive accuracies with a substantially reduced feature set, utilizing only 25 spectral features compared to the 114 features required by Principal Component Analysis (PCA) and the 131 features employed by Recursive Feature Elimination with Support Vector Machines (RFE-SVM).

Crucially, the integration of scoring criteria based on biological relevance ensured that selected wavelengths corresponded to physiologically meaningful plant processes, rather than statistical artifacts. This biologically informed selection strategy enhanced the interpretability and robustness of the model, aligning with findings by Jacquemoud et al. [61], who highlighted the importance of incorporating prior knowledge of plant biophysics in spectral analysis. Their conclusions demonstrated that distinct canopy biophysical properties dominate reflectance variability in specific spectral domains—for example, chlorophyll content alone accounts for approximately 60% of reflectance variation within the visible range (400–700 nm). Similarly, Kuzudisli and coworkers [62] demonstrated that incorporating domain-specific biological knowledge into the feature selection process yields substantial improvements across multiple performance metrics: enhanced overall accuracy, reduced computational running time, and increased stability of selected feature subsets, thereby establishing the efficacy of knowledge-informed approaches over purely data-driven methodologies.

The comparative results presented in Table 5 further demonstrate the substantial impact of incorporating vegetation indices on performance, with BIWE exhibiting the most significant improvements. The method’s superior responsiveness to vegetation index integration indicates that biologically informed feature selection creates synergistic interactions between spectral features and vegetation indices, whereas the comparative methods (RF, RFE-SVM, and PCA) may exhibit redundancy between these information sources. This finding is consistent with Fei et al. [63], who reported that coupling their feature selection approach with VIs outperformed established methods—including Boruta, FeaLect, and RReliefF—across multiple growth stages, achieving mean R² values ranging from 0.648 to 0.679.

Moving beyond feature selection efficiency, the validation framework employed in this study addresses critical limitations in traditional cross-validation approaches. Traditional k-fold cross-validation presents fundamental inadequacies for cultivar discrimination datasets, where samples from the same genetic background may appear in both training and testing partitions, leading to inflated performance estimates and reduced cross-cultivar generalizability [64]. To address these limitations, this study implemented a Leave-One-Cultivar-Out Cross-Validation model that ensures complete separation between training and testing phases, providing unbiased estimates of performance across genetically diverse backgrounds. The LOO-CV analysis supported the biological validity of the results, yielding a validation score of 0.725 (approximately 73%) with a statistical artifact probability below 18%.
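The separation between training and testing partitions described above can be illustrated with a generic leave-one-group-out splitter. This is a sketch of the general technique only, not the authors' implementation: the function name is hypothetical, and what constitutes a group (cultivar, tree, or sampling batch) is an assumption left to the user.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) once per distinct group label.

    All samples sharing the held-out label go to the test partition, so no
    group ever contributes samples to both partitions of the same split.
    """
    for held_out in sorted(set(groups)):
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical group labels, one per sample.
groups = ["A", "A", "B", "B", "C"]
splits = list(leave_one_group_out(groups))

for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

In contrast to random k-fold partitioning, the number of splits here equals the number of distinct groups, and each split tests on data from a group the model has not seen during training.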

The validation results reveal that ripeness stages M2 and M3 offer the most favorable windows for hyperspectral discrimination, supporting strategic selection of ripening stages to enhance overall accuracy and cultivar traceability.

The superior performance of BIWE methodology in individual maturation stages compared to bulk analysis can be attributed to three interconnected biological and methodological factors: (1) physiological homogeneity within individual stages that reduces intra-stage spectral variability, (2) biochemical specificity of stage-dependent metabolic processes that enhance cultivar discrimination, and (3) enhanced biological wavelength relevance where selected features align with biochemically meaningful processes. This stage-specific advantage is supported by Zahidi et al. [65], who demonstrated that stage-specific hyperspectral feature selection in strawberries and tomatoes, focused on pigment and chlorophyll band extremum positions, delivers markedly improved performance when applied to homogeneous ripening phases. However, as noted by Doktor et al. [66], while machine learning methods can perform adequately across heterogeneous growth stages, stage-specific approaches achieve superior prediction accuracy by reducing within-class variability, though this may come at the cost of reduced model transferability to operational settings where developmental heterogeneity is inherently higher.

The demonstrated advantages of BIWE extend beyond laboratory validation to practical implementation considerations. The method’s ability to maintain competitive overall accuracy with only 25 selected features, coupled with its responsiveness to physiologically relevant indices, is advantageous for practical drupe classification, both for the design of simplified spectrometers and for real-time classification applications.

Future validation across multiple geographic sites would strengthen generalizability, though our cultivar collection already encompasses significant genetic diversity from major olive-producing regions. Approaches that integrate parsimonious feature selection with biological relevance, as demonstrated in this study, offer a promising avenue for promoting the translation of hyperspectral methods into operational agricultural tools. This is particularly crucial given that the inherent “black box” nature of deep neural networks presents significant challenges for agricultural applications, where model interpretability is essential for stakeholder acceptance and regulatory compliance [20]. The BIWE methodology addresses this challenge by providing both high performance and biological interpretability, supporting the broader adoption of precision agriculture technologies in olive production systems.

5. Conclusions

The study developed and validated a Biologically Informed Wavelength Extraction (BIWE) method that integrates multi-scale spectral wavelength detection with biologically informed scoring for olive classification. The proposed approach achieves competitive classification accuracies and offers a new paradigm for parsimonious feature selection in hyperspectral agricultural applications. The integration of biological relevance weighting was fundamental to the methodology’s success, ensuring that selected wavelengths correspond to biologically meaningful processes rather than statistical artifacts.

The methodology’s responsiveness to vegetation indices integration demonstrates the synergistic potential between biologically informed feature selection and established spectral metrics. This integration strategy offers promising avenues for translating hyperspectral methods into practical agricultural tools, addressing the persistent challenge of “black box” model limitations through enhanced transparency and biological interpretability.

Performance evaluation across multiple metrics demonstrates the methodology’s effectiveness in achieving competitive classification accuracies with substantially reduced feature requirements compared to conventional statistical approaches. The biological wavelength selection method exhibited superior technological efficiency while maintaining robust discrimination capabilities, particularly through stage-specific protocols that significantly outperformed bulk analysis. Methodological validation confirmed the biological plausibility of the approach while minimizing statistical artifacts, supporting the reliability of the biologically informed feature selection strategy.

Future research should focus on validating the methodology across diverse environmental conditions and olive-growing regions, as well as on real-world agricultural datasets, which are often characterized by class imbalances. Such validation is needed to establish generalizability and robustness, because performance on minority classes may be overestimated when evaluated exclusively on balanced laboratory datasets.

Practical implementation pathways include integration into drupe sorting systems at olive oil mills, portable devices for harvest timing optimization, and mobile platforms for maturation assessment across diverse production scales. Additionally, integration with emerging technologies such as unmanned aerial systems and real-time processing frameworks could facilitate the transition from laboratory-based proof-of-concept to field-deployable precision agriculture solutions.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs17193277/s1, BIWE_script.r (code); Table S1: Wavelength Subsets Identified by BIWE, Random Forest, RFE-SVM, and PCA Methods; Figure S1: The twenty-nine olive cultivars studied in this work.

Author Contributions

Conceptualization, G.A., E.R. and M.D.; methodology, G.A., E.R. and M.D.; software, G.A. and A.C.; validation, G.A. and A.C.; investigation, G.A., E.R., C.C., B.G. and M.D.; data curation, G.A. and M.D.; writing—original draft preparation, G.A., E.R., B.G., C.C. and M.D.; writing—review and editing, G.A., E.R., B.G., C.C., A.C. and M.D.; funding acquisition, E.R. All authors have read and agreed to the published version of the manuscript.

Funding

European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR)—MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.4—D.D. 1032 17 June 2022, CN00000022) within the Agritech National Research Center.

Data Availability Statement

The datasets presented in this article are not readily available as they are part of an ongoing study. Requests to access the datasets should be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

VIs	Vegetation Indices
VIS	Visible
NIR	Near-Infrared
PCA	Principal Component Analysis
RFE	Recursive Feature Elimination
LOO-CV	Leave-One-Out Cross-Validation
RF	Random Forest
BIWE	Biologically Informed Wavelength Extraction
RFE-SVM	Recursive Feature Elimination with Support Vector Machines
OOB	Out-of-Bag
nm	Nanometer

References

Avola, G.; Matese, A.; Riggi, E. An overview of the special issue on “precision agriculture using hyperspectral images”. Remote Sens. 2023, 15, 1917.
Wang, L.; Liu, D.; Pu, H.; Sun, D.W.; Gao, W.; Xiong, Z. Use of hyperspectral imaging to discriminate the variety and quality of rice. Food Anal. Methods 2015, 8, 515–523.
Lowe, A.; Harrison, N.; French, A.P. Hyperspectral image analysis techniques for the detection and classification of the early onset of plant disease and stress. Plant Methods 2017, 13, 80.
Avola, G.; Di Gennaro, S.F.; Cantini, C.; Riggi, E.; Muratore, F.; Tornambè, C.; Matese, A. Remotely sensed vegetation indices to discriminate field-grown olive cultivars. Remote Sens. 2019, 11, 1242.
Liu, N.; Townsend, P.A.; Naber, M.R.; Bethke, P.C.; Hills, W.B.; Wang, Y. Hyperspectral imagery to monitor crop nutrient status within and across growing seasons. Remote Sens. Environ. 2021, 255, 112303.
Kaushik, M.; Nidamanuri, R.R.; Aparna, B. Hyperspectral discrimination of vegetable crops grown under organic and conventional cultivation practices: A machine learning approach. Sci. Rep. 2025, 15, 7897.
Zuur, A.F.; Ieno, E.N.; Elphick, C.S. A protocol for data exploration to avoid common statistical problems. Methods Ecol. Evol. 2010, 1, 3–14.
Paoletti, M.E.; Haut, J.M.; Plaza, J.; Plaza, A. Deep learning classifiers for hyperspectral imaging: A review. ISPRS J. Photogramm. Remote Sens. 2019, 158, 279–317.
Shuai, L.; Li, Z.; Chen, Z.; Luo, D.; Mu, J. A research review on deep learning combined with hyperspectral imaging in multiscale agricultural sensing. Comput. Electron. Agric. 2024, 217, 108577.
Bruce, L.M.; Koger, C.H.; Li, J. Dimensionality reduction of hyperspectral data using discrete wavelet transform feature extraction. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2331–2338.
Morales, G.; Sheppard, J.W.; Logan, R.D.; Shaw, J.A. Hyperspectral dimensionality reduction based on inter-band redundancy analysis and greedy spectral selection. Remote Sens. 2021, 13, 3649.
Hughes, G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63.
Benediktsson, J.A.; Sveinsson, J.R.; Arnason, K. Classification and feature extraction of AVIRIS data. IEEE Trans. Geosci. Remote Sens. 1995, 33, 1194–1205.
Al-Shalabi, L. New feature selection algorithm based on feature stability and correlation. IEEE Access 2022, 10, 4699–4713.
Bajcsy, P.; Groves, P. Methodology for hyperspectral band selection. Photogramm. Eng. Remote Sens. 2004, 70, 793–802.
Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459.
Hennessy, A.; Clarke, K.; Lewis, M. Hyperspectral Classification of Plants: A Review of Waveband Selection Generalisability. Remote Sens. 2020, 12, 113.
Nwokoma, F.; Foreman, J.; Akujuobi, C.M. Effective data reduction using discriminative feature selection based on principal component analysis. Mach. Learn. Knowl. Extr. 2024, 6, 789–799.
Menze, B.H.; Kelm, B.M.; Masuch, R.; Himmelreich, U.; Bachert, P.; Petrich, W.; Hamprecht, F.A. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinform. 2009, 10, 213.
Spetale, F.E.; Bulacio, P.; Guillaume, S.; Murillo, J.; Tapia, E. A spectral envelope approach towards effective SVM-RFE on infrared data. Pattern Recognit. Lett. 2016, 71, 59–65.
Rajeswari, S.; Suthendran, K. Feature Selection Method based on Fisher’s Exact Test for Agricultural Data. Int. J. Recent Technol. Eng. 2019, 8, 558–564.
Joy, A.A.; Hasan, M.A.M.; Hossain, M.A. A comparison of supervised and unsupervised dimension reduction methods for hyperspectral image classification. In Proceedings of the 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox’s Bazar, Bangladesh, 7–9 February 2019; pp. 1–6.
Burkart, N.; Huber, M.F. A survey on the explainability of supervised machine learning. J. Artif. Intell. Res. 2021, 70, 245–317.
Lipton, Z.C. The Mythos of Model Interpretability. arXiv 2017, arXiv:1606.03490.
James, G.; Witten, D.; Hastie, T.; Tibshirani, R. Linear Regression. In An Introduction to Statistical Learning: With Applications in R; Springer: New York, NY, USA, 2013; pp. 59–126.
König, G.; Günther, E.; von Luxburg, U. Disentangling interactions and dependencies in feature attribution. arXiv 2024, arXiv:2410.23772.
Molnar, C.; König, G.; Bischl, B.; Casalicchio, G. Model-agnostic feature importance and effects with dependent features: A conditional subgroup approach. Data Min. Knowl. Discov. 2024, 38, 2903–2941.
Goldwasser, J.; Hooker, G. Statistical Significance of Feature Importance Rankings. In Proceedings of the 41st Conference on Uncertainty in Artificial Intelligence, Barcelona, Spain, 25–28 July 2024.
Salimi, A.; Ziaii, M.; Amiri, A.; Zadeh, M.H.; Karimpouli, S.; Moradkhani, M. Using a Feature Subset Selection method and Support Vector Machine to address curse of dimensionality and redundancy in Hyperion hyperspectral data classification. Egypt. J. Remote Sens. Space Sci. 2018, 21, 27–36.
Rasti, B.; Scheunders, P.; Ghamisi, P.; Licciardi, G.; Chanussot, J. Noise reduction in hyperspectral imagery: Overview and application. Remote Sens. 2018, 10, 482.
Molnar, C.; König, G.; Herbinger, J.; Freiesleben, T.; Dandl, S.; Scholbeck, C.A.; Casalicchio, G.; Grosse-Wentrup, M.; Bischl, B. General pitfalls of model-agnostic interpretation methods for machine learning models. In International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers; Springer International Publishing: Cham, Switzerland, 2020; pp. 39–68.
Dashti, A.; Müller-Maatsch, J.; Roetgerink, E.; Wijtten, M.; Weesepoel, Y.; Parastar, H.; Yazdanpanah, H. Comparison of a portable Vis-NIR hyperspectral imaging and a snapscan SWIR hyperspectral imaging for evaluation of meat authenticity. Food Chem. 2023, 18, 100667.
Curran, P.J. Remote sensing of foliar chemistry. Remote Sens. Environ. 1989, 30, 271–278.
Tallada, J.G.; Bato, P.M.; Shrestha, B.P.; Kobayashi, T.; Nagata, M. Quality evaluation of plant products. In Hyperspectral Imaging Technology in Food and Agriculture, 1st ed.; Park, B., Lu, R., Eds.; Springer: New York, NY, USA, 2015; pp. 240–241.
Wei, X.; Liu, F.; Qiu, Z.; Shao, Y.; He, Y. Ripeness classification of astringent persimmon using hyperspectral imaging technique. Food Bioprocess Technol. 2014, 7, 1371–1380.
Zhang, C.; Guo, C.; Liu, F.; Kong, W.; He, Y.; Lou, B. Hyperspectral imaging analysis for ripeness evaluation of strawberry with support vector machine. J. Food Eng. 2016, 179, 11–18.
Fatchurrahman, D.; Nosrati, M.; Amodio, M.L.; Chaudhry, M.M.A.; de Chiara, M.L.V.; Mastrandrea, L.; Colelli, G. Comparison Performance of Visible-NIR and Near-Infrared Hyperspectral Imaging for Prediction of Nutritional Quality of Goji Berry (Lycium barbarum L.). Foods 2021, 10, 1676.
Feilhauer, H.; Asner, G.P.; Martin, R.E. Multi-method ensemble selection of spectral bands related to leaf biochemistry. Remote Sens. Environ. 2015, 164, 57–65.
Gomes, L.; Nobre, T.; Sousa, A.; Rei, F.; Guiomar, N. Hyperspectral reflectance as a basis to discriminate olive varieties—A tool for sustainable crop management. Sustainability 2020, 12, 3059.
Bendini, A.; Cerretani, L.; Di Virgilio, F.; Belloni, P.; Bonoli-Carbognin, M.; Lercker, G. Preliminary evaluation of the application of the FTIR spectroscopy to control the geographic origin and quality of virgin olive oils. J. Food Qual. 2007, 30, 424–437.
Gouvinhas, I.; De Almeida, J.M.; Carvalho, T.; Machado, N.; Barros, A.I. Discrimination and characterisation of extra virgin olive oils from three cultivars in different maturation stages using Fourier transform infrared spectroscopy in tandem with chemometrics. Food Chem. 2015, 174, 226–232. [Google Scholar] [CrossRef] [PubMed]
Gyftokostas, N.; Nanou, E.; Stefas, D.; Kokkinos, V.; Bouras, C.; Couris, S. Classification of Greek Olive Oils from Different Regions by Machine Learning-Aided Laser-Induced Breakdown Spectroscopy and Absorption Spectroscopy. Molecules 2021, 26, 1241. [Google Scholar] [CrossRef]
Kruzlicova, D.; Mocak, J.; Katsoyannos, E.; Lankmayr, E. Classification and characterization of olive oils by UV-vis absorption spectrometry and sensorial analysis. J. Food Nutr. Res. 2008, 47, 181–188. [Google Scholar]
Alamprese, C.; Grassi, S.; Tugnolo, A.; Casiraghi, E. Prediction of olive ripening degree combining image analysis and FT-NIR spectroscopy for virgin olive oil optimisation. Food Control 2021, 123, 107755. [Google Scholar] [CrossRef]
Zhang, C.X.; Zhang, J.S. Out-of-bag estimation of the optimal hyperparameter in SubBag ensemble method. Commun. Stat. Simul. Comput. 2010, 39, 1877–1892. [Google Scholar] [CrossRef]
Samb, M.L.; Camara, F.; Ndiaye, S.; Slimani, Y.; Esseghir, M.A. A novel RFE-SVM-based feature selection approach for classification. Int. J. Adv. Sci. Technol. 2012, 43, 27–36. [Google Scholar]
Song, S.; Gong, W.; Zhu, B.; Huang, X. Wavelength selection and spectral discrimination for paddy rice, with laboratory measurements of hyperspectral leaf reflectance. ISPRS J. Photogramm. Remote Sens. 2011, 66, 672–682. [Google Scholar] [CrossRef]
Merton, R. Monitoring community hysteresis using spectral shift analysis and the red-edge vegetation stress index. In Proceedings of the Seventh Annual JPL Airborne Earth Science Workshop, Pasadena, CA, USA, 12–16 January 1998; pp. 12–16. [Google Scholar]
Gamon, J.A.; Field, C.B.; Goulden, M.L.; Griffin, K.L.; Hartley, A.E.; Joel, G.; Penuelas, J.; Valentini, R. Relationships between NDVI, canopy structure, and photosynthesis in 3 Californian vegetation types. Ecol. Appl. 1995, 5, 28–41. [Google Scholar] [CrossRef]
Gitelson, A.A.; Merzlyak, M.N. Remote estimation of chlorophyll content in higher plant leaves. Int. J. Remote Sens. 1997, 18, 2691–2697. [Google Scholar] [CrossRef]
Lichtenthaler, H.K.; Lang, M.; Sowinska, M.; Heisel, F.; Miehe, J.A. Detection of vegetation stress via a new high resolution fluorescence imaging system. J. Plant Physiol. 1996, 148, 599–612. [Google Scholar] [CrossRef]
Peñuelas, J.; Baret, F.; Filella, I. Semi-imperical indices to assess carotenoids/chlorophyll, a ratio from leaf spectral reflectance. Photosynthetica 1995, 31, 221–230. [Google Scholar]
Peñuelas, J.; Gamon, J.A.; Fredeen, A.L.; Merino, J.; Field, C.B. Reflectance indexes associated with physiological-changes in nitrogen-limited and water-limited sunflower leaves. Remote Sens. Environ. 1994, 48, 135–146. [Google Scholar] [CrossRef]
Zhou, J.J.; Zhang, Y.H.; Han, Z.M.; Liu, X.Y.; Jian, Y.F.; Hu, C.G.; Dian, Y.Y. Evaluating the Performance of Hyperspectral Leaf Reflectance to Detect Water Stress and Estimation of Photosynthetic Capacities. Remote Sens. 2021, 13, 2160. [Google Scholar] [CrossRef]
Peñuelas, J.; Filella, I. Visible and near-infrared reflectance techniques for diagnosing plant physiological status. Trends Plant Sci. 1998, 3, 151–156. [Google Scholar] [CrossRef]
Chen, J.M. Evaluation of vegetation indices and a modified simple ratio for boreal applications. Can. J. Remote Sens. 1996, 22, 229–242. [Google Scholar] [CrossRef]
Peñuelas, J.; Pinol, J.; Ogaya, R.; Filella, I. Estimation of plant water concentration by the reflectance water index WI (r900/r970). Int. J. Remote Sens. 1997, 18, 2869–2875. [Google Scholar] [CrossRef]
Datt, B. Visible/near infrared reflectance and chlorophyll content in Eucalyptus leaves. Int. J. Remote Sens. 1999, 20, 2741–2759. [Google Scholar] [CrossRef]
Sims, D.A.; Gamon, J.A. Relationships between leaf pigment content and spectral reflectance across a wide range of species, leaf structures and developmental stages. Remote Sens. Environ. 2002, 81, 337–354. [Google Scholar] [CrossRef]
Thomas, J.; Gausman, H. Leaf reflectance vs. Leaf chlorophyll and carotenoid concentrations for eight crops. Agron. J. 1977, 69, 799–802. [Google Scholar] [CrossRef]
Jacquemoud, S.; Verhoef, W.; Baret, F.; Bacour, C.; Zarco-Tejada, P.J.; Asner, G.P.; François, C.; Ustin, S.L. PROSPECT+ SAIL models: A review of use for vegetation characterization. Remote Sens. Environ. 2009, 113, S56–S66. [Google Scholar] [CrossRef]
Kuzudisli, C.; Bakir-Gungor, B.; Bulut, N.; Qaqish, B.; Yousef, M. Review of feature selection approaches based on grouping of features. PeerJ 2023, 11, e15666. [Google Scholar] [CrossRef]
Fei, S.; Li, L.; Han, Z.; Chen, Z.; Xiao, Y. Combining novel feature selection strategy and hyperspectral vegetation indices to predict crop yield. Plant Methods 2022, 18, 119. [Google Scholar] [CrossRef] [PubMed]
Roberts, D.R.; Bahn, V.; Ciuti, S.; Boyce, M.S.; Elith, J.; Guillera-Arroita, G.; Hauenstein, S.; Lahoz-Monfort, J.J.; Schröder, B.; Thuiller, W.; et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 2017, 40, 913–929. [Google Scholar] [CrossRef]
Zahidi, U.A.; Łukasik, K.; Cielniak, G. Dual-band feature selection for maturity classification of specialty crops by hyperspectral imaging. arXiv 2024, arXiv:2405.09955. [Google Scholar] [CrossRef]
Doktor, D.; Lausch, A.; Spengler, D.; Thurner, M. Extraction of Plant Physiological Status from Hyperspectral Signatures Using Machine Learning Methods. Remote Sens. 2014, 6, 12247–12274. [Google Scholar] [CrossRef]

Figure 1. BIWE algorithm workflow.

Figure 2. Reflectance spectra of cv. “Dolce d’Andria” by ripening stage: (a) raw reflectance; (b) first-order derivative preprocessing; (c) second-order derivative preprocessing.
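
Derivative spectra such as those in panels (b) and (c) can be approximated with finite differences. The sketch below is only an illustration under stated assumptions: it assumes uniformly sampled wavelengths, the function name `central_derivative`, the 5 nm step, and the reflectance values are hypothetical, and the authors' actual preprocessing (which may include smoothing) is not reproduced here.

```python
def central_derivative(values, step):
    """Approximate the derivative on a uniform grid.

    Central differences are used for interior points and one-sided
    differences at the two ends, so the output has the input's length.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((values[hi] - values[lo]) / ((hi - lo) * step))
    return deriv


# Hypothetical reflectance values sampled every 5 nm (illustrative only).
step_nm = 5.0
reflectance = [0.10, 0.12, 0.16, 0.22, 0.30, 0.40]

d1 = central_derivative(reflectance, step_nm)  # first-order derivative
d2 = central_derivative(d1, step_nm)           # second-order derivative
```

Applying the same function twice gives a second-order derivative; on noisy spectra a smoothing step before differentiation is usually advisable, since differencing amplifies noise.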

Table 1. Cultivars, origin/diffusion, prevalent destination use.

No.	Cultivar	Origin/Diffusion	Use
1	Bella di Spagna	Apulia (Italy)	Fresh consumption
2	Bianchera	Friuli-Venezia G. (Italy)	Oil
3	Carboncella	Abruzzo (Italy)	Oil/Fresh consumption
4	Coratina	Apulia (Italy)	Oil
5	Dolce d’Andria	Umbria (Italy)	Oil/Fresh consumption
6	Farga	Valencia (Spain)	Oil
7	Frantoio	Tuscany (Italy)	Oil
8	II82	Umbria (Italy)	Oil
9	Intosso	Abruzzo (Italy)	Oil/Fresh consumption
10	Leccino	Tuscany (Italy)	Oil
11	Leccio del Corno	Tuscany (Italy)	Oil
12	Marzio	Tuscany (Italy)	Oil
13	Maurino	Tuscany (Italy)	Oil
14	Morchiaio	Tuscany (Italy)	Oil
15	Niedda	Sardinia (Italy)	Oil/Fresh consumption
16	Oblica	Croatia	Oil
17	Oblonga	USA	Oil
18	Oliva Rossa	Apulia (Italy)	Oil
19	Piangente	Tuscany (Italy)	Oil
20	Picholine	France	Oil/Fresh consumption
21	Raccioppella	Campania (Italy)	Oil/Fresh consumption
22	Razza	Lombardy (Italy)	Oil
23	Roggianella	Sardinia (Italy)	Oil/Fresh consumption
24	Rossellino	Tuscany (Italy)	Oil
25	Salegna	Molise (Italy)	Oil
26	Sargano di Fermo	Abruzzo (Italy)	Oil/Fresh consumption
27	Sari Hasebi	Türkiye	Oil
28	XVII87	Tuscany (Italy)	Oil
29	XXXVI	Tuscany (Italy)	Oil

Table 2. Olive maturity classes considered in the Surface Colorimetric Index (SCI).

Ripening Stage	Olive Skin Color
M1	100% green
M2	small reddish spots (<50% turning red, purple, or black)
M3	turning color (>50% turning red, purple, or black)
M4	100% purple or black

Table 3. Summary of the spectral indices used in this study.

Name	Simple Indices	Normalized Difference Indices	Others	Comments/Application	Reference
Normalized Difference Vegetation Index (NDVI)		$\frac{(R_{830} - R_{667})}{(R_{830} + R_{667})}$		chlorophyll content	[48]
Photochemical Reflectance Index (PRI)		$\frac{(R_{531} - R_{570})}{(R_{531} + R_{570})}$		efficiency of radiation and photosynthetic capacity	[49]
Gitelson and Merzlyak chlorophyll 1 (GM1)	$\frac{R_{750}}{R_{550}}$			chlorophyll content	[50]
Gitelson and Merzlyak chlorophyll 2 (GM2)	$\frac{R_{750}}{R_{700}}$			chlorophyll content	[50]
Lichtenthaler Indices (LC1, LC2, LC3)	$\frac{R_{440}}{R_{690}}$; $\frac{R_{440}}{R_{740}}$	$\frac{(R_{800} - R_{680})}{(R_{800} + R_{680})}$		chlorophyll content	[51]
Simple Ratio Pigment Index (SRPI)	$\frac{R_{430}}{R_{680}}$			fruit senescence; carotenoids and chlorophyll content	[52]
Normalized Pigment Chlorophyll Ratio Index (NPCI)		$\frac{(R_{680} - R_{430})}{(R_{680} + R_{430})}$		fruit senescence; pigments and chlorophyll content	[53]
Greenness Index (GI)	$\frac{R_{554}}{R_{677}}$			chlorophyll content	[54]
Structure Insensitive Pigment Index (SIPI)			$\frac{(R_{445} - R_{800})}{(R_{680} + R_{800})}$	fruit senescence; carotenoids and chlorophyll content	[55]
Simple Ratio (SR)	$\frac{R_{774}}{R_{680}}$			chlorophyll content	[56]
Water Index (WI)	$\frac{R_{970}}{R_{900}}$			water status	[57]
Leaf Chlorophyll Index (LCI)			$\frac{(R_{850} - R_{710})}{(R_{850} + R_{680})}$	chlorophyll content	[58]
Chlorophyll Index_2 (SGB2)			$\frac{(R_{750} - R_{705})}{(R_{750} + R_{705} - 2 \times R_{445})}$	chlorophyll content	[59]
Chlorophyll Index_3 (SGB3)			$\frac{(R_{750} - R_{445})}{(R_{705} - R_{445})}$	chlorophyll content	[60]
R550			% reflectance at 550 nm	chlorophyll content	[60]
R650			% reflectance at 650 nm	chlorophyll content	[60]
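
As an illustration of how the simple-ratio and normalized-difference indices in Table 3 are evaluated, the following sketch computes a few of them from a single spectrum. It is a minimal example, not the authors' implementation: the helper names (`band`, `simple_ratio`, `normalized_difference`, `vegetation_indices`), the nearest-band lookup, and the reflectance values are all assumptions made for illustration.

```python
def band(spectrum, wavelength_nm):
    """Return the reflectance at the sampled band nearest to wavelength_nm.

    `spectrum` maps wavelength (nm) to reflectance.
    """
    nearest = min(spectrum, key=lambda w: abs(w - wavelength_nm))
    return spectrum[nearest]


def simple_ratio(spectrum, num_nm, den_nm):
    """Simple index: R_num / R_den."""
    return band(spectrum, num_nm) / band(spectrum, den_nm)


def normalized_difference(spectrum, a_nm, b_nm):
    """Normalized difference index: (R_a - R_b) / (R_a + R_b)."""
    a, b = band(spectrum, a_nm), band(spectrum, b_nm)
    return (a - b) / (a + b)


def vegetation_indices(spectrum):
    """Evaluate a subset of the Table 3 indices with the listed band centers."""
    return {
        "NDVI": normalized_difference(spectrum, 830, 667),
        "PRI": normalized_difference(spectrum, 531, 570),
        "NPCI": normalized_difference(spectrum, 680, 430),
        "SRPI": simple_ratio(spectrum, 430, 680),
        "GI": simple_ratio(spectrum, 554, 677),
        "WI": simple_ratio(spectrum, 970, 900),
    }


# Made-up reflectance values, for illustration only.
example = {
    430: 0.05, 531: 0.09, 554: 0.12, 570: 0.10, 667: 0.06,
    677: 0.05, 680: 0.05, 830: 0.45, 900: 0.46, 970: 0.44,
}
indices = vegetation_indices(example)
```

The nearest-band lookup is one possible way to handle sensors whose band centers do not coincide exactly with the nominal wavelengths; interpolation would be an alternative.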

Table 4. Wavelengths selected by the Discriminatory Index, with their performance metrics.

Wavelength (nm)	Derivative Type	Discriminatory Score	Quality Score	Weighted Detection Rate
680	D₂—peak	3.02	4.78	12.70
695	D₁—peak	2.62	4.42	10.60
705	D₂—valley	2.52	4.28	10.10
950	D₁—valley	2.40	2.35	9.80
655	D₁—valley	2.36	3.54	9.34
515	D₁—peak	2.36	3.89	9.33
970	D₂—peak	2.16	2.08	8.59
710	D₂—valley	2.02	3.57	7.73
550	Raw peak	1.99	3.11	7.60
650	D₁—valley	1.80	2.85	6.55
700	D₁—peak	1.78	2.72	6.62
690	D₁—peak	1.69	2.26	6.11
495	D₂—peak	1.69	2.71	5.95
810	Raw peak	1.66	2.20	5.84
500	D₂—peak	1.64	2.67	5.77
675	D₂—peak	1.62	1.84	5.99
510	D₁—peak	1.61	1.81	5.74
530	D₂—valley	1.53	2.41	5.21
555	Raw peak	1.48	1.14	5.03
650	D₂—valley	1.42	2.06	4.71
660	D₁—valley	1.41	1.45	4.82
575	D₁—valley	1.41	1.25	5.01
685	D₂—peak	1.37	1.98	4.77
815	Raw peak	1.28	1.49	3.98
720	D₂—valley	1.23	2.03	3.12

Each row reports a specific wavelength and its associated derivative type (D₁ = first derivative, D₂ = second derivative, Raw = original spectrum). The “Discriminatory Score” is the primary ranking metric, with higher values indicating greater utility in distinguishing between sample categories. The “Quality Score” assesses the feature’s reliability and consistency; wavelengths with higher Quality Scores are generally more robust and less prone to noise or variability. The “Weighted Detection Rate” quantifies how frequently the feature is identified across the analyzed sample set; a higher rate implies that the feature is consistently present and detectable, making it a reliable marker.

Table 5. Comparative Performance (Overall Accuracy, Average F1 scores and Technological Efficiency) of Feature Selection methodologies for Olive Classification Across Ripening Stages.

	Without Vegetation Indices						With Vegetation Indices
Models	Features	Bulk M1–M4	M1	M2	M3	M4	Features	Bulk M1–M4	M1	M2	M3	M4
Overall Accuracy
BIWE	25	0.563	0.679	0.662	0.655	0.686	43	0.671	0.691	0.702	0.716	0.705
RF	50	0.544	0.616	0.543	0.572	0.584	68	0.647	0.667	0.674	0.671	0.683
RFE	131	0.664	0.659	0.628	0.645	0.641	149	0.702	0.688	0.667	0.709	0.683
PCA	114	0.641	0.650	0.633	0.622	0.638	132	0.692	0.686	0.664	0.705	0.678
Average F1 scores
BIWE	25	0.558	0.676	0.659	0.656	0.686	43	0.669	0.688	0.699	0.714	0.702
RF	50	0.534	0.605	0.540	0.573	0.584	68	0.644	0.668	0.675	0.676	0.681
RFE	131	0.660	0.656	0.626	0.649	0.642	149	0.701	0.690	0.663	0.713	0.685
PCA	114	0.640	0.647	0.630	0.626	0.639	132	0.691	0.687	0.659	0.711	0.678
Technological Efficiency
BIWE	25	0.023	0.027	0.026	0.026	0.027	43	0.016	0.016	0.016	0.016	0.016
RF	50	0.011	0.012	0.011	0.011	0.012	68	0.010	0.010	0.010	0.010	0.010
RFE	131	0.005	0.005	0.005	0.005	0.005	149	0.005	0.005	0.005	0.005	0.005
PCA	114	0.006	0.006	0.006	0.005	0.006	132	0.005	0.005	0.005	0.005	0.005
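
The Technological Efficiency values in Table 5 appear consistent with overall accuracy divided by the number of features (for example, 0.563 / 25 ≈ 0.023 for BIWE). This relation is inferred here from the tabulated numbers and should be read as an assumption; the function name `technological_efficiency` is hypothetical. A short sketch under that assumption, using the Bulk M1–M4 column without vegetation indices:

```python
def technological_efficiency(accuracy, n_features):
    """Accuracy obtained per selected feature (assumed definition)."""
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    return accuracy / n_features


# (overall accuracy, number of features) taken from Table 5,
# Bulk M1–M4 column, without vegetation indices.
bulk_without_vi = {
    "BIWE": (0.563, 25),
    "RF": (0.544, 50),
    "RFE": (0.664, 131),
    "PCA": (0.641, 114),
}

efficiency = {
    model: round(technological_efficiency(acc, n), 3)
    for model, (acc, n) in bulk_without_vi.items()
}
```

Under this reading, the metric rewards methods that reach a given accuracy with fewer wavelengths, which is why BIWE ranks highest in this block of the table despite not having the highest accuracy.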

Table 6. F1 performance matrix for cultivar discrimination across maturation stages M1–M4.

Cultivar	F1 Score
Cultivar	M1	M2	M3	M4
Bella di Spagna	0.940	0.919	0.964	0.788
Bianchera	0.621	0.615	0.658	0.602
Carboncella	0.181	0.216	0.382	0.435
Coratina	0.644	0.658	0.705	0.463
Dolce d’Andria	0.598	0.558	0.718	0.833
Farga	0.778	0.561	0.669	0.737
Frantoio	0.414	0.644	0.578	0.887
II-82	0.693	0.691	0.735	0.767
Intosso	0.772	0.517	0.499	0.515
Leccino	0.811	0.901	0.925	0.821
Leccio del Corno	0.688	0.671	0.732	0.616
Marzio	0.397	0.500	0.451	0.409
Maurino	0.925	0.948	0.933	0.803
Morchiaio	0.917	0.929	0.636	0.746
Niedda	0.828	0.862	0.762	0.776
Oblica	0.485	0.476	0.339	0.406
Oblonga	0.714	0.827	0.783	0.614
Oliva Rossa	0.704	0.616	0.695	0.613
Piangente	0.562	0.516	0.624	0.607
Picholine	0.504	0.558	0.830	0.624
Raccioppella	0.584	0.463	0.341	0.517
Raza	0.718	0.739	0.747	0.758
Roggianella	0.570	0.625	0.409	0.699
Rossellino	0.652	0.517	0.796	0.808
Salegna	0.589	0.594	0.524	0.718
Sargano di Fermo	0.643	0.612	0.608	0.494
Sari Hasebi	0.838	0.840	0.669	0.722
XVII-87	0.617	0.420	0.747	0.565
XXXVI	0.938	0.686	0.912	0.861
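
For reference, a per-class F1 score of the kind reported in Table 6 is conventionally the harmonic mean of precision and recall. The sketch below uses that standard definition; the function names and the confusion counts are illustrative assumptions and are not taken from the study.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1 scores.

    `per_class_counts` maps a class label to a (tp, fp, fn) tuple.
    """
    scores = [f1_score(*counts) for counts in per_class_counts.values()]
    return sum(scores) / len(scores)


# Hypothetical counts for two classes (illustrative only).
counts = {"cultivar_A": (8, 2, 2), "cultivar_B": (5, 5, 0)}
per_class = {label: f1_score(*c) for label, c in counts.items()}
average = macro_f1(counts)
```

Whether the averages in Table 5 are macro-averaged or weighted by class size is not stated in this section; the macro average shown here is one common choice.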

Table 7. Mean discriminatory score consistency from leave-one-cultivar-out cross-validation (LOO-CV) for the 29 olive cultivars and 4 ripening stages analyzed.

Cultivar	Discriminatory Score
Cultivar	M1	M2	M3	M4
Bella di Spagna	0.718	0.583	0.664	0.558
Bianchera	0.587	0.598	0.608	0.513
Carboncella	0.543	0.566	0.617	0.485
Coratina	0.610	0.645	0.715	0.562
Dolce d’Andria	0.628	0.697	0.628	0.602
Farga	0.566	0.661	0.555	0.555
Frantoio	0.642	0.629	0.637	0.673
II-82	0.623	0.709	0.554	0.616
Intosso	0.577	0.584	0.620	0.540
Leccino	0.725	0.810	0.749	0.792
Leccio del Corno	0.622	0.674	0.634	0.631
Marzio	0.597	0.628	0.573	0.565
Maurino	0.774	1.049	0.930	0.629
Morchiaio	0.653	0.670	0.619	0.567
Niedda	0.577	0.626	0.622	0.595
Oblica	0.559	0.602	0.675	0.542
Oblonga	0.531	0.614	0.534	0.579
Oliva Rossa	0.579	0.584	0.544	0.586
Piangente	0.600	0.534	0.623	0.654
Picholine	0.657	0.585	0.654	0.530
Raccioppella	0.637	0.629	0.684	0.471
Raza	0.677	0.634	0.738	0.683
Roggianella	0.574	0.617	0.576	0.571
Rossellino	0.590	0.645	0.673	0.545
Salegna	0.660	0.686	0.623	0.663
Sargano di Fermo	0.570	0.553	0.630	0.483
Sari Hasebi	0.563	0.624	0.527	0.478
XVII-87	0.551	0.557	0.582	0.574
XXXVI	0.529	0.570	0.533	0.494
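
A leave-one-cultivar-out scheme holds out all samples of one cultivar per fold and uses the remaining cultivars for fitting. The following sketch shows only the splitting logic; the function name `leave_one_group_out` and the sample labels are hypothetical, and the way the study derives the discriminatory score within each fold is not reproduced.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_indices, test_indices) for each distinct group.

    `groups` is a sequence giving the group label (here, cultivar) of
    each sample; folds are produced in sorted label order.
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical per-sample cultivar labels (illustrative only).
cultivars = ["Leccino", "Leccino", "Frantoio", "Coratina", "Frantoio"]
folds = list(leave_one_group_out(cultivars))
```

Splitting by cultivar rather than by individual sample keeps spectra of the held-out cultivar entirely out of the fitting set, which avoids optimistic estimates when samples from the same cultivar are strongly correlated.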

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

