Cross-Learner Spectral Subset Optimisation: PLS–Ensemble Feature Selection with Weighted Borda Count for Grapevine Cultivar Discrimination

Loggenberg, Kyle; Strever, Albert; Münch, Zahn

doi:10.3390/geomatics6010012



by Kyle Loggenberg ¹,*, Albert Strever ² and Zahn Münch ¹

¹ Department of Geography and Environmental Studies, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa

² South African Grape and Wine Research Institute, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa

* Author to whom correspondence should be addressed.

Geomatics 2026, 6(1), 12; https://doi.org/10.3390/geomatics6010012

Submission received: 8 December 2025 / Revised: 18 January 2026 / Accepted: 23 January 2026 / Published: 28 January 2026


Abstract

The mapping of vineyard cultivars presents a substantial challenge in digital agriculture due to the crop’s high intra-class heterogeneity and low inter-class variability. High-dimensional spectral datasets, such as hyperspectral or spectrometry data, can help overcome these difficulties. However, research has yet to fully address the need for optimal spectral feature subsets tailored for grapevine cultivar discrimination, and few studies have systematically examined waveband subsets that transfer effectively across different learning algorithms. This study addresses these gaps by introducing a Partial Least Squares (PLS)-based ensemble feature selection framework with Weighted Borda Count aggregation for cultivar discrimination. Using in-field spectrometry data collected for six cultivars and 18 PLS-based feature selection methods spanning filter, wrapper, and hybrid approaches, the PLS–ensemble identified the 100 wavebands most relevant for cultivar discrimination, reducing dimensionality by ~95%. The efficacy and transferability of this subset were evaluated using five classification algorithms: Oblique Random Forest (oRF), Multinomial Logistic Regression (Multinom), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and a one-dimensional Convolutional Neural Network (1D CNN). For oRF, Multinom, SVM, and MLP, the PLS–ensemble subset improved accuracy by 0.3–12% compared with using all wavebands. The subset was not optimal for the 1D CNN, whose accuracy decreased by up to 5.7%. Additionally, this study investigated waveband binning to transform narrow hyperspectral bands into broadband spectral features. Using feature multicollinearity and wavelength position, the 100 selected wavebands were condensed into 10 broadband features, which improved accuracy over both the full dataset and the original subset, delivering gains of 4.5–19.1%. The SVM model with this 10-feature subset outperformed all other models (F1: 1.00; BACC: 0.98; MCC: 0.78; AUC: 0.95).

Keywords:

ensemble feature selection; machine learning; deep learning; hyperspectral data; cultivar discrimination; Weighted Borda Count; precision viticulture

1. Introduction

Accurate mapping of grapevine cultivars is critical for yield estimation, sustainable farming, and crop inventory management [1,2,3,4]. This task is challenging due to high intra-class spectral variability, low inter-class differences, and additional spatial and temporal variability arising from soil conditions, microclimate, management practices, and phenological stage [1,2,5,6]. These challenges are especially pronounced when using traditional broadband multispectral data.

Hyperspectral and spectrometry datasets provide numerous narrow, contiguous wavebands that capture subtle biochemical and structural differences between cultivars, improving discrimination potential [7,8]. However, these high-dimensional datasets often contain many redundant or weakly informative features, which, combined with limited training samples, can lead to the “curse of dimensionality” or Hughes phenomenon [9,10]. This reduces classifier generalisability and increases computational and interpretational complexity, highlighting the importance of identifying the most informative spectral features to enhance model performance [11,12]. Yet, despite the widespread use of hyperspectral data in precision viticulture [4,13,14,15], a notable research gap in the literature remains, namely, a lack of focused investigation into the identification and evaluation of optimal spectral features for grapevine cultivar discrimination.

Feature selection (FS) methods are commonly applied to reduce dimensionality by identifying optimal subsets of features that maximise relevance while minimising redundancy [4,9]. The body of research on FS methods points towards three commonalities. Firstly, wrapper methods (i.e., FS approaches that depend on feedback from a given learner) have been found to outperform filter methods, which function independently of the learner tasked with classification [16,17,18,19]. Secondly, the majority of the literature within the broader domain of digital agriculture has focused on three key areas: (1) the application of a single FS algorithm [20,21,22,23], (2) comparisons between FS approaches [1,4,11,12,16,18,19], or (3) the development of novel FS techniques [17,24,25,26]. Lastly, FS methods are rarely optimal across different learners. This assertion is corroborated by He et al. [19], who examined how the combination of FS method and classifier affects crop classification results. Their study revealed that the effectiveness of a given FS subset for crop mapping is strongly dependent on the classification algorithm employed. Similar observations were made by Imran et al. [12] and Raja et al. [11].

Numerous FS techniques have been reported in the literature. These include ranking-based approaches, such as Gini impurity and permutation importance [1,19]; tree-based methods, such as Boruta [11,20]; stepwise elimination techniques, including Recursive Feature Elimination (RFE) [11,19,23] and Sequential Forward Selection (SFS) [16,17]; univariate approaches, such as ANOVA [4]; and projection-based methods, like Principal Component Analysis (PCA) and Partial Least Squares (PLS) [1,4,18,22]. In addition, hybrid methods [17,27] and ensemble approaches [12,24] have been proposed to enhance FS robustness by combining multiple strategies or criteria for subset selection. Ensemble methods are widely regarded as a benchmark for classification tasks in precision viticulture [28,29], and similar principles have been adopted for FS research [12,30]. Ensemble FS approaches are constructed from either identical algorithms applied in different ways (homogeneous) or entirely different algorithms (heterogeneous), with the outputs of these algorithms aggregated through voting techniques, such as relative majority or Borda Count—a method originating from social choice theory, not commonly applied in crop mapping—to determine the optimal feature subset [12,30]. Ensemble FS is increasingly recognised for improving stability, reducing biases, and mitigating spurious correlations often associated with single-algorithm FS approaches [30,31]. These methods are particularly valuable in cultivar discrimination, where high spectral similarity amplifies instability and bias in single-method FS approaches. Against this background, the present study set out to systematically identify the most relevant wavebands for grapevine cultivar classification and to evaluate their robustness across various classification algorithms. 
Partial Least Squares (PLS) was adopted as the primary analytical framework for feature selection due to its proven effectiveness with high-dimensional spectral data, as evidenced by its successful application in prior studies [4,32,33]. Mirzaei et al. [4] presented one of the few studies specifically dedicated to identifying optimal wavebands for cultivar discrimination. Utilising in-field spectrometry to classify five cultivars, their study found that wavebands selected by PLS in the visible (438–466 nm, 527–573 nm, and 621–636 nm) and shortwave infrared (1379 nm and 2292 nm) regions of the spectrum were most informative, achieving overall accuracies between 89.9% and 100%. Building on this foundation laid by Mirzaei et al. [4], the current study presents a first attempt at utilising a PLS-based ensemble FS framework for cultivar discrimination. While various univariate and multivariate PLS approaches are applied, the focus here is not on algorithmic innovation of these methods themselves or comparative evaluation of their performance. Rather, PLS serves as a consistent foundational framework for an ensemble architecture through which multiple feature selection approaches are applied to identify the optimal spectral wavebands for cultivar classification. This study’s contribution is, therefore, problem-driven and lies in the systematic methodological integration of these established techniques to tackle the previously underexplored question of cross-learner spectral subset robustness.

Accordingly, this research is structured around the following objectives: (i) identify a subset of the most relevant wavebands for cultivar discrimination using in-field spectrometry; (ii) evaluate the efficacy of PLS-based ensemble feature selection with Weighted Borda Count aggregation; and (iii) assess the transferability of PLS–ensemble-selected wavebands across classification algorithms. Finally, this study examines the use of waveband binning to transform narrow hyperspectral bands into broadband spectral features. This step holds practical importance for large-scale cultivar mapping, as broadband sensors are generally more cost-effective and operationally feasible than hyperspectral systems. To our knowledge, this represents the first integration of a PLS-based ensemble FS framework with Weighted Borda Count aggregation for grapevine cultivar discrimination.

2. Materials and Methods

This section presents the experimental design, including the study area and in-field spectrometry data collection, and describes the construction of the PLS–ensemble feature selection framework and the aggregation of spectral wavebands. Finally, the procedures for assessing the performance and transferability of the selected waveband subsets across multiple classification algorithms are outlined. Performance evaluations compared both the PLS–ensemble and aggregated spectral feature subsets with the full dataset using five classifiers: Oblique Random Forest (oRF), Multinomial Logistic Regression (Multinom), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and a one-dimensional (1D) Convolutional Neural Network (CNN), with performance measured across multiple accuracy metrics (see Section 2.4).

2.1. Experimental Design

This study utilised high-dimensional spectral data in the form of in-field spectrometry. Spectral signatures were acquired for six grapevine cultivars commonly farmed for raisin production: Currants, Merbein Seedless, Diamond Muscat, Selma Pete, Sugra-39, and Sultana. The field campaign was carried out between 12 and 14 December 2025, during the fruit set phenological stage, when vine growth transitions from flowering to berry development and vine photosynthetic activity increases [34,35]. Measurements were taken on the Welgevallen experimental farm in Stellenbosch (central coordinate: 33°56′38.5″ S, 18°52′06.8″ E), located in South Africa’s Western Cape Province, a region renowned for its Mediterranean climate, mountainous valleys, and cultivation of wine grapes [36,37]. Spectral samples were collected using the Fieldspec-4 Std-Res spectroradiometer (ASD Inc., Denver, CO, USA), which acquires spectral measurements across the 350–2500 nm range at a spectral resolution between 3 nm and 10 nm. Canopy-level, rather than leaf-level, samples were used to incorporate the spatial variability within the canopy into the measurements. This approach serves as a practical proxy for remote sensing (RS) imagery, as it mimics the “mixed pixel” effect commonly observed in RS data. For each sample, 50 readings were averaged, yielding 189 canopy samples across the six cultivars. Noisy spectral-edge regions (350–399 nm and 2451–2500 nm) and atmospheric water absorption bands (1350–1425 nm and 1825–1925 nm) were excluded, leaving 1874 wavebands for analysis.
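The retained band count can be checked with a short calculation. The sketch below assumes that the spectroradiometer output is resampled to 1 nm intervals, which the reported total implies. It then counts the wavelengths left after removing the four excluded ranges, treating each range as inclusive at both ends.

```python
import numpy as np

# Assumption: spectra are exported at 1 nm sampling over 350-2500 nm
wavelengths = np.arange(350, 2501)

# Inclusive wavelength ranges (nm) excluded in the study
excluded = [(350, 399), (1350, 1425), (1825, 1925), (2451, 2500)]

keep = np.ones(wavelengths.size, dtype=bool)
for lo, hi in excluded:
    keep &= ~((wavelengths >= lo) & (wavelengths <= hi))

retained = wavelengths[keep]
print(wavelengths.size, retained.size)  # 2151 1874
```

The 277 removed bands (50 + 76 + 101 + 50) account exactly for the difference between 2151 and 1874.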

The analysis was conducted in the R statistical software environment version 4.4.1 (R Development Core Team, 2024) and in Google’s Colaboratory (Colab) environment. The PLS–ensemble, as well as the oRF, Multinom, and SVM models, were implemented in R, whereas the MLP and 1D CNN models were executed in Python (version 3.12.12) via Google Colab. All classification models, including PLS, were trained and tested using a 70/30 data split, stratified by cultivar, and evaluated with 10-fold cross-validation. All feature selection procedures, including PLS–ensemble construction and Weighted Borda Count aggregation, were performed exclusively on the training data. Default hyperparameters were applied to establish a standardised baseline for subset comparison. This isolated the effects of FS on classifier performance, so that changes in classification metrics could be attributed to the data configuration rather than to changes in hyperparameter values.
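The data partitioning can be sketched with scikit-learn, used here as a stand-in for the tooling in the study. The placeholder data, seeds, and shuffling are illustrative assumptions. The sketch shows a 70/30 split stratified by cultivar, followed by stratified 10-fold cross-validation on the training portion. The test set is held back from any feature selection.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(189, 1874))        # placeholder spectra: 189 samples x 1874 wavebands
y = np.arange(189) % 6                  # placeholder labels for six cultivars

# 70/30 hold-out split, stratified by cultivar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Stratified 10-fold cross-validation on the training portion only;
# feature selection would use X_train, never X_test
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
folds = list(cv.split(X_train, y_train))
```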

The overall experimental workflow of this study is summarised in Figure 1. Detailed methodological descriptions of the feature selection methods (Section 2.2.1, Section 2.2.2, Section 2.2.3, Section 2.2.4), PLS–ensemble construction (Section 2.2.5), waveband binning (Section 2.3), and cross-learner evaluation (Section 2.4) are provided in the subsequent sections to ensure reproducibility.

2.2. Construction of PLS–Ensemble

The PLS-based ensemble feature selection framework was constructed by integrating 18 filter, wrapper, and hybrid methods to identify the most informative spectral features. PLS, or more specifically partial least-squares regression (PLSR), is a deterministic model that has been successfully employed for high-dimensional data in numerous studies [4,38,39,40]. The foundations of the algorithm were established by the seminal works of Wold et al. [41], Martens [42], and Helland [43]. These authors proposed a regression approach that constructs latent variables, or components, as linear combinations of wavebands that maximise covariance with the response variable. The PLS method produces several uncorrelated components that summarise the majority of the wavebands’ variance [40]. At the same time, it reduces the influence of non-informative or weakly relevant wavebands by assigning them lower weights in the component construction [44]. This makes PLS well suited to dimensionality reduction tasks. Mathematically, given an output $Y \in \mathbb{R}^{N \times p}$ and an input $X \in \mathbb{R}^{N \times m}$, the PLS linear regression problem can be expressed as

Y = X\beta + E

(1)

where $X$ is the predictor matrix of $N$ samples and $m$ predictors. $Y$ is the response matrix, where $p$ signifies the number of response measurements. $\beta \in \mathbb{R}^{m \times p}$ is the regression coefficient matrix linking predictors $X$ to responses $Y$, and $E$ captures the residual or unexplained variance. The decomposition of the predictor matrix $X$ and response matrix $Y$ into $F$ latent variables is given by Equation (2):

X = TU^{T} + E_{X}; \quad Y = TQ^{T} + E

(2)

T = XW(U^{T}W)^{-1}

(3)

where $T \in \mathbb{R}^{N \times F}$ is the matrix of latent scores that summarises the predictor information most relevant to predicting $Y$, and $E_{X}$ is the residual of $X$. $T$ is constructed as a linear combination of the original predictors $X$ via the weight matrix $W \in \mathbb{R}^{m \times F}$ (Equation (3)). The columns of $W$ are chosen such that the scores have maximal covariance with $Y$. The factor $(U^{T}W)^{-1}$ is needed because each weight vector is defined on $X$ after the contributions of the preceding components have been removed (deflated). $U \in \mathbb{R}^{m \times F}$ and $Q \in \mathbb{R}^{p \times F}$ (Equations (2)–(4)) are the loadings for $X$ and $Y$, respectively, and define how the latent scores relate to $X$ and $Y$. Substituting Equation (3) into the decomposition of $Y$ and comparing the result with Equation (1) gives the regression coefficients $\beta$ in terms of the weight matrix $W$, the predictor loadings $U$, and the response loadings $Q$ (Equation (4)):

\beta = W(U^{T}W)^{-1}Q^{T}

(4)

This formulation ensures that the regression captures directions in $X$ that are most predictive of $Y$, while simultaneously reducing dimensionality and alleviating multicollinearity. In this study, PLS was implemented for both the wrapper and filter approaches, using the softmax function to assign class labels and the default of 10 components. The following subsections describe the 18 PLS-based methods in detail. They are organised according to their underlying selection strategy, i.e., filter-based, wrapper-based, multicriteria, and hybrid approaches.

2.2.1. Filter-Based Feature Selection

In a filter-based approach, a PLS model is first built, and its output (e.g., regression coefficients, variable importance scores, or loading weights) is used to rank wavebands according to their relative contribution to explaining the response variable. The final subset is then selected as a portion of the highest-ranked wavebands, typically based on a user-defined threshold. For all filter methods used, this threshold was set to retain the top 10% of ranked wavebands. The current study employed six PLS filters constructed using the plsVarSel R package (0.9.12) [45]:

Variable importance in projection (VIP): First proposed by Wold et al. [46], VIP measures how much a waveband contributes to describing both the predictors $X$ and the responses $Y$ in the PLS model [47,48]. Typically, a VIP score below 1 indicates an unimportant variable. Mathematically, the VIP score for waveband $j$ is defined as

{VIP}_{j} = \sqrt{\frac{\sum_{f=1}^{F} W_{jf}^{2} \times {SSY}_{f} \times J}{{SSY}_{total} \times F}}

(5)

where $W_{jf}$ is the element of the weight matrix $W$ for waveband $j$ in component $f$, and ${SSY}_{f}$ is the amount of variation in the responses $Y$ explained by component $f$. $J$ is the number of input wavebands, and ${SSY}_{total}$ is the total variation in $Y$ explained by all $F$ components.

Selectivity ratio (SR): Based on the target projection (TP) of loadings, the SR assesses the discriminative power of each waveband by comparing explained variance with residual variance. TP projects the original data onto a single predictive component that captures the part of the variation in the predictors $X$ most strongly related to the responses $Y$ [47,49]. This separates the variation into two parts: the “signal” (explained variance, $V_{\exp j}$) and the “noise” (residual variance, $V_{res j}$). The SR therefore reflects the signal-to-noise contribution of each waveband in the regression model. It is defined as the ratio $r_{j}$ between $V_{\exp j}$ and $V_{res j}$ for waveband $j$:

r_{j} = \frac{V_{\exp j}}{V_{res j}}

(6)

where a higher $r_{j}$ indicates that a waveband contributes more meaningful information relative to noise in the regression model [47,49].

Significance multivariate correlation (sMC): Unlike VIP and SR, sMC assesses a waveband’s statistical significance with respect to its relationship to the responses $Y$, rather than its relative importance [47]. For a waveband $j$, sMC compares $V_{\exp j}$ with $V_{res j}$, adjusted for degrees of freedom, using an F-type statistic. A higher sMC value indicates that the waveband’s correlation with the responses $Y$ is stronger than would be expected from chance or random noise.

Loading weights (LW): For each latent component constructed by the PLS model, the loading weights of the predictors $X$ can serve as a measure of variable importance. Wavebands with a higher absolute LW contribute more strongly to the component. A subset of wavebands can then be selected based on a user-defined threshold.

Regression coefficients (RC): Similar to LW, the regression coefficients ($\beta$) calculated by the PLS model serve as a measure of variable importance. Wavebands with larger absolute $\beta$ values contribute more strongly to predicting the responses $Y$, and a subset can be selected based on a user-defined threshold.

Peak loadings: This common PLS feature selection approach [4,33] relies on inspecting PLS loading plots. Peaks (either positive or negative) in the loading curve correspond to wavelengths that strongly influence the model, while values near zero indicate little contribution. Wavebands around these peaks are then selected. This differs from the LW approach, which quantifies how much each variable contributes to the construction of the latent components.

2.2.2. Wrapper-Based Feature Selection

Although the PLS algorithm has predominantly been used as a filter method, it has successfully been implemented as a wrapper as well [38,39,48,50]. This study, therefore, incorporated wrapper-based approaches in the FS ensemble methodology. Three PLS wrapper methods were employed:

Backward variable elimination (BVE): This approach commonly utilises one of the previously described filter methods to rank wavebands. A user-defined threshold is then applied to determine the optimal subset size, after which a PLS model is refitted to evaluate subset performance. This process is repeated until the maximum number of iterations is reached or maximum model performance is observed [47]. BVE was implemented using the plsVarSel (0.9.12) R package and VIP for feature ranking [45].
Interval PLS (iPLS): Interval-based PLS, first introduced by Nørgaard et al. [51], splits the input wavebands into equal, non-overlapping intervals and fits a local PLS model to each interval. Backward elimination is then employed to iteratively remove the worst-performing interval relative to a PLS model fitted to all wavebands in the available intervals [39]. The process iterates until no further improvement is observed or a maximum number of iterations is reached [52]. iPLS has been recommended for highly correlated spectral datasets because it evaluates groups of adjacent wavebands collectively, reducing the impact of multicollinearity [48,51]. The mdatools (0.14.2) [52] package in R was used to construct the iPLS model.
Overlapping iPLS: A variant of iPLS, often called moving window or sliding window iPLS, divides the spectral range into overlapping intervals, allowing features spanning interval boundaries to be evaluated more continuously [39]. In this study, the original iPLS model was modified to use intervals of 100 wavebands with a step size of 50 wavebands (i.e., each interval overlapped the previous by 50 wavebands).
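The interval schemes and the backward elimination loop can be illustrated with two hypothetical helpers, `make_intervals` and `backward_interval_elimination`. Neither is part of mdatools, whose iPLS implementation may differ in details such as the stopping rule. The `score` callable stands in for a cross-validated performance measure of a PLS model fitted on the retained bands. The toy score below rewards keeping two "informative" bands and lightly penalises subset size.

```python
import numpy as np

def make_intervals(n_bands, width, step=None):
    """Hypothetical helper: split band indices 0..n_bands-1 into intervals.
    step == width (default) gives non-overlapping iPLS intervals; step < width gives
    overlapping (moving-window) intervals. Assumes step <= width."""
    step = width if step is None else step
    starts = range(0, max(n_bands - width, 0) + 1, step)
    intervals = [np.arange(s, min(s + width, n_bands)) for s in starts]
    if intervals[-1][-1] < n_bands - 1:          # cover any trailing bands
        intervals.append(np.arange(intervals[-1][0] + step, n_bands))
    return intervals

def backward_interval_elimination(intervals, score, max_iter=None):
    """Hypothetical helper: repeatedly drop the interval whose removal most improves
    score(bands) (higher is better); stop when no removal helps or max_iter is reached."""
    def bands_of(ids):
        return np.unique(np.concatenate([intervals[k] for k in ids]))

    active = list(range(len(intervals)))
    best = score(bands_of(active))
    for _ in range(max_iter or len(intervals) - 1):
        if len(active) == 1:
            break
        trials = [(score(bands_of([k for k in active if k != i])), i) for i in active]
        trial_best, drop = max(trials)
        if trial_best <= best:                   # no removal improves the score: stop
            break
        best = trial_best
        active.remove(drop)
    return active, best

informative = {1, 2}                             # toy "informative" band indices

def toy_score(bands):
    return len(informative & set(bands.tolist())) - 0.01 * len(bands)

non_overlapping = make_intervals(1874, width=100)            # 19 intervals
overlapping = make_intervals(1874, width=100, step=50)       # 37 intervals
active, best = backward_interval_elimination(make_intervals(10, width=4), toy_score)
```

With the toy score, elimination keeps only the interval holding bands 0 to 3, which contains both informative bands.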

2.2.3. Multicriteria Evaluation (Hybrid Filter Approach)

A multicriteria evaluation (MCE) approach was applied to derive three hybrid filters from the results of the five PLS-based feature selection techniques (VIP, SR, sMC, LW, and RC). The first approach (Union) aggregates all wavebands selected by any of the five methods. The second (Overlapping ≥ 2) retains only wavebands identified by two or more methods, providing a moderately conservative subset. The third (Overlapping ≥ 3) is more stringent, keeping only wavebands consistently selected by three or more methods. These approaches aimed to balance inclusiveness and robustness by combining multiple selection criteria.
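The three MCE rules reduce to counting votes per waveband. The following minimal sketch assumes that each filter's output is available as a set of selected waveband indices; the toy selections are hypothetical. For the combined approaches in Section 2.2.4, the same function would be applied to selections restricted to the iPLS-preselected wavebands.

```python
from collections import Counter

def mce_subsets(selections):
    """selections maps each FS method name to the set of waveband indices it selected."""
    votes = Counter(band for bands in selections.values() for band in bands)
    union = set(votes)                                     # selected by any method
    overlap2 = {b for b, v in votes.items() if v >= 2}     # selected by two or more methods
    overlap3 = {b for b, v in votes.items() if v >= 3}     # selected by three or more methods
    return union, overlap2, overlap3

selections = {                                             # hypothetical toy selections
    "VIP": {1, 2, 3, 4},
    "SR": {2, 3, 5},
    "sMC": {3, 6},
    "LW": {3, 7},
    "RC": {3, 8},
}
union, overlap2, overlap3 = mce_subsets(selections)
```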

2.2.4. Combined Approaches

Lastly, the MCE approaches were integrated with the iPLS methods to leverage both interval-level robustness and filter-level consensus. In this combined framework, MCE was applied to wavebands preselected by iPLS, using both non-overlapping and overlapping interval schemes. The iPLS + MCE approach led to the construction of six additional subsets. The first three apply MCE to the non-overlapping iPLS selection:

- (1) iPLS + MCE (Union): MCE is applied to iPLS-selected wavebands, and the final subset aggregates all wavebands selected across the five filter methods.
- (2) iPLS + MCE Overlapping ≥ 2: only wavebands identified by two or more filter methods among the iPLS-selected subset are retained.
- (3) iPLS + MCE Overlapping ≥ 3: only wavebands consistently selected by three or more methods within the iPLS subset are kept.

A parallel set of three variants was applied to overlapping iPLS: (4) iPLS Overlapping + MCE (Union); (5) iPLS Overlapping + MCE Overlapping ≥ 2; and (6) iPLS Overlapping + MCE Overlapping ≥ 3.

2.2.5. Final PLS–Ensemble

The 18 FS methods were all iterated 10 times, and the frequency with which specific wavebands were selected was recorded. In total, a single waveband could be selected up to 180 times (10 iterations × 18 methods). These frequency scores were then aggregated based on a Weighted Borda Count approach. Borda Count is a popular voting approach [30,53,54] known for its simplicity in aggregation tasks [53]. The method works by ranking items (in this case, wavebands) and then assigning them points based on their rank. In our study, the Borda Count was applied as follows:

1. For each FS method, the selected wavebands were ranked by selection frequency, with the most frequently selected waveband ranked first.
2. Each rank was then converted into Borda points using

{BP}_{ij} = s_{j} - r_{ij} + 1

(7)

where $s_{j}$ is the number of wavebands selected by FS method $j$, and $r_{ij}$ is the rank of waveband $i$ within FS method $j$. Thus, the top-ranked waveband receives $s_{j}$ points, while the lowest-ranked receives 1 point.
3. The Borda points for each waveband were then summed across all 18 FS methods to obtain a consensus score:

B_{i} = \sum_{j=1}^{M=18} {BP}_{ij}

(8)

A weighting scheme was introduced following the recommendations of Drotár et al. [30]. Since FS methods can differ in reliability and performance, those that produced subsets leading to better classifier performance were considered more trustworthy and, therefore, assigned greater weights in the Borda rankings. The sum of ranking differences (SRD) method was employed to determine these Borda weights. SRD was selected as the weighting scheme for its simplicity and robustness against outliers and scale differences [55,56]. SRD operates by

1. Taking as input a matrix of FS subsets evaluated across classifiers and performance metrics, and ranking each FS subset according to its performance.
2. Calculating the absolute differences between individual subset rankings (i.e., the SRD values) and summing them into SRD scores.
3. Min–max-normalising the aggregated SRD scores to produce a weight vector, $w_{j}$. This vector is applied to the Borda points from each FS method before the final aggregation (Equation (9)), yielding a weighted ensemble ranking of wavebands.

B_{i} = \sum_{j=1}^{M=18} w_{j} {BP}_{ij}

(9)
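Equations (7)–(9) can be sketched as follows. Two details are assumptions that the text does not state. First, wavebands that a method never selected receive zero points from that method. Second, ties in selection frequency are broken by waveband order. The weights $w_{j}$ are taken as given (for example, from the SRD procedure) rather than computed here.

```python
import numpy as np

def borda_points(freq):
    """Equation (7) for one FS method.
    freq[i] = number of iterations (out of 10) in which waveband i was selected.
    Assumption: wavebands never selected by the method (freq == 0) receive 0 points."""
    freq = np.asarray(freq)
    points = np.zeros(freq.size)
    selected = np.flatnonzero(freq > 0)
    s = selected.size                                  # s_j: wavebands selected by this method
    # rank 1 = most frequently selected; stable sort breaks ties by waveband order (assumption)
    order = selected[np.argsort(-freq[selected], kind="stable")]
    ranks = np.arange(1, s + 1)                        # r_ij
    points[order] = s - ranks + 1                      # BP_ij = s_j - r_ij + 1
    return points

def weighted_borda(freq_matrix, weights=None):
    """Equations (8) and (9): sum of (optionally weighted) Borda points over FS methods.
    freq_matrix has one row per FS method and one column per waveband."""
    bp = np.vstack([borda_points(row) for row in freq_matrix])
    w = np.ones(bp.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    return w @ bp

# Toy example: two FS methods, four wavebands
freq = np.array([[10, 5, 0, 7],
                 [0, 10, 3, 0]])
unweighted = weighted_borda(freq)                      # Equation (8): [3, 3, 1, 2]
weighted = weighted_borda(freq, weights=[1.0, 0.5])    # Equation (9): [3, 2, 0.5, 2]
```

Down-weighting the second method breaks the tie between wavebands 0 and 1 in favour of waveband 0.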

SRD weights were calculated using training data only, with methods yielding more informative subsets assigned greater influence in the Borda aggregation, ensuring a performance-driven ensemble. The final subset was then selected based on these Weighted Borda Count rankings and an optimal threshold. To determine the optimal threshold, the elbow method was first applied to the Weighted Borda Count rankings to identify the point of maximum curvature, which serves as an adaptive threshold for feature selection. This ensures that highly informative wavebands are preferentially selected based on the underlying data structure. Thereafter, a percentile-based rule (top 15%, optimised through iterative testing) was applied to refine the selection. By combining the elbow and percentile methods, the thresholding approach strikes a balance between data-driven cut-off detection and safeguarding against selecting too many or too few features.
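The elbow and percentile cut-offs can be sketched as below. The elbow is located with a chord-distance heuristic: the point of the descending score curve farthest from the straight line joining its first and last points. This is a common approximation of maximum curvature and may differ from the method used in the study. The sketch returns both candidate cut-offs rather than reproducing the exact way the study combined them. `elbow_count` and `percentile_count` are hypothetical helper names.

```python
import numpy as np

def elbow_count(scores):
    """Number of top-ranked bands up to and including the elbow of the descending score
    curve, taken as the point farthest from the chord joining its first and last points
    (assumes the scores are not all equal)."""
    y = np.sort(np.asarray(scores, dtype=float))[::-1]
    x = np.linspace(0.0, 1.0, y.size)
    y = (y - y.min()) / (y.max() - y.min())      # chord now runs from (0, 1) to (1, 0)
    distance = np.abs(x + y - 1.0) / np.sqrt(2.0)
    return int(np.argmax(distance)) + 1

def percentile_count(n_bands, top_fraction=0.15):
    """Number of bands kept by a top-percentile rule."""
    return int(np.ceil(top_fraction * n_bands))

scores = np.array([0.6, 10, 0.9, 8, 0.4, 1, 9, 0.7, 0.5, 0.8])   # toy Weighted Borda scores
n_elbow = elbow_count(scores)           # 4
n_top15 = percentile_count(1874)        # 282
```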

2.3. Aggregation of Spectral Wavebands

Spectral binning was explored to reduce feature redundancy, improve interpretability, and enhance the practicality of large-scale cultivar mapping. Firstly, highly collinear wavebands were identified using hierarchical clustering of a Pearson correlation (r) matrix, with clusters defined at |r| ≥ 0.85 (optimised through iterative testing) to group bands carrying redundant information. Secondly, adjacent wavebands within 10 nm were merged into continuous intervals, ensuring that non-contiguous regions were treated separately. These correlation- and adjacency-based groupings were then combined to form unique bins, such that adjacent, highly correlated wavebands were assigned to the same group. Finally, reflectance values within each bin were averaged, producing robust broadband features that preserved essential spectral information while minimising multicollinearity.
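A sketch of the binning procedure using SciPy's hierarchical clustering follows. It makes these assumptions:

- Correlation distance is defined as $1 - |r|$.
- Complete linkage is used, since the text does not specify the linkage method. This ensures that every pair of bands within a correlation cluster satisfies $|r| \ge 0.85$.
- A new adjacency run starts wherever the gap between consecutive selected bands exceeds 10 nm.
- `bin_wavebands` is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def bin_wavebands(X, wavelengths, r_min=0.85, max_gap_nm=10.0):
    """Hypothetical helper: group bands into correlation- and adjacency-based bins and
    return the mean reflectance of each bin as a broadband feature."""
    order = np.argsort(wavelengths)
    X, wl = X[:, order], np.asarray(wavelengths, dtype=float)[order]
    # 1) correlation clusters: distance = 1 - |r|; complete linkage (assumed) cut at
    #    1 - r_min keeps every pair of bands in a cluster at |r| >= r_min
    dist = np.clip(1.0 - np.abs(np.corrcoef(X, rowvar=False)), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    corr_group = fcluster(linkage(squareform(dist, checks=False), method="complete"),
                          t=1.0 - r_min, criterion="distance")
    # 2) adjacency runs: a new run starts where the gap to the previous band exceeds max_gap_nm
    run = np.concatenate([[0], np.cumsum(np.diff(wl) > max_gap_nm)])
    # 3) a bin is a unique (correlation cluster, adjacency run) combination
    _, bin_id = np.unique(np.column_stack([corr_group, run]), axis=0, return_inverse=True)
    bin_id = bin_id.ravel()
    # 4) broadband feature = mean reflectance of the bands in each bin
    features = np.column_stack([X[:, bin_id == b].mean(axis=1) for b in range(bin_id.max() + 1)])
    return features, bin_id, wl

rng = np.random.default_rng(3)
base = rng.normal(size=(50, 2))                        # two independent underlying signals
noise = 0.01 * rng.normal(size=(50, 10))
X = np.hstack([np.repeat(base[:, [0]], 5, axis=1),
               np.repeat(base[:, [1]], 5, axis=1)]) + noise
wavelengths = np.r_[500:505, 800:805]                  # two contiguous 5-band runs (nm)
features, bin_id, wl = bin_wavebands(X, wavelengths)   # expected: 2 broadband features
```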

2.4. Assessment of Waveband Subset

The PLS–ensemble waveband subset and the aggregated waveband subset were evaluated across five classification algorithms, detailed below. Following the recommendations of Varoquaux and Colliot [57] and Opitz [58], who asserted that no single accuracy metric is optimal due to the loss of information when aggregating confusion matrix values, four complementary evaluation metrics were employed: the F1-score (F1), balanced accuracy (BACC), Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic curve (AUC-ROC or AUC). The dataset exhibits moderate class imbalance, where some cultivars are represented by more samples than others. These metrics were chosen to provide a balanced assessment of discriminative ability, class imbalance handling, and overall predictive reliability of the classification algorithms and datasets.
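The four metrics can be computed with scikit-learn, used here as a stand-in for the tooling in the study. The averaging schemes (macro-averaged F1 and one-vs-rest macro-averaged AUC) are assumptions, because the text does not state them. The labels and probabilities are toy values.

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, f1_score,
                             matthews_corrcoef, roc_auc_score)

def evaluate(y_true, y_pred, y_prob):
    """F1, BACC, MCC and AUC for a multiclass problem. Macro-averaged F1 and
    one-vs-rest macro AUC are assumed averaging choices."""
    return {
        "F1": f1_score(y_true, y_pred, average="macro"),
        "BACC": balanced_accuracy_score(y_true, y_pred),
        "MCC": matthews_corrcoef(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"),
    }

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 1, 2, 0, 1, 1])          # one class-2 sample misclassified as class 1
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.7, 0.2, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.6, 0.3]])
scores = evaluate(y_true, y_pred, y_prob)
```

In this toy case, BACC is 5/6 because class 2 has a recall of 0.5. The AUC remains 1.0 because every positive sample outscores every negative one for each class. This illustrates why several complementary metrics are reported.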

Additionally, the impact of feature selection on inter-class cultivar separability was measured using the Spectral Angle Mapper (SAM) method. For each dataset, mean spectral vectors were computed per cultivar: the full dataset (p = 1874), the PLS–ensemble subset (p = 100), and the aggregated subset (p = 10). Pairwise SAM distances were then calculated between all unique cultivar combinations to quantify the angular differences between their spectral signatures, with larger SAM values (measured in radians) indicating greater class separability. The resulting pairwise distances were visualised using probability density plots to characterise the distribution of inter-class spectral separability. In these plots, the curve height reflects the density of pairwise distances, and horizontal shifts describe changes in the distance distribution.
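The SAM comparison reduces to the angle between mean spectra, $\theta = \arccos\left(\frac{a \cdot b}{\lVert a \rVert \lVert b \rVert}\right)$, computed for every unique pair of cultivars. The following minimal NumPy sketch uses synthetic spectra; `spectral_angle` and `pairwise_sam` are hypothetical helper names.

```python
import numpy as np
from itertools import combinations

def spectral_angle(a, b):
    """Spectral angle in radians; 0 means identical spectral shape up to a scale factor."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))   # clip guards against rounding outside [-1, 1]

def pairwise_sam(X, labels):
    """SAM between the mean spectra of every unique pair of classes."""
    classes = np.unique(labels)
    means = {c: X[labels == c].mean(axis=0) for c in classes}   # mean spectrum per class
    return {(a, b): spectral_angle(means[a], means[b]) for a, b in combinations(classes, 2)}

rng = np.random.default_rng(7)
X = rng.random((60, 100))                 # 60 synthetic spectra, 100 wavebands
labels = np.arange(60) % 6                # six hypothetical cultivars
pairs = pairwise_sam(X, labels)           # 15 cultivar pairs
```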

2.4.1. Oblique Random Forest (oRF)

oRF is a decision tree ensemble that employs oblique splits, i.e., splits on linear combinations of predictors, which enable more flexible decision boundaries in high-dimensional space [59,60]. During model construction, each tree is trained on a bootstrap sample (bagging) drawn with replacement from the training dataset, ensuring diversity among trees. At each node, a random subset of predictors of size mtry is selected and combined to determine the optimal node split. See [59] for a detailed account of oRF. The oRF models were implemented using the aorsf (0.1.5) package in R [61]. The models were built with 500 trees (ntree) and the default mtry value of $\lceil\sqrt{p}\rceil$, where $p$ is the number of predictors.

2.4.2. Multinomial Logistic Regression (Multinom)

Multinom extends binary logistic regression to multiclass problems. It uses the softmax function to transform logits (linear combinations of the predictor variables) into class probabilities. The R package caret (6.0-94) [62] was used for model construction. A full methodological account of logistic regression is provided by [63].
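The softmax step that turns logits into class probabilities can be sketched as below. The logits are fixed toy values; in the Multinom model, they would come from class-specific linear combinations of the predictors. Subtracting each row's maximum before exponentiating is a standard numerical safeguard that leaves the probabilities unchanged.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax: converts each row of class scores (logits) into probabilities.
    Subtracting the row maximum avoids overflow and leaves the result unchanged."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 0.0, 0.0]])       # large values would overflow a naive exp
probs = softmax(logits)
predicted_class = probs.argmax(axis=1)        # [0, 0]
```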

2.4.3. Support Vector Machine (SVM)

SVM constructs an optimal separating hyperplane by maximising the margin, i.e., the distance between the hyperplane and the closest training samples (support vectors). A kernel function can be used to map the input data into a higher-dimensional feature space in which the classes are separated (refer to [64] for a further description of the algorithm). This study employed the linear kernel, which constructs the separating hyperplane in the original feature space without a non-linear transformation. It was implemented using the e1071 (1.7-14) R package [65].

2.4.4. Multi-Layer Perceptron (MLP)

MLP is a feedforward neural network that uses forward propagation to compute class probabilities and backpropagation for model optimisation [66]. The model was constructed with four fully connected, ReLU-activated hidden layers. It was compiled with the Adam optimiser and categorical cross-entropy loss and trained for 100 epochs with a batch size of 16 samples. Final class predictions were obtained with a softmax activation function in the output layer. The Keras Python application programming interface (API) of TensorFlow [67] was used in a Google Colab environment to build and train the MLP model.
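The forward pass and loss described above can be sketched in NumPy. The hidden-layer widths and the input size are hypothetical, as this section does not report them. Training by backpropagation with the Adam optimiser, which Keras performs, is omitted.

```python
import numpy as np

rng = np.random.default_rng(42)


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_mlp(n_in, hidden, n_out):
    """He-initialised (weights, bias) pairs for each fully connected layer."""
    sizes = [n_in, *hidden, n_out]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, fan_out)), np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, X):
    """Feedforward pass: ReLU hidden layers, softmax output layer."""
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return softmax(h @ W + b)


def categorical_cross_entropy(P, Y_onehot, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.mean(np.sum(Y_onehot * np.log(P + eps), axis=1)))


# Hypothetical widths for the four hidden layers; 10 input features, 6 cultivars
params = init_mlp(10, [64, 32, 32, 16], 6)
X_batch = rng.uniform(0.0, 0.6, size=(16, 10))  # one batch of 16 samples, as in the text
P = forward(params, X_batch)
Y = np.eye(6)[rng.integers(0, 6, size=16)]  # random one-hot labels for illustration
loss = categorical_cross_entropy(P, Y)
```

A model that assigns equal probability to all six classes has a loss of ln 6 ≈ 1.79. This gives a useful reference point when reading training curves.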

2.4.5. One-Dimensional Convolutional Neural Network (1D CNN)

The 1D CNN is a deep learning architecture that applies convolutional filters (or kernels) along the spectral dimension to learn local waveband dependencies; rather than evaluating each wavelength independently, the model captures patterns across neighbouring wavelengths. In this study, the model used a 1D kernel, with the input data structured as a two-dimensional matrix (samples × wavebands). Each convolutional filter traverses the waveband axis to produce feature maps representing learned spectral patterns [68]. These feature maps are subsequently processed by fully connected layers, which output a class probability distribution for each sample via the softmax function. The 1D CNN was built with the Keras API of TensorFlow [67] following the ResNet-34 architecture, details of which are provided by [19]. The model used a kernel size of 3 and a leaky rectified linear unit (LeakyReLU) activation function. As with the MLP model, training employed the Adam optimiser and categorical cross-entropy loss function, running for 100 epochs with a batch size of 16.
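The core operation of the 1D CNN can be sketched in NumPy as a kernel-size-3 convolution along the waveband axis, combined into a ResNet-style basic block (conv, LeakyReLU, conv, plus an identity shortcut). ResNet-34 stacks such basic blocks. The sketch omits the batch normalisation and downsampling stages of the full architecture. The LeakyReLU slope, channel count, and function names are assumptions.

```python
import numpy as np


def conv1d_same(x, kernel, bias):
    """1D convolution along the waveband axis with zero ('same') padding.

    x: (n_bands, in_ch); kernel: (k, in_ch, out_ch); bias: (out_ch,).
    As in deep learning libraries, this is cross-correlation (kernel not flipped).
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty((x.shape[0], kernel.shape[2]))
    for i in range(x.shape[0]):
        window = xp[i:i + k]  # k neighbouring wavebands, all input channels
        out[i] = np.einsum("kc,kco->o", window, kernel) + bias
    return out


def leaky_relu(z, alpha=0.3):
    """LeakyReLU; the slope alpha is an assumed value."""
    return np.where(z > 0, z, alpha * z)


def residual_block(x, k1, b1, k2, b2):
    """ResNet-style basic block: conv -> LeakyReLU -> conv, plus an identity shortcut."""
    h = leaky_relu(conv1d_same(x, k1, b1))
    h = conv1d_same(h, k2, b2)
    return leaky_relu(h + x)  # the shortcut requires matching channel counts


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 8))  # 100 wavebands, 8 feature-map channels
k1, k2 = rng.normal(0, 0.1, (3, 8, 8)), rng.normal(0, 0.1, (3, 8, 8))
y = residual_block(x, k1, np.zeros(8), k2, np.zeros(8))
```

Each output position depends only on three neighbouring wavebands per layer. This is why the architecture relies on contiguous spectra: when the input is reduced to a few non-adjacent features, these local patterns largely disappear.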

3. Results and Discussion

3.1. Assessment of the PLS–Ensemble and Aggregated Subsets

The PLS–ensemble framework identified 100 wavebands as optimal (see Table 1), a reduction in dimensionality of approximately 95%. The stability of waveband selection across the 18 PLS-based feature selection methods was quantified using selection frequency (Figure 2). The frequency distribution indicates that a core set of wavebands was consistently selected by the majority of FS methods, providing evidence of the selection stability of the PLS–ensemble strategy. Certain wavebands, particularly in the 508–566 nm region, exhibited lower individual selection frequencies. These wavebands were retained through Weighted Borda Count aggregation, which integrates ranking position with a consensus weighted by classifier performance. Including these wavebands captures complementary spectral information identified by fewer methods, enabling the ensemble to balance stability and diversity without compromising robustness.
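The interplay between selection frequency and Weighted Borda Count can be illustrated with a short Python sketch. Several elements are assumptions for illustration only: the point scheme (n − r points for 0-based position r), the use of a classifier score as each method's weight, and the method labels VIP, sMC, and SR (borrowed from common PLS filter measures). The study's exact scheme is defined in its methods and is not reproduced here.

```python
from collections import defaultdict


def weighted_borda(rankings, weights, n_candidates):
    """Weighted Borda Count over several (possibly partial) feature rankings.

    rankings: method -> list of waveband ids, best first.
    weights: method -> weight (here, hypothetically, a classifier score for that method's subset).
    A band at 0-based position r earns n_candidates - r points; unranked bands earn nothing.
    """
    scores = defaultdict(float)
    for method, ranked in rankings.items():
        for r, band in enumerate(ranked):
            scores[band] += weights[method] * (n_candidates - r)
    # Highest consensus score first; ties broken by waveband id for reproducibility
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def selection_frequency(rankings):
    """Share of FS methods that selected each waveband."""
    counts = defaultdict(int)
    for ranked in rankings.values():
        for band in set(ranked):
            counts[band] += 1
    return {band: c / len(rankings) for band, c in counts.items()}


# Toy rankings from three illustrative PLS-based methods (wavelengths in nm)
rankings = {
    "VIP": [680, 720, 550],
    "sMC": [720, 680, 1927],
    "SR": [550, 720, 680],
}
weights = {"VIP": 0.90, "sMC": 0.80, "SR": 0.70}
consensus = weighted_borda(rankings, weights, n_candidates=3)
print([band for band, _ in consensus])  # [720, 680, 550, 1927]
print(selection_frequency(rankings)[550])  # selected by 2 of 3 methods
```

In the toy case, 550 nm is selected by only two of the three methods but still outranks 1927 nm, because one method ranks it first. This mirrors how a waveband with lower selection frequency can survive aggregation.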

The wavebands were selected across different regions of the electromagnetic (EM) spectrum. The red and near-infrared (NIR) portion, spanning 670 to 742 nm, was the most densely sampled, with 61 selected wavebands. The red–NIR region, which includes the red-edge (typically 680–780 nm), shows nearly contiguous sampling, suggesting that it is particularly important for discriminating grapevine cultivars. This spectral region is commonly used in agricultural monitoring due to its sensitivity to chlorophyll content and plant health [1,2,4,23]. Hennessy et al. [69], in their review of hyperspectral waveband selection in the broader context of species and crop classification, specifically highlighted the 680 nm waveband as critical for crop type discrimination. The 680 nm waveband was also present in the PLS–ensemble subset, along with many adjacent wavebands. This may indicate that the PLS–ensemble failed to exclude a number of multicollinear wavebands, but it may also reflect the importance of red-edge wavebands for discriminating spectrally similar cultivars.

The selection of red–NIR wavebands can be attributed to two main factors: (1) the sensitivity of the red region (600–679 nm) to chlorophyll a and b absorption, together with the sensitivity of the red-edge/NIR transition zone (680–742 nm) to scattering associated with leaf cellular structure [69,70,71], and (2) the responsiveness of canopy-level spectra in this region to differences in cultivar architecture, leaf thickness, and internal structure, which provides discriminative information. The latter point is corroborated by Hennessy et al. [69], who found that canopy-level studies selected wavebands in this region more frequently than leaf-level studies. The predominance of red–NIR wavebands in the selection is also consistent with the findings of Mirzaei et al. [4], who reported similar wavebands in their feature selection study of cultivar classification.

Notably, no wavebands were selected in the blue region (400–500 nm) or on the NIR plateau (800–1300 nm). This contrasts with Karakizi et al. [3], who suggested that spectral features across 760–1050 nm were crucial for varietal detection, and with Mirzaei et al. [4], who identified both blue and NIR plateau wavebands as highly relevant for cultivar discrimination. The strong absorption of blue light by chlorophyll and carotenoids [69,70,71] may make the blue wavebands less informative for discriminating between cultivars than the red-edge/NIR region. At the canopy level, the limited usefulness of blue wavebands is further compounded because reflectance variations are often dominated by leaf orientation and other structural effects. Additionally, as samples were collected early in the season, the higher chlorophyll content associated with earlier phenological stages could further reduce variability in the blue wavebands. In contrast, Mirzaei et al. [4] used samples collected later in the season, when chlorophyll levels may have declined, resulting in higher reflectance and greater variation in the blue region. Similarly, NIR plateau wavebands have been reported to be better suited to detecting interspecific variability (i.e., discriminating between species) than intraspecific variation (i.e., differences between cultivars), as the structural variations pertinent to canopy-level cultivar discrimination are captured more strongly in the red-edge/NIR transition zone [69,72,73].

The yellow–green region (508–566 nm) accounted for the second-largest number of selected wavebands (21). These wavebands are situated near the so-called ‘green hump’ (550 nm), the peak of visible reflectance that arises predominantly from low chlorophyll absorption [69,71]. Differences in chlorophyll and carotenoid concentrations, leaf pigment ratios, and leaf surface characteristics (e.g., glossiness, waxiness, trichomes or hairs, cuticle thickness, and texture) can lead to subtle variations in the reflectance signatures of cultivars across these wavebands. Consequently, this spectral variability provides classifiers with additional discriminatory features, leading to better classification of cultivars [71,74].

Moreover, the 21 selected yellow–green wavebands lie in a region commonly used in spectral indices for plant analysis [4,69,71], supporting the validity of their selection by the PLS–ensemble. For example, the photochemical reflectance index (PRI), which is strongly correlated with leaf pigment ratios, is typically constructed from wavebands in the 500–570 nm region. Similarly, indices targeting anthocyanin pigments employ red/green spectral ratios [75], suggesting that the selected wavebands across the green–red region may be important not only as individual features but also in combination. These indices, and the wavebands they incorporate, capture meaningful physiological and biochemical variation, making them valuable variables for crop and cultivar discrimination [4,69,71].
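As an example of such an index, PRI is commonly computed from reflectance at 531 nm and 570 nm as (R531 − R570)/(R531 + R570). The Python/NumPy sketch below is for illustration only; the study itself did not necessarily compute PRI. The nearest-band lookup, the assumed 1 nm sampling, and the toy spectrum are all assumptions.

```python
import numpy as np


def band_value(wavelengths, reflectance, target_nm):
    """Reflectance at the waveband closest to target_nm."""
    idx = int(np.argmin(np.abs(np.asarray(wavelengths) - target_nm)))
    return float(reflectance[idx])


def pri(wavelengths, reflectance):
    """Photochemical reflectance index: (R531 - R570) / (R531 + R570)."""
    r531 = band_value(wavelengths, reflectance, 531)
    r570 = band_value(wavelengths, reflectance, 570)
    return (r531 - r570) / (r531 + r570)


wavelengths = np.arange(350, 2501)  # 1 nm sampling (assumed)
reflectance = np.full(wavelengths.shape, 0.10)
reflectance[wavelengths == 570] = 0.12  # toy spectrum: brighter at 570 nm
print(round(pri(wavelengths, reflectance), 4))  # -0.0909
```

Because PRI is a normalised difference of two nearby bands, it responds to the relative shape of the spectrum in this region rather than to overall brightness, much like the band ratios mentioned above.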

The short-wave infrared (SWIR) region contributed 18 wavebands selected across the 1820–1976 nm wavelength range. While Hennessy et al. [69] reported that SWIR bands (1800–2500 nm) showed the lowest average selection rates across crop mapping studies, previous research has highlighted their usefulness for cultivar discrimination [4]. In this study, the selected SWIR bands were concentrated on the edges of the strong absorption feature identified around 1825–1925 nm. The shoulders of these absorption features often carry useful information for crop analysis related to water content, leaf internal structure, and biochemical composition [76,77]. Notably, the 1927 and 1931 nm bands, previously identified by Mirzaei et al. [4] as important for grapevine cultivar discrimination, were also highlighted in the current analysis.

The results of the waveband aggregation are recorded in Table 2. Hierarchical clustering of the Pearson correlation matrix identified nine distinct clusters of spectrally collinear wavebands. These clusters formed the basis for the waveband aggregation, with adjacent wavebands in the same cluster binned together. The aggregation process generated ten broadband spectral features, implying that the wavebands of one cluster were split into two non-adjacent bins. The newly created spectral features represent a biologically grounded reduction in spectral complexity. For instance, the aggregation procedure identified four distinct bins predominantly within the red-edge portion of the EM spectrum (i.e., 670–690 nm, 703–716 nm, 717–730 nm, and 731–742 nm). The red-edge is well established as an important discriminator in crop mapping studies due to its sensitivity to chlorophyll concentration, canopy structure, and plant physiological status [4,69,70,71]. The emergence of multiple red-edge bins in the present study reinforces the importance of this spectral region in discriminating spectrally similar cultivars. Additionally, the results of the waveband aggregation suggest that the red-edge contributes to cultivar discrimination not as a single broad region but through several sub-intervals that carry complementary information. Importantly, the proposed methodology was able to detect subtle, complementary, and non-redundant spectral features within the red-edge, highlighting its strength in capturing fine-scale differences relevant for cultivar discrimination.
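The binning step can be sketched in Python as follows. The sketch assumes that the hierarchical clustering of the correlation matrix has already assigned each selected waveband a cluster label. The function names, the `max_gap` adjacency rule, and averaging as the aggregation rule are assumptions rather than the study's documented implementation. The toy case shows how the number of broadband features can exceed the number of clusters.

```python
import numpy as np


def adjacency_bins(wavelengths, clusters, max_gap=1):
    """Split selected wavebands into bins of adjacent bands from the same cluster.

    A new bin starts whenever the cluster label changes or consecutive
    wavelengths are more than max_gap nm apart (max_gap is a hypothetical parameter).
    Returns column indices (into the input order) for each bin.
    """
    order = np.argsort(wavelengths)
    wl = np.asarray(wavelengths)[order]
    cl = np.asarray(clusters)[order]
    bins, current = [], [int(order[0])]
    for i in range(1, len(wl)):
        if cl[i] != cl[i - 1] or wl[i] - wl[i - 1] > max_gap:
            bins.append(current)
            current = []
        current.append(int(order[i]))
    bins.append(current)
    return bins


def aggregate(X, bins):
    """One broadband feature per bin: the mean of its narrow bands (an assumed rule)."""
    return np.column_stack([X[:, idx].mean(axis=1) for idx in bins])


# Toy case: two correlation clusters, but cluster 2 has a spectral gap, giving three bins
wavelengths = [670, 671, 672, 703, 704, 717, 718]
clusters = [1, 1, 1, 2, 2, 2, 2]
bins = adjacency_bins(wavelengths, clusters)
print(bins)  # [[0, 1, 2], [3, 4], [5, 6]]
X = np.random.default_rng(0).uniform(0.05, 0.6, size=(5, 7))
features = aggregate(X, bins)  # shape (5, 3)
```

Splitting on both cluster membership and spectral adjacency keeps each broadband feature physically contiguous. This is what makes the resulting bins interpretable as candidate broadband sensor channels.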

As stated earlier, wavebands on the edges of water absorption features often provide important spectral information [76,77], which could explain the aggregated wavebands at 1820–1824 nm, 1926–1927 nm, and 1931–1932 nm. These bins contained relatively few narrow wavebands, reflecting that adjacent SWIR bands on the shoulders of absorption features are less collinear and thus exhibit higher spectral variability and carry more discriminative information. In contrast, the 1962–1976 nm aggregated waveband was constructed from 15 narrow wavebands, reflecting the broader, less spectrally variable nature of the SWIR plateau (typically spanning 1950–2500 nm).

Two green broadband spectral features were also established on either side of the ‘green hump’ (550 nm). The 508–511 nm wavebands have been reported to be particularly sensitive to carotenoid content [78], while the 550–566 nm wavebands are strongly influenced by chlorophyll absorption [69,71]. Together, the 508–511 nm and 550–566 nm aggregated wavebands may therefore capture the ratio of leaf carotenoid to chlorophyll content, which is highly relevant for cultivar discrimination [4,69,71].

3.2. Assessment of Model Performance

Table 3 summarises the performance of the subsets relative to the full dataset, evaluated using five classification algorithms and four accuracy metrics. When models were built on the full dataset (p = 1874), all classifiers except oRF achieved perfect performance (i.e., a score of 1) on the training data across the F1, BACC, MCC, and AUC metrics. Evaluating classifier performance on the test data, however, revealed evidence of overfitting. The test accuracies for SVM (F1: 0.93; BACC: 0.98; MCC: 0.80; AUC: 0.90) and Multinom (F1: 0.93; BACC: 0.91; MCC: 0.71; AUC: 0.89) indicate that these two models maintained strong generalisation, whereas all other models showed a greater decline in predictive accuracy. The 1D CNN and MLP achieved high AUC values (0.99 and 0.92, respectively) on the test data, indicating strong class separability. However, their corresponding F1 and MCC values were substantially lower (CNN: F1 = 0.85, MCC = 0.83; MLP: F1 = 0.72, MCC = 0.67). This discrepancy highlights a divergence between their ability to rank spectral samples by class (measured by AUC, which is threshold-independent) and their ability to assign correct class labels (measured by F1). Such differences can arise from class imbalance effects, where minority classes contribute less to overall performance, and from suboptimal decision thresholds, both of which result in misclassifications. Overall, these results suggest that while the full spectrum contains sufficient discriminatory information, its high dimensionality introduces redundancy and noise that may limit robust generalisation.
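The four metrics can be computed from a confusion matrix, as in the NumPy sketch below. Two choices are assumptions about how the study averaged across cultivars: macro-averaged F1, and the multiclass generalisation of MCC (Gorodkin's statistic, the form scikit-learn also uses). The function names are hypothetical. The second toy case illustrates the divergence discussed above, in which a threshold-independent AUC is perfect while threshold-dependent labels are wrong.

```python
import numpy as np


def confusion(y_true, y_pred, n_classes):
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1  # rows: true class, columns: predicted class
    return C


def balanced_accuracy(C):
    return float(np.mean(np.diag(C) / C.sum(axis=1)))  # mean per-class recall


def macro_f1(C):
    tp = np.diag(C).astype(float)
    col = C.sum(axis=0)
    precision = np.divide(tp, col, out=np.zeros_like(tp), where=col > 0)
    recall = tp / C.sum(axis=1)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(f1.mean())


def mcc(C):
    """Multiclass Matthews correlation coefficient computed from the confusion matrix."""
    s, c = C.sum(), np.trace(C)
    t, p = C.sum(axis=1), C.sum(axis=0)
    num = c * s - np.dot(t, p)
    den = np.sqrt((s**2 - np.dot(p, p)) * (s**2 - np.dot(t, t)))
    return float(num / den) if den > 0 else 0.0


def binary_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Three-class toy example: one class-1 sample is predicted as class 2
C = confusion([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], n_classes=3)
print(round(balanced_accuracy(C), 4), round(macro_f1(C), 4), round(mcc(C), 4))
# 0.8333 0.8222 0.7833

# Perfectly ranked scores that all fall below a 0.5 threshold: AUC = 1, yet labels are wrong
scores = np.array([0.1, 0.2, 0.3, 0.4])
labels = np.array([0, 0, 1, 1])
preds = (scores >= 0.5).astype(int)
print(binary_auc(scores, labels), round(macro_f1(confusion(labels, preds, 2)), 4))  # 1.0 0.3333
```

MCC uses every cell of the confusion matrix. It therefore tends to sit below F1 and balanced accuracy when errors concentrate in a few classes, which is consistent with the lower MCC values reported above.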

Analysis of the two subsets indicates that reducing dimensionality improved model performance in terms of median accuracy across the four metrics (Figure 3). For the initial PLS–ensemble selected subset, classifier performance on the test dataset improved, with SVM achieving the top performance (Table 3) and near-perfect discrimination (F1: 1.0; BACC: 0.99; AUC: 0.96). Additionally, the 100-waveband subset improved model stability and generalisation, as evidenced by the reduced discrepancy between training and test performance (Table 3). These results indicate that the PLS–ensemble method successfully identified highly informative spectral regions for cultivar discrimination.

The aggregation of the PLS–ensemble subset further led to increased classifier performance (Table 3). Aggregating or binning narrow spectral features has been reported to improve the signal-to-noise ratio (SNR) by reducing noise and improving signal representation [79], which could partly explain the observed enhancement in classifier performance. Compared with the full dataset, the aggregated wavebands increased the performance of the oRF, Multinom, SVM, and MLP models between 4.0 and 25.0%, with an average increase of 10.0% observed across the four performance metrics. However, it should be noted that the subsets did not improve the median performance of the 1D CNN. For the 1D CNN, the PLS–ensemble subset reduced median performance by approximately 5.7%, while the aggregated wavebands decreased it by only 1.4%, producing comparable accuracies using just 0.5% of the original input variables. The reduced CNN performance should be interpreted in the context of data volume and architectural suitability: the ResNet-34 architecture employed is highly parameterised relative to the available training data, which limits the effectiveness of regularisation and increases the risk of overfitting. As such, CNN performance in this study is confounded by data scarcity and model mismatch and should not be taken as indicative of the general effectiveness of deep learning for hyperspectral cultivar discrimination. Similar observations have been reported in previous studies [80], indicating that deep architectures, such as ResNet-34—optimised for extracting complex patterns from high-dimensional data—become less optimal when the feature space is drastically reduced.
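The dimensionality figures quoted in this section and in Section 3.1 follow directly from the band counts (p = 1874, 100, and 10):

$$
\begin{aligned}
\frac{100}{1874} &\approx 0.0534 \;\Rightarrow\; 1 - 0.0534 \approx 0.947 \quad (\text{a reduction of} \approx 95\%),\\
\frac{10}{1874} &\approx 0.0053 \;\Rightarrow\; 1 - 0.0053 \approx 0.995 \quad (\text{a reduction of} \approx 99.5\%).
\end{aligned}
$$

The aggregated subset therefore retains about 0.53% of the original input variables.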

Notwithstanding this reduction in CNN performance, the inclusion of the 1D CNN experiments remains informative in the context of this study. The application of the proposed FS and waveband aggregation framework yielded more stable and consistent performance relative to the full spectral input. This supports the interpretation that the identified spectral subsets capture transferable and discriminative information across modelling paradigms, although CNN performance itself remains constrained by data volume and architectural suitability.

The best-performing model in this study was SVM with feature selection (F1: 1.00; BACC: 0.99; MCC: 0.82; AUC: 0.96). These results compare favourably with the previous literature on cultivar mapping, surpassing the models of Gutiérrez et al. [14], who reported a maximum F1 of 0.99, and Karakizi et al. [3], who achieved a maximum accuracy of 0.85. The SVM model also produced accuracies consistent with those reported by López et al. [13] and Mirzaei et al. [4], both of whom observed values ranging from 0.98 to 1.00.

The impact of FS on inter-class cultivar separability is depicted in Figure 4. The full-spectrum dataset exhibited a narrow, leptokurtic distribution centred near 0.05 radians, implying high spectral similarity among cultivars. In contrast, the PLS–ensemble subset showed a clear shift towards larger SAM distances, reflecting increased angular separation between cultivar spectra. Aggregating the PLS–ensemble subset shifted the density curve further to the right, confirming that the 10-feature aggregated subset had the greatest pairwise SAM distances and the greatest inter-class separability.
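A probability density plot of pairwise SAM distances is typically drawn from a kernel density estimate. The NumPy sketch below uses a Gaussian kernel with a Scott-style bandwidth, h = s·n^(−1/5). Both the kernel and the bandwidth are assumptions, as the plotting method used in the study is not stated here. The sketch shows how a uniform increase in angular separation appears as a horizontal shift of the curve, and how excess kurtosis quantifies a leptokurtic shape. The distances are synthetic, not the study's values.

```python
import numpy as np


def gaussian_kde(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of samples, evaluated on grid.

    Defaults to a Scott-style bandwidth h = s * n**(-1/5), an assumed choice.
    """
    x = np.asarray(samples, dtype=float)
    n = x.size
    h = bandwidth if bandwidth is not None else x.std(ddof=1) * n ** (-1 / 5)
    u = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


def excess_kurtosis(samples):
    """Sample excess kurtosis; values above 0 indicate a leptokurtic distribution."""
    x = np.asarray(samples, dtype=float)
    d = x - x.mean()
    return float(np.mean(d**4) / np.mean(d**2) ** 2 - 3.0)


# Synthetic SAM distances for 15 cultivar pairs (not the study's values)
rng = np.random.default_rng(3)
full = rng.normal(0.05, 0.005, 15)
subset = full + 0.10  # the same distribution shifted to larger angles
grid = np.linspace(0.0, 0.5, 1001)
dens_full, dens_subset = gaussian_kde(full, grid), gaussian_kde(subset, grid)
shift = grid[np.argmax(dens_subset)] - grid[np.argmax(dens_full)]
print(shift)  # close to 0.10: the whole curve moves right
```

With only 15 pairwise distances per dataset, the estimated curve is sensitive to the bandwidth. The location of its peak and the direction of its shift are therefore more robust to compare than fine details of its shape.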

3.3. Limitations and Future Work

While this study aimed to identify an optimal spectral subset for discriminating grapevine cultivars that is robust across classifiers, several methodological trade-offs between interpretability, complexity, and robustness should be acknowledged. Firstly, the ensemble FS approach was based solely on PLS. Although PLS is commonly used for FS of spectrometry data and ensures the interpretability of the FS process, its linear nature may produce suboptimal subsets for algorithms that model non-linear relationships, such as Random Forests or CNNs. However, the current study did demonstrate that the PLS–ensemble subset could enhance the performance of non-linear models such as oRF and MLP. Secondly, although employing 18 individual FS methods added methodological complexity, it improved the robustness and consistency of waveband selection; similar results could, however, have been obtained using different or fewer FS methods. Thirdly, the dataset used in this study was collected from a single vineyard site, during a single phenological stage, in a single year, and comprised a total of 189 samples. Consequently, results and conclusions should be interpreted within this limited context. While the proposed PLS–ensemble and aggregation framework demonstrated consistent and improved performance across multiple classifiers on this dataset, further validation with larger datasets spanning multiple sites, years, and phenological stages is required to robustly assess the generalisability of the selected wavebands and to support broader recommendations for sensor design. Nevertheless, this study demonstrated the effectiveness of the proposed ensemble FS approach on the available data. Lastly, decisions regarding thresholds and weighting in the Borda–SRD aggregation, while data-driven, are parameters that guide feature selection and could influence subset composition; they therefore require dataset-specific optimisation.
Overall, this work provides a baseline framework for robust waveband selection across classifiers.

4. Conclusions

The pursuit of optimal waveband selection for agricultural mapping has long been an active research topic. While many methods have been proposed, their selected subsets are often classifier-specific and lack transferability. The current study offers methodological novelty by presenting a PLS–ensemble and waveband aggregation framework aimed at addressing this shortcoming in the context of grapevine cultivar discrimination. The PLS–ensemble subset (p = 100) represents a balance between retaining discriminative accuracy and reducing classifier complexity. The ensemble FS approach highlighted key spectral regions (e.g., red-edge, yellow–green, and SWIR), improved model performance, and reduced dimensionality by approximately 95%. The subsequent aggregation of the 100 selected wavebands into 10 broadband spectral features demonstrated that the proposed methodology could achieve extreme dimensionality reduction (99.5%) without substantial loss in model performance. Consequently, this research establishes an FS methodology that enables a biologically grounded simplification of spectral data and transfers well across most of the classifiers tested. The findings indicate that cultivar-specific spectral signals are highly concentrated within a limited number of wavelengths. Moreover, the consistent superiority of SVM across all datasets suggests that it was the most robust classifier for canopy-level cultivar discrimination in this study. This research supports the development of more practical, broadband multispectral sensors that target only the most informative spectral regions, which could lower operational costs and facilitate real-time cultivar mapping. Future research should explore the following:

Integrating non-linear feature selection methods;
Validating spectral subsets across temporal and spatial domains;
Testing streamlined ensembles under tuned classifier settings.

Such extensions would strengthen the utility of spectral subset optimisation for viticulture and related agricultural applications.

Author Contributions

Conceptualisation, K.L., A.S. and Z.M.; methodology, K.L.; formal analysis, K.L.; investigation, K.L.; data curation, K.L.; writing—original draft preparation, K.L.; writing—review and editing, A.S. and Z.M.; visualisation, K.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Research Foundation (NRF) of South Africa, SCS space, and Raisins SA, and the APC was funded by the NRF under grant number 138229.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Tufail, R.; Tassinari, P.; Torreggiani, D. Assessing feature extraction, selection, and classification combinations for crop mapping using Sentinel-2 time series: A case study in northern Italy. Remote Sens. Appl. Soc. Environ. 2025, 38, 101525.
Nabil, M.; Farg, E.; Afify, N.M.; Arafat, S.M. Optimizing crop monitoring: Mapping cultivation stages and types with sentinel-1/2 and random forest algorithm. Int. J. Remote Sens. 2025, 46, 273–299.
Karakizi, C.; Oikonomou, M.; Karantzalos, K. Vineyard Detection and Vine Variety Discrimination from Very High Resolution Satellite Data. Remote Sens. 2016, 8, 235.
Mirzaei, M.; Marofi, S.; Abbasi, M.; Solgi, E.; Karimi, R.; Verrelst, J. Scenario-based discrimination of common grapevine varieties using in-field hyperspectral data in the western of Iran. Int. J. Appl. Earth Obs. Geoinf. 2019, 80, 26–37.
Carneiro, G.A.; Cunha, A.; Aubry, T.J.; Sousa, J. Advancing Grapevine Variety Identification: A Systematic Review of Deep Learning and Machine Learning Approaches. AgriEngineering 2024, 6, 4851–4888.
Bramley, R.G.V.; Ouzman, J.; Sturman, A.P.; Grealish, G.J.; Ratcliff, C.E.M.; Trought, M.C.T. Underpinning Terroir with Data: Integrating Vineyard Performance Metrics with Soil and Climate Data to Better Understand Within-Region Variation in Marlborough, New Zealand. Aust. J. Grape Wine Res. 2023, 2023, 8811402.
Ferro, M.V.; Catania, P. Technologies and Innovative Methods for Precision Viticulture: A Comprehensive Review. Horticulturae 2023, 9, 399.
Li, W.; Feng, F.; Li, H.; Du, Q. Discriminant Analysis-Based Dimension Reduction for Hyperspectral Image Classification: A Survey of the Most Recent Advances and an Experimental Comparison of Different Techniques. IEEE Geosci. Remote Sens. Mag. 2018, 6, 15–34.
Canero, F.M.; Rodriguez-Galiano, V.; Aragones, D. Machine Learning and Feature Selection for soil spectroscopy. An evaluation of Random Forest wrappers to predict soil organic matter, clay, and carbonates. Heliyon 2024, 10, e30228.
Hughes, G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63.
Raja, S.P.; Sawicka, B.; Stamenkovic, Z.; Mariammal, G. Crop Prediction Based on Characteristics of the Agricultural Environment Using Various Feature Selection Techniques and Classifiers. IEEE Access 2022, 10, 23625–23641.
Imran, H.A.; Zeggada, A.; Ianniello, I.; Melgani, F.; Polverari, A.; Baroni, A.; Danzi, D.; Goller, R. Low-cost handheld spectrometry for detecting Flavescence dorée in vineyards. Appl. Sci. 2023, 13, 2388.
López, A.; Ogayar, C.J.; Feito, F.R.; Sousa, J.J. Classification of Grapevine Varieties Using UAV Hyperspectral Imaging. Remote Sens. 2024, 16, 2103.
Gutiérrez, S.; Fernández-Novales, J.; Diago, M.P.; Tardaguila, J. On-the-go hyperspectral imaging under field conditions and machine learning for the classification of grapevine varieties. Front. Plant Sci. 2018, 9, 1102.
Pôças, I.; Tosin, R.; Gonçalves, I.; Cunha, M. Toward a generalized predictive model of grapevine water status in Douro region from hyperspectral data. Agric. For. Meteorol. 2020, 280, 107793.
Rodriguez-Galiano, V.F.; Luque-Espinar, J.A.; Chica-Olmo, M.; Mendes, M.P. Feature selection approaches for predictive modelling of groundwater nitrate pollution: An evaluation of filters, embedded and wrapper methods. Sci. Total Environ. 2018, 624, 661–672.
Loggenberg, K.; Poona, N. A feature selection approach for terrestrial hyperspectral image analysis. S. Afr. J. Geomat. 2020, 9, 302–320.
Santos-Rufo, A.; Mesas-Carrascosa, F.-J.; García-Ferrer, A.; Meroño-Larriva, J.E. Wavelength Selection Method Based on Partial Least Square from Hyperspectral Unmanned Aerial Vehicle Orthomosaic of Irrigated Olive Orchards. Remote Sens. 2020, 12, 3426.
He, S.; Peng, P.; Chen, Y.; Wang, X. Multi-Crop Classification Using Feature Selection-Coupled Machine Learning Classifiers Based on Spectral, Textural and Environmental Features. Remote Sens. 2022, 14, 3153.
Zhang, X.; Xue, J.; Chen, S.; Wang, N.; Xie, T.; Xiao, Y.; Chen, X.; Shi, Z.; Huang, Y.; Zhuo, Z. Fine Resolution Mapping of Soil Organic Carbon in Croplands with Feature Selection and Machine Learning in Northeast Plain China. Remote Sens. 2023, 15, 5033.
Swe, K.N.; Takai, S.; Noguchi, N. Novel approaches for a brix prediction model in Rondo wine grapes using a hyperspectral Camera: Comparison between destructive and Non-destructive sensing methods. Comput. Electron. Agric. 2023, 211, 108037.
Rapaport, T.; Hochberg, U.; Shoshany, M.; Karnieli, A.; Rachmilevitch, S. Combining leaf physiology, hyperspectral imaging and partial least squares-regression (PLS-R) for grapevine water status assessment. ISPRS J. Photogramm. Remote Sens. 2015, 109, 88–97.
Fu, X.; Zhou, W.; Zhou, X.; Hu, Y. Crop Mapping and Spatio–Temporal Analysis in Valley Areas Using Object-Oriented Machine Learning Methods Combined with Feature Optimization. Agronomy 2023, 13, 2467.
Chancia, R.; Bates, T.; Vanden Heuvel, J.; van Aardt, J. Assessing grapevine nutrient status from unmanned aerial system (UAS) hyperspectral imagery. Remote Sens. 2021, 13, 4489.
Gao, H.; Xu, L.; Li, C.; Shi, A.; Huang, F.; Ma, Z. A New Feature Selection Method for Hyperspectral Image Classification Based on Simulated Annealing Genetic Algorithm and Choquet Fuzzy Integral. Math. Probl. Eng. 2013, 2013, 537268.
Shastry, K.; Sanjay, H. A modified genetic algorithm and weighted principal component analysis based feature selection and extraction strategy in agriculture. Knowl.-Based Syst. 2021, 232, 107460.
Sawant, S.S.; Manoharan, P.; Loganathan, A. Band selection strategies for hyperspectral image classification based on machine learning and artificial intelligent techniques –Survey. Arab. J. Geosci. 2021, 14, 646.
Pero, C.; Bakshi, S.; Nappi, M.; Tortora, G. IoT-Driven Machine Learning for Precision Viticulture Optimization. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 2437–2447.
Loggenberg, K.; Strever, A.; Münch, Z. Scoping the Field: Recent Advances in Optical Remote Sensing for Precision Viticulture. ISPRS Int. J. Geo-Inf. 2024, 13, 385.
Drotár, P.; Gazda, M.; Vokorokos, L. Ensemble feature selection using election methods and ranker clustering. Inf. Sci. 2019, 480, 365–380.
L’Heureux, A.; Grolinger, K.; Elyamany, H.F.; Capretz, M.A.M. Machine Learning with Big Data: Challenges and Approaches. IEEE Access 2017, 5, 7776–7797.
Gabrielli, M.; Ounaissi, D.; Lançon-Verdier, V.; Julien, S.; Le Meurlay, D.; Maury, C. Hyperspectral imaging to assess wine grape quality. JSFA Rep. 2023, 3, 452–462.
Diago, M.P.; Fernandes, A.M.; Millan, B.; Tardaguila, J.; Melo-Pinto, P. Identification of grapevine varieties using leaf spectroscopy and partial least squares. Comput. Electron. Agric. 2013, 99, 7–13.
Rafique, R.; Ahmad, T.; Ahmed, M.; Azam Khan, M. Exploring key physiological attributes of grapevine cultivars under the influence of seasonal environmental variability. OENO One 2023, 57, 381–397.
Borgogno-Mondino, E.; De Palma, L.; Novello, V. Investigating Sentinel 2 Multispectral Imagery Efficiency in Describing Spectral Response of Vineyards Covered with Plastic Sheets. Agronomy 2020, 10, 1909.
Carey, V.A.; Saayman, D.; Archer, E.; Barbeau, G.; Wallace, M. Viticultural terroirs in Stellenbosch, South Africa. I. The identification of natural terroir units. OENO One 2008, 42, 169–183.
Council for Scientific and Industrial Research (CSIR). Cape Winelands District Municipality Climate Change Adaptation Plan: Draft 1; CSIR GreenBook: Pretoria, South Africa, 2023; pp. 11–12.
Lin, W.; Hang, H.; Zhuang, Y.; Zhang, S. Variable selection in partial least squares with the weighted variable contribution to the first singular value of the covariance matrix. Chemom. Intell. Lab. Syst. 2018, 183, 113–121.
Wang, L.-L.; Lin, Y.-W.; Wang, X.-F.; Xiao, N.; Xu, Y.-D.; Li, H.-D.; Xu, Q.-S. A selective review and comparison for interval variable selection in spectroscopic modeling. Chemom. Intell. Lab. Syst. 2018, 172, 229–240.
Sinha, R.; Khot, L.R.; Rathnayake, A.P.; Gao, Z.; Naidu, R.A. Visible-near infrared spectroradiometry-based detection of grapevine leafroll-associated virus 3 in a red-fruited wine grape cultivar. Comput. Electron. Agric. 2019, 162, 165–173.
Wold, S.; Martens, H.; Wold, H. The multivariate calibration problem in chemistry solved by the PLS method. In Matrix Pencils: Proceedings of a Conference Held at Pite Havsbad, Sweden, 22–24 March 1982; Springer: Berlin/Heidelberg, Germany, 1983; pp. 286–293.
Martens, H. Multivariate Calibration. Doctoral Thesis, Technical University of Norway, Trondheim, Norway, 1985.
Helland, I.S. On the structure of partial least squares regression. Commun. Stat.-Simul. Comput. 1988, 17, 581–607.
Abrantes, G.; Almeida, V.; Maia, A.J.; Nascimento, R.; Nascimento, C.; Silva, Y.; Silva, Y.; Veras, G. Comparison between Variable-Selection Algorithms in PLS Regression with Near-Infrared Spectroscopy to Predict Selected Metals in Soil. Molecules 2023, 28, 6959.
Mehmood, T.; Liland, K.H.; Snipen, L.; Sæbø, S. A review of variable selection methods in Partial Least Squares Regression. Chemom. Intell. Lab. Syst. 2012, 118, 62–69.
Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
Mehmood, T.; Sæbø, S.; Liland, K.H. Comparison of variable selection methods in partial least squares regression. J. Chemom. 2020, 34, e3226.
Andersen, C.M.; Bro, R. Variable selection in regression—A tutorial. J. Chemom. 2010, 24, 728–737.
Kvalheim, O.M. Variable importance: Comparison of selectivity ratio and significance multivariate correlation for interpretation of latent-variable regression models. J. Chemom. 2020, 34, e3211.
Yang, W.; Xiong, Y.; Wang, H.; Wu, T.; Du, Y. Interval interaction moving window partial least squares for wavelength interval selection in near infrared spectroscopy. Chemom. Intell. Lab. Syst. 2023, 241, 104976.
Nørgaard, L.; Saudland, A.; Wagner, J.; Nielsen, J.P.; Munck, L.; Engelsen, S.B. Interval Partial Least-Squares Regression (iPLS): A Comparative Chemometric Study with an Example from Near-Infrared Spectroscopy. Appl. Spectrosc. 2000, 54, 413–419.
Kucheryavskiy, S. mdatools—R package for chemometrics. Chemom. Intell. Lab. Syst. 2020, 198, 103937.
Mishra, S.; Mishra, D.; Mallick, P.K.; Santra, G.H.; Kumar, S. A Novel Borda Count based Feature Ranking and Feature Fusion Strategy to Attain Effective Climatic Features for Rice Yield Prediction. Informatica 2021, 45.
Miri, M.; Dowlatshahi, M.B.; Hashemi, A. Evaluation multi label feature selection for text classification using weighted borda count approach. In 2022 9th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS); IEEE: New York, NY, USA, 2022.
Héberger, K. Sum of ranking differences compares methods or models fairly. TrAC Trends Anal. Chem. 2010, 29, 101–109. [Google Scholar] [CrossRef]
Héberger, K.; Kollár-Hunek, K. Sum of ranking differences for method discrimination and its validation: Comparison of ranks with random numbers. J. Chemom. 2011, 25, 151–158. [Google Scholar] [CrossRef]
Varoquaux, G.; Colliot, O. Evaluating Machine Learning Models and Their Diagnostic Value. In Machine Learning for Brain Disorders; Humana: New York, NY, USA, 2023; pp. 601–630. [Google Scholar]
Opitz, J. A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice. Trans. Assoc. Comput. Linguist. 2024, 12, 820–836. [Google Scholar] [CrossRef]
Menze, B.H.; Kelm, B.M.; Splitthoff, D.N.; Koethe, U.; Hamprecht, F.A. On Oblique Random Forests. In Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2011; pp. 453–469. [Google Scholar]
Poona, N.; Van Niekerk, A.; Ismail, R. Investigating the Utility of Oblique Tree-Based Ensembles for the Classification of Hyperspectral Data. Sensors 2016, 16, 1918. [Google Scholar] [CrossRef] [PubMed]
Jaeger, B.C.; Welden, S.; Lenoir, K.; Pajewski, N.M. aorsf: An R package for supervised learning using the oblique random survival forest. J. Open Source Softw. 2022, 7, 4705. [Google Scholar] [CrossRef]
Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Softw. 2008, 28, 1–26. [Google Scholar] [CrossRef]
Kleinbaum, D.G.; Klein, M. Introduction to Logistic Regression. In Logistic Regression: A Self-Learning Text; Kleinbaum, D.G., Klein, M., Eds.; Springer: New York, NY, USA, 2010; pp. 1–39. [Google Scholar]
Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In COLT ’92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory; Association for Computing Machinery: New York, NY, USA, 1992; pp. 144–152. [Google Scholar]
Meyer, D.; Dimitriadou, E.; Hornik, K.; Weingessel, A.; Leisch, F. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071); TU Wien: Vienna, Austria, 2023. [Google Scholar]
Skidmore, A.; Turner, B.; Brinkhof, W.; Knowles, E. PERFORMANCE OF A NEURAL NETWORK: MAPPING FORESTS USING GIS AND REMOTELY SENSED DATA. Photogramm. Eng. Remote Sens. 1997, 63, 501–514. [Google Scholar]
Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A system for large-scale machine learning. In OSDI’16: Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation; USENIX Association: Berkeley, CA, USA, 2016; pp. 265–283. [Google Scholar]
Cacciari, I.; Ranfagni, A. Hands-On Fundamentals of 1D Convolutional Neural Networks—A Tutorial for Beginner Users. Appl. Sci. 2024, 14, 8500. [Google Scholar] [CrossRef]
Hennessy, A.; Clarke, K.; Lewis, M. Hyperspectral Classification of Plants: A Review of Waveband Selection Generalisability. Remote Sens. 2020, 12, 113. [Google Scholar] [CrossRef]
Pôças, I.; Rodrigues, A.; Gonçalves, S.; Costa, P.; Gonçalves, I.; Pereira, L.; Cunha, M. Predicting grapevine water status based on hyperspectral reflectance vegetation indices. Remote Sens. 2015, 7, 16460–16479. [Google Scholar] [CrossRef]
Sims, D.A.; Gamon, J.A. Relationships between leaf pigment content and spectral reflectance across a wide range of species, leaf structures and developmental stages. Remote Sens. Environ. 2002, 81, 337–354. [Google Scholar] [CrossRef]
Imran, H.A.; Gianelle, D.; Rocchini, D.; Dalponte, M.; Martín, M.P.; Sakowska, K.; Wohlfahrt, G.; Vescovo, L. VIS-NIR, Red-Edge and NIR-Shoulder Based Normalized Vegetation Indices Response to Co-Varying Leaf and Canopy Structural Traits in Heterogeneous Grasslands. Remote Sens. 2020, 12, 2254. [Google Scholar] [CrossRef]
Li, C.; Czyż, E.A.; Halitschke, R.; Baldwin, I.T.; Schaepman, M.E.; Schuman, M.C. Evaluating potential of leaf reflectance spectra to monitor plant genetic variation. Plant Methods 2023, 19, 108. [Google Scholar] [CrossRef]
Khadka, K.; Burt, A.J.; Earl, H.J.; Raizada, M.N.; Navabi, A. Does Leaf Waxiness Confound the Use of NDVI in the Assessment of Chlorophyll When Evaluating Genetic Diversity Panels of Wheat? Agronomy 2021, 11, 486. [Google Scholar] [CrossRef]
Gitelson, A.A.; Merzlyak, M.N.; Chivkunova, O.B. Optical Properties and Nondestructive Estimation of Anthocyanin Content in Plant Leaves. Photochem. Photobiol. 2007, 74, 38–45. [Google Scholar] [CrossRef]
Yu, Z.; Zhang, X.; Liu, H.; Zhang, Z.; Meng, L.; Han, Y.; Lu, L. Improving SPAD spectral estimation accuracy of rice leaves by considering the effect of leaf water content. Crop Sci. 2022, 62, 2382–2395. [Google Scholar] [CrossRef]
Li, Y.; Yang, K.; Wu, B. Feature Selection and Spectral Indices for Identifying Maize Stress Types. Appl. Spectrosc. 2025, 79, 306–319. [Google Scholar] [CrossRef] [PubMed]
Gitelson, A.A.; Zur, Y.; Chivkunova, O.B.; Merzlyak, M.N. Assessing carotenoid content in plant leaves with reflectance spectroscopy. Photochem. Photobiol. 2002, 75, 272–281. [Google Scholar] [CrossRef] [PubMed]
Jernelv, I.L.; Hjelme, D.R.; Matsuura, Y.; Aksnes, A. Convolutional neural networks for classification and regression analysis of one-dimensional spectral data. arXiv 2020, arXiv:2005.07530. [Google Scholar] [CrossRef]
Rossberg, N.; Gautam, R.; Komolibus, K.; O’Sullivan, B.; Visentin, A. Explainable AI-Based Feature Selection Approaches for Raman Spectroscopy. Diagnostics 2025, 15, 2063. [Google Scholar] [CrossRef]

Figure 1. Schematic overview of the experimental workflow for cross-learner spectral subset optimisation.

Figure 2. Selection frequency of spectral wavebands across the 18 PLS-based feature selection methods. Regions highlighted in red mark the wavebands retained in the final 100-waveband subset selected by the PLS–ensemble. Gaps in the distribution correspond to atmospheric absorption regions that were excluded from the analysis.
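The article's title names Weighted Borda Count as the rule that fuses the 18 per-method results into one subset. The sketch below shows one common form of that rule, under stated assumptions: each method contributes a ranking of waveband indices (most relevant first), each method carries a non-negative weight (for example a validation score), and a waveband absent from a method's ranking earns no points from it. The function name `weighted_borda` and the weighting scheme are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def weighted_borda(rankings, weights, top_k):
    """Fuse several feature rankings with a weighted Borda count (illustrative sketch).

    rankings: list of sequences; each lists feature indices from most to least relevant
              for one (hypothetical) feature-selection method.
    weights:  one non-negative weight per ranking (assumed, e.g. a validation score).
    top_k:    number of features to retain.
    """
    n_features = max(max(r) for r in rankings) + 1
    scores = np.zeros(n_features)
    for ranking, w in zip(rankings, weights):
        m = len(ranking)
        # Classic Borda points: position i in an m-long ranking earns m - i points.
        for position, feature in enumerate(ranking):
            scores[feature] += w * (m - position)
    # Highest score first; a stable sort keeps the lower index first on ties.
    order = np.argsort(-scores, kind="stable")
    return order[:top_k].tolist(), scores

# Toy example: three methods ranking four wavebands; the third method is down-weighted.
rankings = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 1, 0, 3]]
selected, scores = weighted_borda(rankings, weights=[1.0, 1.0, 0.5], top_k=2)
```

Here waveband 1 scores 8.5 and waveband 0 scores 8.0, so `selected` is `[1, 0]`; in the study's setting `top_k` would be 100 and the rankings would come from the 18 PLS-based methods.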

Figure 3. Radar plots showing the median test dataset performance of five classifiers trained on (a) the full dataset (p = 1874), (b) the PLS–ensemble subset (p = 100), and (c) the aggregated subset (p = 10). In each panel, the filled polygon represents the median performance of the classifiers for that dataset, scaled between 0 and 1. The coloured radial outlines are overlaid as a common reference for direct comparison of classifier performance across the three datasets: yellow = full, blue = PLS–ensemble, and red = aggregated.

Figure 4. Probability density distributions of pairwise Spectral Angle Mapper (SAM) distances for the full dataset (p = 1874), the PLS–ensemble-selected subset (p = 100), and the aggregated waveband subset (p = 10). Pairwise SAM distances were computed between mean cultivar spectra. Curve height represents the relative density of pairwise distances; a rightward shift in a distribution indicates greater inter-class spectral separability.
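The Figure 4 caption describes SAM distances computed between mean cultivar spectra. A minimal sketch of that computation follows, assuming a reflectance matrix `X` (samples × wavebands) and a label vector `y`; the helper names are hypothetical. SAM is the angle between two spectra treated as vectors, so it responds to differences in spectral shape rather than overall brightness. With six cultivars, each feature set yields 6 × 5 / 2 = 15 pairwise angles, which is the sample a density curve like those in Figure 4 would summarise.

```python
import numpy as np
from itertools import combinations

def class_mean_spectra(X, y):
    """Return the class labels and one mean spectrum per class (one row each)."""
    classes = np.unique(y)
    return classes, np.vstack([X[y == c].mean(axis=0) for c in classes])

def spectral_angle(a, b):
    """Spectral angle in radians between two spectra; 0 means identical shape."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clipping guards against rounding pushing |cos| slightly above 1.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def pairwise_sam(mean_spectra):
    """SAM distance for every unordered pair of class-mean spectra."""
    n = len(mean_spectra)
    return {(i, j): spectral_angle(mean_spectra[i], mean_spectra[j])
            for i, j in combinations(range(n), 2)}

# Toy example: three samples, two wavebands, two classes.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array(["A", "A", "B"])
classes, means = class_mean_spectra(X, y)
distances = pairwise_sam(means)
```

The two toy class means are orthogonal, so their angle is π/2; a spectrum and a scaled copy of it have an angle of (numerically) zero.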

Table 1. PLS–ensemble selected spectral wavebands across the electromagnetic (EM) regions used for cultivar discrimination. Wavebands are grouped by EM region.

EM Region	Selected Wavebands (nm)	Number of Wavebands
Red visible to near-infrared (NIR)	670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742	61
Green to yellow–green visible	508, 509, 510, 511, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566	21
Short-wave infrared (SWIR)	1820, 1821, 1822, 1823, 1824, 1926, 1927, 1931, 1932, 1962, 1963, 1969, 1970, 1971, 1972, 1973, 1975, 1976	18
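The wavebands in Table 1 form a few contiguous runs separated by wide spectral gaps, which is the "wavelength position" criterion used in the binning step. The sketch below recovers such runs by starting a new group whenever consecutive selected wavebands are more than `max_gap` nm apart. The function name and the 10 nm threshold are hypothetical choices. With that threshold the seven runs match the spans carried by the adjacency labels A1 to A7 in Table 2, but the authors' actual adjacency rule is not stated in this section.

```python
def adjacency_groups(wavebands, max_gap=10):
    """Split wavebands (nm) into runs whose consecutive spacing is <= max_gap nm."""
    bands = sorted(wavebands)
    if not bands:
        return []
    groups = [[bands[0]]]
    for prev, cur in zip(bands, bands[1:]):
        if cur - prev <= max_gap:
            groups[-1].append(cur)   # still within the current run
        else:
            groups.append([cur])     # gap too wide: start a new run
    return groups

# The 100 wavebands listed in Table 1.
selected = (
    list(range(508, 512)) + list(range(550, 567))
    + list(range(670, 691)) + list(range(703, 743))
    + list(range(1820, 1825)) + [1926, 1927, 1931, 1932]
    + [1962, 1963, 1969, 1970, 1971, 1972, 1973, 1975, 1976]
)
groups = adjacency_groups(selected, max_gap=10)
spans = [(g[0], g[-1]) for g in groups]
```

Any threshold from 6 nm (the 1963 to 1969 step) up to 12 nm (just below the 690 to 703 gap) gives the same seven spans for this subset.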

Table 2. Aggregated spectral wavebands denoted by their collinear cluster (C) and adjacency (A) labels. Each bin is listed with its waveband range and the individual wavelengths it contains.

Cluster	Adjacency	Bin Label	EM Region	Waveband Range (nm)	Individual Wavebands (nm)	Number of Wavebands
C1	A4	C1_A4	Red-edge	717–730	717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730	14
C2	A4	C2_A4	Red-edge	731–742	731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742	12
C3	A5	C3_A5	SWIR	1820–1824	1820, 1821, 1822, 1823, 1824	5
C4	A2	C4_A2	Green	550–566	550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566	17
C4	A4	C4_A4	Red-edge	703–716	703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716	14
C5	A3	C5_A3	Red-edge	670–690	670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690	21
C6	A6	C6_A6	SWIR	1926–1927	1926, 1927	2
C7	A6	C7_A6	SWIR	1931–1932	1931, 1932	2
C8	A7	C8_A7	SWIR	1962–1976	1962, 1963, 1969, 1970, 1971, 1972, 1973, 1975, 1976	9
C9	A1	C9_A1	Green	508–511	508, 509, 510, 511	4

Table 3. Classification performance of the five models trained and tested on three input datasets: the full dataset (p = 1874), the PLS–ensemble selected subset (p = 100), and the aggregated waveband subset (p = 10). Performance was evaluated using F1, BACC, MCC, and AUC, with the top-performing model for each metric and dataset highlighted in bold.

Source	Metrics	Feature Set	oRF	Multinom	SVM	MLP	CNN
Train	F1	p = 1874	0.86	1	1	1	1
		p = 100	0.95	1	1	1	1
		p = 10	0.93	1	1	1	1
	BACC	p = 1874	0.85	1	1	1	1
		p = 100	0.91	0.99	0.99	1	1
		p = 10	0.9	0.99	0.98	1	1
	MCC	p = 1874	0.48	1	1	1	1
		p = 100	0.5	0.94	0.85	1	1
		p = 10	0.58	0.84	0.79	1	1
	AUC	p = 1874	0.71	1	1	1	1
		p = 100	0.77	0.97	0.91	1	1
		p = 10	0.79	0.93	0.9	1	1
Test	F1	p = 1874	0.75	0.93	0.93	0.72	0.85
		p = 100	0.88	0.93	1	0.8	0.78
		p = 10	1	1	1	0.87	0.83
	BACC	p = 1874	0.79	0.91	0.98	0.85	0.92
		p = 100	0.86	0.91	0.99	0.89	0.88
		p = 10	0.97	0.98	0.98	0.93	0.91
	MCC	p = 1874	0.53	0.71	0.8	0.67	0.83
		p = 100	0.6	0.71	0.82	0.76	0.74
		p = 10	0.64	0.8	0.78	0.85	0.8
	AUC	p = 1874	0.74	0.89	0.9	0.92	0.99
		p = 100	0.87	0.9	0.96	0.97	0.95
		p = 10	0.9	0.93	0.95	0.98	0.96
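Table 3 reports F1, BACC, MCC, and AUC. For reference, the sketch below computes three of these from a multiclass confusion matrix using widely used definitions: macro-averaged F1, balanced accuracy as the mean per-class recall, and Gorodkin's multiclass generalisation of MCC. AUC is omitted because it needs class scores rather than hard labels. The averaging conventions behind Table 3 are not given in this section, so values from this sketch need not reproduce the table; the function names are hypothetical, and every class is assumed to occur in `y_true`.

```python
import numpy as np

def confusion(y_true, y_pred, classes):
    """Confusion matrix with rows = true class and columns = predicted class."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)))
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    return cm

def balanced_accuracy(cm):
    """Mean per-class recall (assumes every class has at least one true sample)."""
    return float(np.mean(np.diag(cm) / cm.sum(axis=1)))

def macro_f1(cm):
    """Unweighted mean of per-class F1; undefined ratios are set to 0."""
    tp = np.diag(cm)
    pred, true = cm.sum(axis=0), cm.sum(axis=1)
    precision = np.divide(tp, pred, out=np.zeros_like(tp), where=pred > 0)
    recall = np.divide(tp, true, out=np.zeros_like(tp), where=true > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(f1.mean())

def multiclass_mcc(cm):
    """Gorodkin's R_K: MCC generalised to K classes from the confusion matrix."""
    s, c = cm.sum(), np.trace(cm)
    t, p = cm.sum(axis=1), cm.sum(axis=0)  # true and predicted counts per class
    num = c * s - np.dot(t, p)
    den = np.sqrt((s**2 - np.dot(p, p)) * (s**2 - np.dot(t, t)))
    return float(num / den) if den > 0 else 0.0

# Toy example: three classes, one class-1 sample misclassified as class 2.
cm = confusion([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], classes=[0, 1, 2])
bacc, f1, mcc = balanced_accuracy(cm), macro_f1(cm), multiclass_mcc(cm)
```

In the toy case the per-class recalls are 1, 0.5, and 1, so BACC is about 0.83, while MCC is about 0.78 because it also penalises the over-prediction of class 2.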

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Loggenberg, K.; Strever, A.; Münch, Z. Cross-Learner Spectral Subset Optimisation: PLS–Ensemble Feature Selection with Weighted Borda Count for Grapevine Cultivar Discrimination. Geomatics 2026, 6, 12. https://doi.org/10.3390/geomatics6010012

