1. Introduction
Hyperspectral imaging (HSI) has emerged as a powerful remote sensing tool, enabling the fine-grained discrimination of materials, vegetation health assessment, and monitoring of environmental disturbances due to its hundreds of contiguous narrow spectral bands [
1,
2,
3]. The rich spectral information supports applications ranging from land cover classification [
4] and crop mapping [
5] to soil analysis [
6] and quality inspection in agriculture [
7]. However, the high dimensionality of HSI data introduces significant challenges, including the “curse of dimensionality”, spectral redundancy, computational complexity, and reduced model generalization in machine learning pipelines [
8,
9].
Band selection is a critical preprocessing step in hyperspectral image (HSI) analysis, aimed at reducing spectral redundancy while preserving the most informative and discriminative spectral bands for downstream tasks such as classification, detection, and regression [
2,
9]. Hyperspectral dimensionality reduction is typically achieved through either feature extraction or band selection [
1,
2]. Feature extraction methods transform the original data into a new feature space by combining spectral information across all bands, which may improve compactness but often sacrifices the physical interpretability of the spectral signatures [
10,
11,
12,
13,
14]. In contrast, band selection directly identifies a subset of original spectral bands, thereby preserving the inherent spectral meaning and facilitating domain-specific interpretation [
1,
15].
A wide range of hyperspectral band-selection methods have been proposed and can be broadly categorized into six groups: ranking-based, searching-based, clustering-based, sparsity-based, embedding learning-based, and hybrid approaches [
1,
2,
3]. Ranking-based methods evaluate each band independently using criteria such as variance, entropy, mutual information, or correlation, offering high computational efficiency but often neglecting inter-band relationships [
1,
2,
3]. Searching-based methods aim to identify optimal band subsets by explicitly optimizing an objective function, typically achieving improved classification performance at the cost of higher computational complexity [
2,
16]. Clustering-based methods group spectrally similar bands and select representative bands from each cluster, effectively reducing redundancy while maintaining spectral diversity [
17,
18]. Recent advances in this category include fast neighborhood grouping strategies that exploit coarse-to-fine spectral partitioning to capture contextual information across broad spectral ranges [
18], and intergroup-difference-based ranking methods that explicitly account for redundancy across selected band groups rather than within individual clusters [
19]. Graph-based methods extend these ideas by learning structural relationships among spectral bands through graph matrices, enabling joint exploitation of spatial and spectral information that single-domain methods overlook [
20]. These structural and grouping perspectives are particularly relevant to the present study: the edge-preserving operators in SSEP are motivated by the same principle of retaining discriminative high-frequency spatial structure that graph-based methods capture spectrally, and the redundancy penalty in SRPA directly mirrors the intergroup-difference criterion of [
19]. More recently, embedding learning-based methods have integrated band selection into model training, enabling task-driven selection through deep learning, attention mechanisms, and graph-based models [
21,
22,
23]. Reinforcement learning-based approaches further extend this paradigm by formulating band selection as a sequential decision-making problem, allowing for adaptive and dynamic selection strategies [
24,
25,
26,
27].
From a theoretical perspective, the band selection problem can be formulated as selecting a subset of bands that maximizes task-specific performance while minimizing redundancy and computational cost [
14]. Importantly, prior studies have shown that redundant and irrelevant spectral bands can degrade model performance, introduce instability, and propagate noise, particularly under limited training samples, highlighting the necessity of effective band-selection strategies in hyperspectral analysis [
15,
28]. While these methods have been extensively studied in general remote sensing applications, their effectiveness in wildfire and prescribed fire environments remains less explored.
These challenges are particularly acute in wildfire science and prescribed fire management, where spectral responses are influenced by burn severity, char deposition, canopy structure, and early vegetation recovery [
29,
30]. Wildfires pose a growing threat to ecosystems, communities, and economies in the western United States, with Montana experiencing particularly pronounced impacts due to its vast forested landscapes, dry continental climate, and increasing human development in the wildland–urban interface. Over the past two decades, Montana has seen a dramatic rise in wildfire activity: average annual acres burned have exceeded 300,000 since 2000—more than ten times the pre-1950 average of less than 30,000 acres—driven by warmer temperatures, prolonged droughts, fuel accumulation from historical fire suppression, and climate change-induced aridity [
31,
32]. Recent seasons illustrate this variability while underscoring the trend: over 352,000 acres burned in 2024, whereas 2025 was milder, with approximately 76,000 acres burned (one of the four lowest totals in the last 15 years), a result attributed to favorable weather that does not remove the ongoing risk [
33,
34].
Prescribed fires—carefully planned, low-intensity burns—serve as a critical proactive tool to mitigate these threats by reducing hazardous fuel loads, restoring ecological processes, improving wildlife habitat, recycling nutrients, and promoting resilient forests. In Montana, where historical natural fire regimes burned tens to hundreds of thousands of acres annually before suppression, prescribed burning helps counteract fuel buildup and supports forest health. State and federal programs (e.g., DNRC, USFS Helena-Lewis and Clark National Forest) have expanded efforts, with recent accomplishments including tens of thousands of acres treated annually and proposals to scale to approximately 40,000 acres per year through 2045 on national forests alone [
35,
36,
37]. Despite these advances, challenges persist, including limited burn windows due to air quality regulations, smoke concerns, and the need for precise monitoring of fuel types and post-burn recovery.
Accurate ground fuel classification—distinguishing live vegetation, dead biomass, and species-specific conditions—is essential for planning effective prescribed burns, predicting fire behavior, and evaluating ecological outcomes. Hyperspectral remote sensing, with its fine spectral resolution, offers significant potential for this task, yet real-world applications in fire-affected Montana forests remain underexplored due to data scarcity and dimensionality issues [
38,
39,
40].
Despite advances in HSI band selection for specific tasks [
41,
42], few studies integrate multiple strategies (e.g., statistical, edge-aware, attention-guided, and reinforcement learning-based) in a unified pipeline, especially for prescribed fire applications. Prior work has explored attention and edge-aware methods for burned vegetation classification [
43] and self-supervised/DRL approaches for prescribed burn impact analysis [
44], highlighting the potential of learning-based techniques in fire-affected scenes. However, systematic comparisons across benchmarks and real-world Unmanned Aircraft System (UAS) datasets remain limited, particularly for ground fuel mapping in controlled burns.
This study addresses these gaps by presenting a comprehensive evaluation of four band-selection strategies, along with a clustering-based baseline method (K-Means Clustering-Based Band Selection: KMCBS), spanning both classical and modern approaches. Specifically, we consider Principal Component Analysis (PCA) as a variance-based baseline [
15], Spatial–Spectral Edge Preservation (SSEP) to incorporate spatial structural information [
21], Spectral-Redundancy Penalized Attention (SRPA) for attention-driven and redundancy-aware selection [
21,
45], and Deep Reinforcement Learning (DRL) to model band selection as a sequential decision-making process [
24,
25,
26,
27]. These methods represent complementary perspectives on hyperspectral band selection, enabling a systematic analysis of the trade-offs between information preservation, redundancy reduction, and computational efficiency. The band-selection strategies are evaluated in combination with classical classifiers (Random Forest, Support Vector Machines, and K-Nearest Neighbors) and a deep learning model (3D Convolutional Neural Networks). We apply a modular, reproducible pipeline to benchmark datasets (Indian Pines, Pavia University, Salinas, Botswana, and Kennedy Space Center) and a novel VNIR hyperspectral dataset collected via UAS over prescribed burn sites at the Lubrecht Experimental Forest, Montana, USA. This dataset captures pre-burn conditions in a controlled thin-burn plot, providing a unique real-world testbed for understanding band selection behavior in fire-prone forest environments.
2. Materials and Methods
The goal of this study is to systematically evaluate a unified and reproducible hyperspectral image (HSI) classification pipeline across both benchmark airborne datasets and a complex UAS-based visible–near-infrared (VNIR) dataset captured over prescribed burn sites in the Lubrecht Experimental Forest, a large outdoor forest research laboratory located in the Blackfoot River drainage, about 30 miles northeast of Missoula, Montana. To accomplish this, we designed a modular workflow consisting of four tightly integrated stages, illustrated in the process diagram in
Figure 1: (1) dataset preparation and exploratory data analysis (EDA), (2) noise detection and data cleaning, (3) band selection using classical and deep learning-based feature reduction techniques, and (4) classification using both traditional machine learning models and deep learning architectures. All components were implemented in a consistent, dataset-agnostic fashion so that the same pipeline could be applied to Indian Pines, Pavia University, Salinas, Botswana, KSC, and the Montana VNIR dataset without modification.
This section describes the materials, datasets, preprocessing steps, algorithms, model architectures, and evaluation design used in the study, presenting each part as a connected narrative that reflects the logical progression of the project.
2.1. Data
Hyperspectral imagery consists of hundreds of narrow and contiguous spectral bands, providing rich spectral information for material discrimination. However, this high spectral resolution also introduces significant redundancy and the well-known Hughes phenomenon (also known as the “curse of dimensionality”), motivating the need for dimensionality reduction and band selection prior to classification. Consequently, standardized preprocessing and band selection are widely adopted in hyperspectral image analysis pipelines [
1,
2,
3].
We evaluate a unified hyperspectral classification framework using a diverse collection of datasets that span controlled benchmark scenes and a complex real-world Unmanned Aircraft System (UAS)-based acquisition. Specifically, experiments were conducted on five widely used benchmark hyperspectral datasets—Indian Pines, Pavia University, Salinas Valley, Botswana, and Kennedy Space Center [
46]—as well as a custom visible–near-infrared (VNIR) hyperspectral dataset collected over prescribed burn sites in Montana, USA. The benchmark datasets represent a range of land cover types, spatial resolutions, sensors, and class distributions, and have been extensively used in prior hyperspectral classification and band selection studies, enabling meaningful comparison with the existing literature [
8,
9,
47].
A summary of the benchmark datasets, including the sensor type, spatial dimensions, number of spectral bands before and after sensor-documented noisy band removal, number of classes, and spatial resolution, is provided in
Table 1. The cleaned benchmark cubes served as inputs to the subsequent exploratory data analysis. Rather than treating each benchmark dataset independently, all were processed using the same unified and dataset-agnostic pipeline to ensure consistency across preprocessing, band selection, and classification stages. This design allows performance differences to be attributed to algorithmic choices rather than dataset-specific handling.
In contrast to the benchmark scenes, the Montana VNIR dataset represents a real-world post-fire forest environment acquired using a UAS platform, characterized by higher spectral noise, severe class imbalance, sparse ground truth, and complex canopy structure. Details of the acquisition parameters, preprocessing steps, and dataset-specific challenges for the Montana dataset are described separately in
Section 2.1, as they differ substantially from those of the benchmark datasets and require specialized treatment.
Montana
The hyperspectral images used in this study were acquired on 11 May 2024 over a controlled thin-burn plot (330 m × 300 m) in the Lubrecht Experimental Forest, approximately 42 km east of Missoula, Montana, USA. Data collection was conducted using a Headwall co-aligned visible–near-infrared (VNIR) and short-wave infrared (SWIR) imaging system, integrated with a FreeFly ALTA X heavy-duty Unmanned Aircraft System (UAS). Prior to the flight, the imaging system was calibrated using a certified Spectralon white diffuse reflectance standard to determine the optimal exposure time, frame period, and recommended flight speed. Based on this information, a flight plan was designed to maintain a nearly constant altitude relative to the terrain, thereby eliminating the need for terrain correction in the resulting imagery. Magnetometer calibration was also performed to ensure the accurate detection of the Earth’s magnetic field for reliable navigation. The precise location of the imaging system was recorded using a Trimble SPS585 global navigation satellite system (GNSS) unit for Real-Time Kinematic (RTK) surveying. Before executing the flight plan, the Inertial Measurement Unit (IMU) was initialized to track the aircraft’s movement and orientation. The hyperspectral imaging system captured 543 spectral band images: 273 in the VNIR range (400–1000 nm) and 270 in the SWIR range (900–2500 nm). The acquired data files were preprocessed using Spectral View, the proprietary software provided by the manufacturer of the hyperspectral imaging system. The raw data, initially recorded in digital numbers (DNs), were corrected for noise and dark current using imagery collected prior to data acquisition. This process converted the DN values into spectral radiance (mW/(cm²·sr·µm)). During processing, the radiance data were then transformed into spectral reflectance using a calibrated reference tarp placed within the imaging scene.
Finally, the processed images were exported in GeoTIFF format to enable further analysis using standard image processing software.
2.2. Exploratory Data Analysis
Exploratory data analysis (EDA) was conducted prior to model training to examine radiometric distributions, assess spectral redundancy, and identify noisy or unstable bands. EDA plays a critical role in ensuring data integrity and preventing the propagation of artifacts into downstream band selection and classification models [
9,
38].
2.3. EDA for Benchmark Hyperspectral Datasets
The objective of the benchmark dataset EDA was to standardize all preprocessing steps across the five classical hyperspectral scenes—Indian Pines, Pavia University, Salinas, Botswana, and Kennedy Space Center (KSC)—so that band-selection algorithms could be evaluated under consistent noise conditions. These scenes differ substantially in terms of their spatial dimensions, radiometric behavior, and class distributions, making a unified and statistically grounded EDA pipeline necessary. The goal of this pipeline was to verify data integrity, characterize radiometric distributions, identify low-SNR or water-absorption bands, and construct noise-free normalized cubes to serve as inputs for SSEP, SRPA, and DRL-based band selection as well as for classical and deep classifiers.
Each dataset was loaded from its .mat hyperspectral file and associated ground-truth (gt) .mat label mask using dataset-specific variable keys. Upon loading, the benchmark hyperspectral cubes exhibited the following dimensions: Indian Pines (145 × 145 × 200), PaviaU (610 × 340 × 103), Salinas (512 × 217 × 204), Botswana (1476 × 256 × 145), and KSC (512 × 614 × 176). The basic integrity diagnostics included the computation of global intensity minima and maxima, band-wise means, and band-wise standard deviations. Botswana displayed intensity values ranging from 0 to 45,106, with an average band-wise mean of 1749.62 and an average standard deviation of 420.48. Indian Pines ranged from 955 to 9604, with an average band-wise mean of 2652.39 and an average standard deviation of 336.36. KSC ranged from –27 to 1244, with an average band-wise mean of 93.42 and an average standard deviation of 72.36. PaviaU showed intensities between 0 and 8000, with an average band-wise mean of 1389.12 and an average standard deviation of 716.43. Salinas ranged from –11 to 9207, with an average band-wise mean of 1196.40 and an average standard deviation of 389.91. These statistics confirmed proper radiometric scaling and the absence of corruption in the raw values.
Ground-truth label distributions were analyzed to verify class presence and quantify class imbalance. Botswana contained 15 classes with pixel counts ranging from 95 to 314, against 374,608 background pixels. Indian Pines exhibited 16 classes, including extremely sparse categories (e.g., class 1 with only 46 pixels) and dense categories such as class 11 with 2455 pixels. KSC contained 13 classes with class populations ranging from 105 to 927 and background comprising 309,157 pixels. PaviaU included nine semantic classes, with some highly populated categories (e.g., class 2 with 18,649 pixels) and background containing 164,624 pixels. Salinas exhibited 16 classes, with major categories (e.g., class 8 with 11,271 pixels) and small categories (e.g., class 13 with 916 pixels). These results confirm a substantial class imbalance across all benchmark datasets, which should be considered in downstream learning.
To characterize spectral noise, each hyperspectral cube was reshaped into a matrix $X \in \mathbb{R}^{N \times B}$, where $N = H \times W$ is the number of pixels and $B$ is the number of bands, enabling band-wise analysis across all spatial pixels. For each spectral channel $b$, we computed its minimum and maximum intensities $\min_b$ and $\max_b$, mean $\mu_b$, standard deviation $\sigma_b$, dynamic range $r_b = \max_b - \min_b$, and zero-fraction:
$$z_b = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[X_{i,b} = 0\right].$$
Bands were flagged as suspicious if they exhibited extremely low variance (i.e., $\sigma_b^2$ lying in the lowest 5% of all band variances), unusually small dynamic range (i.e., $r_b$ below a small fraction of the global range), or high zero-fraction (i.e., $z_b$ above a fixed threshold). Although none of the benchmark datasets contained high zero-fraction bands, each dataset presented a small set of low-variance bands that aligned with known water-absorption and sensor-instability regions. This EDA-based noisy band identification was performed in addition to the initial sensor-documented band exclusion summarized in
Table 1. The heuristic procedure identified the following candidate noisy bands: Botswana—[111, 138–144]; Indian Pines—[102–104, 143–145, 195, 197–199]; KSC—[0–8]; PaviaU—[0–5]; and Salinas—[106–109, 146–148, 198, 201–203]. These empirically identified indices match previously documented noisy band lists in the hyperspectral imaging literature.
Based on these technical diagnostics and literature support, we removed the identified noisy bands from each dataset, yielding cleaned spectral dimensionalities of 137 bands (Botswana), 190 bands (Indian Pines), 167 bands (KSC), 97 bands (PaviaU), and 193 bands (Salinas). The spatial dimensions were unchanged, ensuring that only the spectral axis was compressed. This removal eliminated known water-absorption wavelengths and low-SNR regions while preserving all meaningful spectral–spatial structure required for downstream tasks.
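The band-wise diagnostics and flagging rules above can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, and the thresholds `range_frac` and `zero_frac_max` are placeholder assumptions, since only the 5% variance criterion is stated explicitly.

```python
import numpy as np

def band_statistics(cube):
    """Per-band statistics for an (H, W, B) cube reshaped to an N x B matrix."""
    h, w, b = cube.shape
    x = cube.reshape(h * w, b).astype(np.float64)
    stats = {
        "min": x.min(axis=0),
        "max": x.max(axis=0),
        "mean": x.mean(axis=0),
        "std": x.std(axis=0),
        "zero_frac": (x == 0).mean(axis=0),  # share of pixels equal to zero
    }
    stats["range"] = stats["max"] - stats["min"]
    return stats

def flag_noisy_bands(cube, var_pct=5.0, range_frac=0.01, zero_frac_max=0.5):
    """Return indices of suspicious bands.

    range_frac and zero_frac_max are illustrative assumptions.
    """
    s = band_statistics(cube)
    variance = s["std"] ** 2
    global_range = s["max"].max() - s["min"].min()
    low_var = variance <= np.percentile(variance, var_pct)
    small_range = s["range"] < range_frac * global_range
    many_zeros = s["zero_frac"] > zero_frac_max
    return np.flatnonzero(low_var | small_range | many_zeros)

def remove_bands(cube, bad):
    """Drop the flagged bands; the spatial dimensions are left unchanged."""
    keep = np.setdiff1d(np.arange(cube.shape[2]), bad)
    return cube[:, :, keep], keep
```

In practice the flagged indices would still be cross-checked against documented water-absorption and low-SNR regions before removal, as described in the text.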
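After noisy band removal, each dataset was normalized per spectral channel using min–max scaling:
$$\tilde{X}_{i,b} = \frac{X_{i,b} - \min_b}{\max_b - \min_b + \varepsilon},$$
where $X_{i,b}$ is the value of pixel $i$ in band $b$, $\min_b$ and $\max_b$ are the minimum and maximum intensities of band $b$, and $\varepsilon$ is a small constant that prevents division by zero. This transformation places all spectral channels in the $[0, 1]$ range, eliminates scale disparities among bands, and stabilizes distance metrics and gradients for machine learning models. The cleaned and normalized hyperspectral cubes and associated summary statistics were retained for downstream analysis.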
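A minimal sketch of the per-band min–max scaling is given below. The function name and the value `eps=1e-8` are assumptions made for illustration; the text only states that a small constant guards against division by zero.

```python
import numpy as np

def minmax_normalize(cube, eps=1e-8):
    """Scale every band of an (H, W, B) cube to [0, 1] independently."""
    cube = cube.astype(np.float64)
    band_min = cube.min(axis=(0, 1), keepdims=True)  # shape (1, 1, B)
    band_max = cube.max(axis=(0, 1), keepdims=True)
    # eps (assumed value) keeps constant bands finite: they map to zero.
    return (cube - band_min) / (band_max - band_min + eps)
```

Because the statistics are computed per band, a constant band is mapped to zeros rather than producing NaN values.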
The results of this benchmark EDA demonstrate that the unified pipeline not only confirms the structural and radiometric consistency of all datasets but also quantitatively identifies and removes noisy spectral regions using both heuristic and literature-validated criteria. By producing noise-controlled hyperspectral cubes with well-characterized statistics, the benchmark EDA ensures that subsequent band-selection and classification experiments are conducted under standardized, reproducible, and scientifically rigorous conditions.
2.4. EDA for the Montana UAV Dataset
The Montana VNIR dataset consists of a drone-acquired hyperspectral mosaic with 273 spectral bands and spatial dimensions of 10,706 × 8360 pixels. Before applying any band-selection algorithms or training classification models, we conducted a comprehensive exploratory data analysis (EDA) to (i) construct an accurate pixel-wise ground truth (GT), (ii) assess class distribution and label sparsity, and (iii) evaluate spectral-band radiometric quality and class-wise separability.
2.4.1. Ground-Truth Construction and Alignment Verification
Field plot measurements, provided as longitude–latitude points with species labels, were converted into a pixel-wise raster mask using a custom reprojection and rasterization pipeline. The GPS coordinates (EPSG:4326) were transformed into the coordinate reference system of the VNIR cube using rasterio.warp.transform, followed by conversion to pixel indices via the GeoTIFF affine transform. Each species was assigned a unique integer class ID (with zero reserved for background), and points within the VNIR footprint were written into a 2D mask saved as montana_gt.npy.
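The conversion from map coordinates to pixel indices can be illustrated with plain NumPy. The sketch below assumes the points have already been reprojected into the raster CRS (the step performed with rasterio.warp.transform in the text) and that the affine coefficients follow the `(a, b, c, d, e, f)` order of the `Affine` objects exposed by rasterio. The function names are hypothetical.

```python
import numpy as np

def world_to_pixel(xs, ys, transform):
    """Map raster-CRS coordinates to integer (row, col) indices.

    transform = (a, b, c, d, e, f), with
    x = a * col + b * row + c and y = d * col + e * row + f.
    """
    a, b, c, d, e, f = transform
    m = np.array([[a, b], [d, e]], dtype=np.float64)
    rhs = np.stack([np.asarray(xs, float) - c, np.asarray(ys, float) - f])
    cols, rows = np.linalg.solve(m, rhs)  # invert the affine mapping
    return np.floor(rows).astype(int), np.floor(cols).astype(int)

def rasterize_points(xs, ys, class_ids, transform, shape):
    """Write class IDs into a 2D mask; zero is reserved for background."""
    mask = np.zeros(shape, dtype=np.int32)
    rows, cols = world_to_pixel(xs, ys, transform)
    inside = (rows >= 0) & (rows < shape[0]) & (cols >= 0) & (cols < shape[1])
    mask[rows[inside], cols[inside]] = np.asarray(class_ids)[inside]
    return mask
```

Points falling outside the image footprint are silently skipped, which matches the statement that only points within the VNIR footprint are written to the mask.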
To validate GT–image alignment, we generated a pseudo-RGB composite from three high-SNR VNIR bands (R = 150, G = 100, and B = 50) and overlaid the GT on it. This check revealed an initial misalignment between labels and imagery, which was resolved by reconstructing the GT with accurate reprojection; the updated overlay confirmed correct spatial correspondence between labeled pixels and canopy/burned regions. As shown in
Figure 2, the corrected GT aligns well with canopy structures visible in the hyperspectral mosaic. This verification step is essential, as even small registration errors in UAS hyperspectral imagery can significantly degrade classification accuracy.
2.4.2. Class Distribution and Label Sparsity
The final GT map contains five forest species or condition classes: Dead Biomass, Dead Ponderosa Pine, Douglas Fir, Ponderosa Pine, and Western Larch. The labels are extremely sparse relative to the full image extent, and the class distribution is highly imbalanced. Specifically, we observe 287 pixels for Dead Biomass, 212 for Ponderosa Pine, 183 for Douglas Fir, 49 for Western Larch, and only 30 for Dead Ponderosa Pine. Such an imbalance necessitates the use of class-balanced sampling, focal-loss formulations, or class weighting during model training to avoid bias toward majority classes.
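As one concrete option for class weighting, inverse-frequency weights can be derived directly from the pixel counts reported above. The sketch uses the same "balanced" heuristic as scikit-learn, n_samples / (n_classes × n_c); whether the study used this exact scheme is an assumption, and the names below are illustrative.

```python
# Labeled-pixel counts reported for the Montana ground truth.
MONTANA_COUNTS = {
    "Dead Biomass": 287,
    "Ponderosa Pine": 212,
    "Douglas Fir": 183,
    "Western Larch": 49,
    "Dead Ponderosa Pine": 30,
}

def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * n_c)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}

weights = balanced_class_weights(MONTANA_COUNTS)
```

With 761 labeled pixels in total, the rarest class (Dead Ponderosa Pine) receives a weight roughly ten times larger than the most frequent class (Dead Biomass).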
2.4.3. Band-Wise Radiometric Quality Analysis
Each of the 273 bands was evaluated using descriptive and noise-related statistics computed across the full spatial domain, including the mean, standard deviation, minimum, maximum, dynamic range, coefficient of variation ($\mathrm{CV}_b = \sigma_b / \mu_b$), zero-fraction (proportion of pixels equal to zero), and a simple SNR proxy ($\mu_b / \sigma_b$).
The zero-fraction curve in
Figure 3 reveals substantial sensor noise at the spectral extremes. Bands 1–40 show very high zero-fraction values, indicating weak short-wavelength sensor responsivity. A similar degradation is observed at the long-wavelength end (bands 240–273), where the detector suffers from end-of-range rolloff.
The standard deviation curve in
Figure 4 complements this observation. Bands 40–130 exhibit stable but moderate variance, whereas bands 130–240 show the strongest spatial variability (standard deviations of ≈0.014 and above), representing the most informative spectral region for vegetation and burn discrimination. Beyond band 240, variance drops sharply again due to sensor noise.
Based on these EDA diagnostics, bands 1–40 and 250–273 were removed prior to downstream modeling, leaving 211 high-quality bands for analysis.
2.4.4. Radiometric Normalization and Clean Cube Creation
For each retained band, we applied robust percentile clipping (1st–99th percentile) followed by min–max normalization to the range $[0, 1]$. This process reduces the influence of shadows and extreme outliers, while mitigating cross-track illumination variation. The resulting cleaned VNIR cube serves as the standardized input for subsequent band-selection and classification experiments.
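The clipping and normalization step can be sketched as follows. The function name and `eps` are assumptions; for a mosaic of the size described here, the same operation would likely need to be applied band by band or in tiles to limit memory use.

```python
import numpy as np

def robust_normalize(cube, low=1.0, high=99.0, eps=1e-8):
    """Clip each band to its [low, high] percentiles, then scale to [0, 1]."""
    cube = cube.astype(np.float64)
    lo = np.percentile(cube, low, axis=(0, 1), keepdims=True)
    hi = np.percentile(cube, high, axis=(0, 1), keepdims=True)
    clipped = np.clip(cube, lo, hi)  # suppress shadows and extreme outliers
    return (clipped - lo) / (hi - lo + eps)
```

Unlike plain min–max scaling, a single saturated pixel no longer compresses the remaining values toward zero, because the scale is set by the 1st and 99th percentiles.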
2.4.5. Class-Wise Spectral Signature Analysis
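Using the cleaned cube and GT mask, we computed the mean spectral signature for each class by averaging spectral reflectance values across all labeled pixels:
$$\bar{s}_c(b) = \frac{1}{|\Omega_c|} \sum_{(i,j) \in \Omega_c} X(i, j, b),$$
where $\Omega_c$ is the set of pixels labeled as class $c$ and $X(i, j, b)$ is the normalized reflectance of pixel $(i, j)$ in band $b$.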
The resulting spectra showed that the three live conifer species (Douglas Fir, Ponderosa Pine, and Western Larch) share closely aligned spectral profiles, with only subtle differences. Dead biomass exhibits lower reflectance and altered mid-VNIR slopes, but separability is distributed across many bands. Unlike benchmark datasets (e.g., Indian Pines and Pavia), spectral contrast in this dataset is broad rather than concentrated, indicating that larger Top-K values (e.g., 60–100 bands) are necessary for effective band selection.
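The class-wise averaging described above reduces to a masked mean over the spatial axes. A minimal sketch, with a hypothetical function name and class 0 treated as background as in the GT construction:

```python
import numpy as np

def class_mean_spectra(cube, gt):
    """Return {class_id: mean spectrum of shape (B,)} over labeled pixels."""
    signatures = {}
    for c in np.unique(gt):
        if c == 0:  # zero is reserved for background
            continue
        signatures[int(c)] = cube[gt == c].mean(axis=0)
    return signatures
```

Plotting the returned spectra against band index gives the kind of class-wise comparison discussed in the text.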
2.4.6. Montana EDA Insights
The Montana EDA demonstrates the following: (i) accurate GT construction and alignment are essential for reliable supervision, (ii) extreme class imbalance requires balanced sampling strategies, (iii) only the mid-VNIR range (bands 40–240) provides high-quality spectral information, (iv) short- and long-wavelength bands should be discarded due to sensor noise, and (v) the subtle, distributed spectral differences between forest classes motivate the use of larger band subsets and patch-based deep models.
These findings guided all subsequent preprocessing, band selection, and classification experiments.
2.5. Feature Selection (Band Selection)
To mitigate high dimensionality and spectral redundancy in hyperspectral images (HSIs), five band-selection techniques were employed: K-Means Clustering-Based Band Selection (KMCBS), Principal Component Analysis (PCA), Spatial–Spectral Edge Preservation (SSEP), Spectral-Redundancy Penalized Attention Ranking (SRPA), and Deep Reinforcement Learning (DRL). Each method processes a pre-cleaned HSI data cube (height × width × bands, stored as .npy) and the corresponding ground-truth (GT) map (2D integer labels, stored as .npy) after noise removal and normalization. These methods include unsupervised (KMCBS and PCA) and label-guided or supervised (SSEP, SRPA, and DRL) band-selection strategies, producing ranked band indices, scores, and visualizations for subsequent classification tasks. For the Montana UAS dataset, the EDA-cleaned cube additionally includes robust percentile clipping prior to normalization, as described in
Section 2.4.
2.6. K-Means Clustering-Based Band Selection (KMCBS)
Clustering-based band selection is one of the mainstream and widely adopted approaches to hyperspectral dimensionality reduction [
18,
19]. To provide a strong unsupervised baseline for comparison with the proposed and learning-based methods, we include KMCBS. The KMCBS method clusters the
B spectral bands into
K groups using the K-Means algorithm [
48] and selects the band closest to each cluster centroid as the representative band for that group, yielding exactly
K selected bands with minimal intra-subset redundancy.
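Formally, let each spectral band $\mathbf{x}_b \in \mathbb{R}^{N}$, $b = 1, \dots, B$, where $N$ is the number of pixels, be treated as a data point after normalization. K-Means partitions the $B$ bands into $K$ clusters $\{C_1, \dots, C_K\}$ by minimizing intra-cluster variance. The representative band for cluster $C_k$ is
$$b_k^{*} = \arg\min_{b \in C_k} \left\lVert \tilde{\mathbf{x}}_b - \boldsymbol{\mu}_k \right\rVert_2,$$
where $\tilde{\mathbf{x}}_b$ is the normalized band vector and $\boldsymbol{\mu}_k$ is the cluster centroid. For large-scale datasets such as the Montana UAV VNIR cube, pixel subsampling of 50,000 pixels is applied prior to clustering to manage memory requirements.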
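The KMCBS procedure can be sketched with a small NumPy implementation of Lloyd's algorithm. Several details are assumptions made for illustration only: the per-band standardization used as "normalization", the deterministic farthest-point initialization, random pixel subsampling, and all function names. A library K-Means implementation could be substituted for the inner loop.

```python
import numpy as np

def _sq_dists(bands, centers):
    # ||b||^2 - 2 b.c + ||c||^2, which avoids building a B x K x N array.
    return ((bands ** 2).sum(1)[:, None] - 2.0 * bands @ centers.T
            + (centers ** 2).sum(1)[None, :])

def kmcbs(cube, k, max_pixels=50_000, n_iter=100, seed=0):
    """Pick the band closest to each of k K-Means centroids."""
    h, w, b = cube.shape
    x = cube.reshape(h * w, b).astype(np.float64)
    if x.shape[0] > max_pixels:  # pixel subsampling for large cubes
        rng = np.random.default_rng(seed)
        x = x[rng.choice(x.shape[0], max_pixels, replace=False)]
    bands = x.T  # one row per band
    # Assumed normalization: zero mean, unit variance per band vector.
    bands = (bands - bands.mean(1, keepdims=True)) / (bands.std(1, keepdims=True) + 1e-8)

    centers = bands[:1].copy()  # farthest-point initialization (assumption)
    while centers.shape[0] < k:
        far = _sq_dists(bands, centers).min(axis=1).argmax()
        centers = np.vstack([centers, bands[far]])

    for _ in range(n_iter):  # Lloyd iterations
        labels = _sq_dists(bands, centers).argmin(axis=1)
        updated = np.array([bands[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated

    dists = _sq_dists(bands, centers)
    labels = dists.argmin(axis=1)
    selected = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:  # representative = member closest to the centroid
            selected.append(int(members[dists[members, j].argmin()]))
    return sorted(selected)
```

Because each cluster contributes a single representative, near-duplicate bands that fall in the same cluster cannot both be selected.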
2.7. Principal Component Analysis (PCA)-Based Band Selection
PCA is a classical unsupervised technique for variance-based band prioritization. We fit PCA on spectra from labeled pixels (GT > 0) to focus variance estimation on class-relevant regions. Full PCA is fitted (
n_components =
B, where
B is the number of bands), yielding explained variance ratios (EVR) and component loadings. It is worth noting that, strictly speaking, PCA is a feature-extraction rather than a band-selection method, as it produces linearly transformed components that combine information across all original spectral bands rather than identifying a subset of original wavelengths. Consequently, PCA-selected components do not preserve the physical interpretability of individual spectral bands. PCA is included in this study as a classical variance-based dimensionality reduction baseline, providing a reference point against which supervised and learning-based band-selection strategies can be compared [
3].
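The score for each band $b$ is computed from the explained variance ratios and the loadings of the first $M$ principal components. The number of retained components $M$ was fixed as a stable trade-off that captures dominant variance while avoiding noise-dominated components; sensitivity to $M$ was empirically minor in pilot runs [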
49].
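One plausible way to turn PCA loadings into band scores is sketched below. The scoring rule used here, an EVR-weighted sum of absolute loadings over the first `m` components, is an assumption chosen for illustration, as are the default `m=5` and the function name; the exact formula and value of $M$ used in the study may differ.

```python
import numpy as np

def pca_band_scores(cube, gt, m=5):
    """Rank bands by EVR-weighted absolute loadings (assumed score form)."""
    x = cube[gt > 0].astype(np.float64)  # labeled spectra, shape (n, B)
    x = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes of the centered data.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    evr = s ** 2 / np.sum(s ** 2)  # explained variance ratios
    m = min(m, vt.shape[0])
    scores = (evr[:m, None] * np.abs(vt[:m])).sum(axis=0)
    order = np.argsort(scores)[::-1]  # descending scores
    return order, scores
```

Under this rule, a band ranks highly when it loads strongly on components that explain a large share of the variance among labeled pixels.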
2.8. Spatial–Spectral Edge Preservation (SSEP)
SSEP is a label-guided band ranking method that leverages spatial class boundaries derived from ground-truth annotations [
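50]. A reference binary edge map $E_{\mathrm{GT}}$ is derived from the GT using the Sobel gradient operator on label transitions. For each band $b$, light Gaussian smoothing (standard deviation $\sigma$) is applied to reduce noise, followed by Sobel edge computation, thresholding at the 95th percentile, and binarization, which yields a binary edge map $E_b$. The alignment with the GT edge map is quantified using the Dice coefficient:
$$\mathrm{Dice}(E_b, E_{\mathrm{GT}}) = \frac{2\,|E_b \cap E_{\mathrm{GT}}|}{|E_b| + |E_{\mathrm{GT}}|}.$$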
Bands are ranked in descending order of these scores, favoring those that maintain sharp spatial–spectral edges [
50]. The implementation of SSEP [
43] in this study follows the methodology described in Algorithm 1.
Algorithm 1 SSEP Band Selection
Require: HSI cube C (H × W × B), GT map G (H × W)
Ensure: Ranked band indices order (descending scores)
1: edge_GT ← binarize(Sobel(float(G))) ▹ Binary edges where gradient > 0
2: scores ← zeros(B)
3: for b = 1 to B do
4:     band_img ← C[:, :, b]
5:     smoothed ← Gaussian_filter(band_img, σ)
6:     edge_band ← Sobel(smoothed)
7:     thresh ← percentile(edge_band, 95)
8:     edge_bin ← (edge_band > thresh)
9:     scores[b] ← Dice(edge_bin, edge_GT)
10: end for
11: order ← argsort(scores)[::−1]
12: return order, scores
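A Python sketch of Algorithm 1 using SciPy's ndimage filters is shown below. The smoothing width `sigma=1.0` is an assumed value, the Sobel response is taken as the gradient magnitude over both image axes, and the function names are illustrative rather than those of the original implementation.

```python
import numpy as np
from scipy import ndimage

def sobel_magnitude(img):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    return np.hypot(gx, gy)

def dice(a, b):
    """Dice coefficient between two boolean masks (0 if both are empty)."""
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom > 0 else 0.0

def ssep_rank(cube, gt, sigma=1.0, pct=95.0):
    """Rank bands by how well their edges match the ground-truth edges."""
    edge_gt = sobel_magnitude(gt.astype(np.float64)) > 0
    n_bands = cube.shape[2]
    scores = np.zeros(n_bands)
    for b in range(n_bands):
        band = cube[:, :, b].astype(np.float64)
        smoothed = ndimage.gaussian_filter(band, sigma)  # sigma is assumed
        edge_band = sobel_magnitude(smoothed)
        edge_bin = edge_band > np.percentile(edge_band, pct)
        scores[b] = dice(edge_bin, edge_gt)
    order = np.argsort(scores)[::-1]  # descending scores
    return order, scores
```

Because the threshold is a percentile of each band's own edge response, every band contributes roughly the same number of edge pixels, so the Dice score reflects where the edges fall rather than how strong they are.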
2.9. Spectral-Redundancy Penalized Attention Ranking (SRPA)
SRPA employs a supervised attention mechanism to balance band informativeness and diversity. Small
patches are extracted around labeled pixels (up to 4000 samples per dataset; when fewer labeled samples were available, as in the Montana UAS dataset, all labeled patches were used) and split into train/validation sets (80/20; stratified). A lightweight 3D CNN (Conv3D layers, max-pooling, global average pooling, and Squeeze-and-Excitation block) is trained for two epochs to obtain stable attention trends while keeping the band-ranking stage computationally lightweight. After training, the mean band-wise attention weights are inferred from the SE block. Redundancy is computed from the correlation matrix of flattened subsampled patches:
This term penalizes bands that are highly correlated with the remainder of the spectrum. Final scores combine the attention weights with this redundancy penalty. Bands are ranked in descending order, promoting discriminative yet non-redundant selections [51]. The complete SRPA procedure, including patch extraction, network training, attention aggregation, and redundancy penalization, is summarized in Algorithm 2.
| Algorithm 2 SRPA Band Selection |
Require: HSI cube C (H × W × B), GT map G, patch_size = 5, max_patches = 4000
Ensure: Ranked band indices order (descending scores)
1: Extract patches P and labels y from labeled positions (at most max_patches)
2: Split P, y → train/validation (80/20, stratified)
3: Initialize 3D CNN with SE attention block
4: Train model on train patches (2 epochs, cross-entropy loss, Adam)
5: attn ← mean(SE outputs over validation) ▹ shape (B,)
6: X ← flatten(subsample(P))
7: corr ← corrcoef(X)
8: redundancy ← (sum(|corr|, axis = 1) − 1)/(B − 1)
9: scores ← combine(attn, redundancy) ▹ redundancy-penalized attention
10: order ← argsort(scores)[::−1]
11: return order, scores, attn, redundancy
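The redundancy term and the final ranking can be illustrated with NumPy. This sketch makes two assumptions: redundancy is taken as the mean absolute correlation with all other bands, and attention and redundancy are combined by subtracting a weighted penalty with a hypothetical weight `lam`. The exact combination rule used in the study may differ, and `band_redundancy` and `srpa_scores` are illustrative names.

```python
import numpy as np


def band_redundancy(X):
    """Mean absolute correlation of each band with all other bands.

    X: (n_samples, B) matrix of flattened, subsampled spectra.
    """
    corr = np.corrcoef(X, rowvar=False)  # (B, B) band-by-band correlation
    n_bands = corr.shape[0]
    # Subtract 1 to drop the diagonal (self-correlation) before averaging.
    return (np.abs(corr).sum(axis=1) - 1.0) / (n_bands - 1)


def srpa_scores(attn, redundancy, lam=0.5):
    """Assumed combination rule: attention minus a weighted redundancy penalty."""
    return np.asarray(attn) - lam * np.asarray(redundancy)


def rank_bands(scores):
    """Band indices sorted by descending score."""
    return np.argsort(scores)[::-1]
```

With equal attention weights, a band that duplicates another band is ranked below a band that is uncorrelated with the rest, which is the behavior the penalty is meant to produce.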
2.10. Deep Reinforcement Learning (DRL)-Based Band Selection
To enable adaptive and dataset-agnostic band selection, we formulate spectral band selection as a sequential decision-making problem and solve it using Deep Reinforcement Learning. The objective is to incrementally construct compact band subsets that maximize downstream classification performance while discouraging redundant or unnecessarily large selections. The problem is modeled as a Markov Decision Process (MDP) and optimized using a Deep Q-Network (DQN), allowing the agent to reason explicitly about long-term selection quality rather than relying on greedy or one-shot ranking strategies.
2.10.1. Markov Decision Process Formulation
Band selection is formulated as a finite-horizon MDP defined by the tuple $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$, where $\mathcal{S}$ denotes the state space, $\mathcal{A}$ the action space, P the transition dynamics, R the reward function, and $\gamma$ the discount factor.
At decision step t, the state $s_t$ encodes the current band selection context and is defined as
$$s_t = \left[\, m_t,\; |S_t|/B,\; \boldsymbol{\rho} \,\right],$$
where $S_t$ is the set of selected bands up to step t, B denotes the number of spectral bands after EDA-based cleaning (dataset-specific; e.g., for the Montana VNIR dataset), $m_t \in \{0, 1\}^B$ is a binary mask indicating the selected bands, and $\boldsymbol{\rho}$ represents per-band redundancy statistics computed from the EDA-cleaned hyperspectral cube. This state representation captures selection history, relative subset size, and spectral redundancy.
The action space consists of selecting one unselected spectral band:
$$\mathcal{A}_t = \{1, \dots, B\} \setminus S_t.$$
Actions corresponding to previously selected bands are masked to prevent reselection.
Transitions are deterministic. Executing action $a_t$ updates the selected set as $S_{t+1} = S_t \cup \{a_t\}$, yielding the next state $s_{t+1}$. An episode terminates when a predefined band budget K is reached.
The reward function targets downstream classification performance, while subset compactness is enforced through the band budget. After selecting K bands, a lightweight classifier is trained using only the selected subset, and the terminal reward is defined as
$$r_K = \mathrm{Acc}_{\mathrm{val}}(S_K),$$
where $\mathrm{Acc}_{\mathrm{val}}$ is the validation accuracy of a lightweight Random Forest classifier trained on the selected band subset $S_K$. The band budget K is controlled externally via the Top-K parameter, so no compactness penalty is required.
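The MDP described above can be sketched as a small episodic environment. This is an illustrative reading of the formulation, not the authors' code: the class name `BandSelectionEnv`, the state layout (mask, selected fraction, redundancy vector), and the pluggable `reward_fn` callable are assumptions. In the study the terminal reward comes from a Random Forest validation accuracy.

```python
import numpy as np


class BandSelectionEnv:
    """Minimal episodic environment for sequential band selection."""

    def __init__(self, redundancy, budget, reward_fn):
        self.redundancy = np.asarray(redundancy, dtype=float)  # per-band stats
        self.n_bands = self.redundancy.size
        self.budget = budget        # band budget K
        self.reward_fn = reward_fn  # maps a list of band indices to a scalar
        self.selected = []

    def reset(self):
        self.selected = []
        return self._state()

    def _state(self):
        mask = np.zeros(self.n_bands)
        mask[np.asarray(self.selected, dtype=int)] = 1.0
        frac = len(self.selected) / self.n_bands  # relative subset size
        return np.concatenate([mask, [frac], self.redundancy])

    def valid_actions(self):
        """Boolean mask over bands that have not been selected yet."""
        valid = np.ones(self.n_bands, dtype=bool)
        valid[np.asarray(self.selected, dtype=int)] = False
        return valid

    def step(self, action):
        if action in self.selected:
            raise ValueError("band already selected")
        self.selected.append(int(action))
        done = len(self.selected) == self.budget
        # Reward is only issued at the end of the episode (terminal reward).
        reward = self.reward_fn(self.selected) if done else 0.0
        return self._state(), reward, done
```

Passing the reward as a callable keeps the environment independent of the classifier, so a cheap stand-in can be used when testing the selection loop.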
2.10.2. Deep Q-Network Architecture
The action-value function is approximated using a fully connected neural network that maps the state representation to Q-values over candidate actions. The network consists of multiple dense layers with ReLU activations. A target network and an experience replay buffer are employed to stabilize training and mitigate overestimation bias.
2.10.3. Training Procedure
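At each episode, the agent sequentially selects spectral bands until the budget K is reached. The terminal reward is computed using validation accuracy, and transitions $(s_t, a_t, r_t, s_{t+1})$ are stored in the replay buffer. The DQN is trained by minimizing the temporal-difference loss:
$$\mathcal{L}(\theta) = \mathbb{E}\left[\left(r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^{-}) - Q(s_t, a_t; \theta)\right)^2\right],$$
where $\theta$ and $\theta^{-}$ denote the parameters of the online and target networks, respectively.
An ε-greedy strategy is used to balance exploration and exploitation during training.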
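The action masking, ε-greedy selection, and temporal-difference target can be illustrated with plain NumPy arrays standing in for network outputs. This is a sketch under stated assumptions: the function names are hypothetical, `gamma=0.99` is a placeholder discount factor, and a squared-error loss is assumed.

```python
import numpy as np


def masked_epsilon_greedy(q_values, valid, epsilon, rng):
    """Pick a band index among valid (unselected) actions."""
    valid_idx = np.flatnonzero(valid)
    if rng.random() < epsilon:
        return int(rng.choice(valid_idx))          # explore
    masked_q = np.where(valid, q_values, -np.inf)  # forbid reselection
    return int(np.argmax(masked_q))                # exploit


def td_targets(rewards, next_q_target, next_valid, dones, gamma=0.99):
    """Targets r + gamma * max_a' Q_target(s', a') for a batch of transitions."""
    masked = np.where(next_valid, next_q_target, -np.inf)
    best_next = masked.max(axis=1)
    best_next = np.where(dones, 0.0, best_next)    # no bootstrap at terminal
    return np.asarray(rewards) + gamma * best_next


def td_loss(q_taken, targets):
    """Mean squared temporal-difference error."""
    return float(np.mean((np.asarray(targets) - np.asarray(q_taken)) ** 2))
```

In a full DQN, `next_q_target` would come from the target network and `q_taken` from the online network evaluated at the stored actions.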
2.10.4. Dataset-Specific Handling
The DRL formulation is dataset-agnostic. The value of B and the redundancy statistics are derived from the EDA-cleaned hyperspectral cube for each dataset. For the Montana UAV dataset, EDA cleaning includes percentile clipping and the removal of low-quality VNIR bands, as described in Section 2.4. No additional dataset-specific tuning is applied.
2.10.5. Evaluation and Stability
Due to the stochastic nature of reinforcement learning, each DRL experiment is repeated multiple times with different random seeds. Band rankings are obtained by averaging selection frequencies across runs, yielding stable and reproducible band importance estimates.
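Averaging selection frequencies across seeds can be sketched as below. The function name `rank_by_selection_frequency` and the tie-breaking rule (lower band index first) are assumptions made for illustration.

```python
import numpy as np


def rank_by_selection_frequency(runs, n_bands):
    """Rank bands by how often they were selected across runs.

    runs: list of band-index lists, one per random seed.
    """
    counts = np.zeros(n_bands)
    for selected in runs:
        counts[np.asarray(selected, dtype=int)] += 1
    freq = counts / len(runs)
    # Stable sort on negated frequency keeps lower band indices first on ties.
    order = np.argsort(-freq, kind="stable")
    return order, freq
```

For example, with three runs selecting bands `[0, 2]`, `[2, 3]`, and `[2, 0]` out of four bands, band 2 is chosen in every run and is ranked first, followed by band 0.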
The DRL-based band-selection method enables the adaptive, sequential selection of informative spectral bands while explicitly accounting for subset compactness. By integrating EDA-derived statistics and downstream classification feedback into a unified MDP framework, the method provides a flexible alternative to classical and attention-based band-ranking strategies.
2.11. Classification Models
To evaluate the effectiveness of the selected hyperspectral bands, we employed a suite of classical machine learning classifiers and a deep learning model. These models were trained on the reduced-dimensionality data obtained from the band-selection methods (PCA, SSEP, SRPA, and DRL) as well as the full-band baseline. The classifiers include Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and a 3D Convolutional Neural Network (3D-CNN). These models were chosen for their proven efficacy in hyperspectral image classification tasks, balancing computational efficiency with performance in handling high-dimensional spectral–spatial data [
47,
52]. All models were implemented using scikit-learn for classical methods and PyTorch 2.0.0 for the deep model, with the hyperparameters tuned based on standard practices in the HSI literature [
53].
2.11.1. Random Forest
RF is an ensemble learning method that constructs multiple decision trees during training and outputs the class that is the mode of the classes predicted by the individual trees. In our implementation, we used the RandomForestClassifier from scikit-learn with 200 estimators, utilizing all available CPU cores (n_jobs = −1) for parallel processing, and a fixed random state of zero for reproducibility. RF is particularly effective for HSI classification due to its ability to handle multicollinearity in spectral bands and its robustness to overfitting [
54]. The model was trained on flattened spectral features from labeled pixels or selected bands, without incorporating spatial information explicitly. This approach aligns with prior studies on hyperspectral data where RF serves as a strong baseline for pixel-wise classification [
55].
2.11.2. KNN
The KNN classifier assigns a class to a query point based on the majority vote of its k nearest neighbors in the feature space. We implemented KNN using scikit-learn’s KNeighborsClassifier with k = 5, employing the default Euclidean distance metric. This value of k was selected based on empirical performance in HSI datasets, where smaller neighborhoods help capture local spectral similarities while avoiding excessive noise sensitivity [
56]. KNN is computationally simple and non-parametric, making it suitable for hyperspectral data with varying class distributions, as seen in post-fire fuel classification scenarios [
57]. Like RF, it operates on pixel-based spectral features, leveraging the reduced band subsets to mitigate the curse of dimensionality.
2.11.3. SVM
SVM aims to find the optimal hyperplane that separates classes in a high-dimensional space, maximizing the margin between support vectors. Our SVM implementation used scikit-learn’s SVC with a radial basis function (RBF) kernel, regularization parameter C = 10, and gamma set to ‘scale’ (automatically computed as $1/(n_{\mathrm{features}} \cdot \mathrm{Var}(X))$). This configuration is commonly used in HSI classification to handle non-linear separability in spectral data [
58]. SVM’s strength lies in its effectiveness with small sample sizes and high-dimensional inputs, which is relevant for our prescribed-burn dataset where labeled pixels may be sparse [
59]. The model was trained on the same pixel-wise features as RF and KNN, benefiting from band selection to reduce kernel computation overhead.
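The three pixel-wise classifiers of Sections 2.11.1 to 2.11.3 can be instantiated in scikit-learn with the stated hyperparameters. This is a minimal sketch: the helper name `build_classical_models` is hypothetical, and any argument not mentioned in the text is left at its scikit-learn default.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def build_classical_models():
    """Return the RF, KNN, and SVM classifiers with the settings in the text."""
    return {
        # 200 trees, all CPU cores, fixed seed for reproducibility.
        "RF": RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0),
        # k = 5 with the default (Euclidean) distance.
        "KNN": KNeighborsClassifier(n_neighbors=5),
        # RBF kernel, C = 10, gamma = 1 / (n_features * X.var()).
        "SVM": SVC(kernel="rbf", C=10, gamma="scale"),
    }
```

Each model is fitted on an (n_pixels, n_selected_bands) matrix, so the same dictionary can be reused for every band-selection method and Top-K value.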
2.11.4. 3D-CNN
For all datasets, spatial–spectral patches of a fixed size were extracted from the EDA-cleaned hyperspectral cube prior to 3D-CNN training, ensuring consistent spatial context across band-selection methods. The 3D Convolutional Neural Network (3D-CNN) extends traditional CNNs by incorporating spectral depth as an additional dimension, enabling joint spatial–spectral feature extraction. Our model, implemented in PyTorch 2.0.0, consists of two 3D convolutional layers followed by batch normalization, ReLU activation, max pooling, adaptive average pooling, and a fully connected output layer. The input shape is (batch_size, 1, bands, patch_size, patch_size), where patches are centered on labeled pixels with a patch_size of five (as defined in the dataset configurations). The first convolutional layer uses 16 filters with a kernel size of (3, 3, 7) and padding to preserve dimensions, while the second uses 32 filters with a kernel size of (3, 3, 5). Training was performed over 20 epochs with Adam optimization (learning rate 1 ×
), cross-entropy loss, and a batch size of 32, monitoring validation accuracy to select the best model [
60]. This architecture is tailored for HSI, capturing volumetric patterns in post-fire environments, and has shown superior performance over 2D CNNs in spectral–spatial tasks [
61].
2.12. Class Imbalance-Handling Strategies
The Montana UAV VNIR dataset exhibits a severe class imbalance, with Dead Ponderosa Pine comprising only 30 labeled pixels. Three strategies are evaluated to address this. Class-weighted loss assigns inverse-frequency weights to each class during training, penalizing the misclassification of minority classes more heavily. For Random Forest, this is implemented via class_weight=‘balanced’, and for the 3D-CNN via a weighted cross-entropy loss function. Focal loss [
62] modifies standard cross-entropy by down-weighting easy examples and focusing learning on hard minority samples, using a focusing parameter
. SMOTE (Synthetic Minority Oversampling Technique) [
63] generates synthetic training samples for minority classes by interpolating between existing samples in feature space, applied to pixel-level features for Random Forest classification.
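The inverse-frequency weighting and the focal loss can be written out in a few lines of NumPy. The weight formula below matches scikit-learn's ‘balanced’ heuristic, n_samples / (n_classes × class count). The default `gamma=2.0` in `focal_loss` is a commonly used placeholder, not a value taken from this study, and both function names are hypothetical.

```python
import numpy as np


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def focal_loss(probs, y, gamma=2.0):
    """Mean focal loss for predicted class probabilities of shape (N, C).

    With gamma = 0 this reduces to standard cross-entropy.
    """
    p_true = probs[np.arange(len(y)), y]   # probability of the true class
    p_true = np.clip(p_true, 1e-12, 1.0)   # avoid log(0)
    return float(np.mean(-((1.0 - p_true) ** gamma) * np.log(p_true)))
```

With 90 samples of one class and 10 of another, the minority class receives a weight of 5.0 and the majority class a weight of about 0.56, so each class contributes equally to the weighted loss in aggregate.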
2.13. Evaluation Metrics
The model performance was assessed using a comprehensive set of metrics to ensure balanced evaluation across imbalanced classes typical in HSI datasets. Overall accuracy (OA) measures the percentage of correctly classified pixels. Cohen’s Kappa coefficient accounts for chance agreement, providing a more robust measure of inter-class reliability [
64]. Macro-averaged Precision, Recall, and F1-score were computed to evaluate performance across all classes equally, handling class imbalances in prescribed-burn scenarios where certain fuel types (e.g., char or recovering vegetation) may be underrepresented [
65]. These metrics were calculated in percentage form for consistency with OA, using scikit-learn’s precision_score, recall_score, and f1_score functions with ‘macro’ averaging and zero_division = 0 to manage undefined cases. Additionally, confusion matrices were generated to visualize misclassifications. For deep models, the best validation accuracy during training was also logged as an indicator of convergence [
66]. All metrics were derived from a 70/30 train–validation split, stratified by class to maintain distribution. All classifier hyperparameters were fixed to commonly used values from prior hyperspectral classification literature, and were kept identical across all band-selection methods to ensure a fair comparative evaluation. All the reported metrics were computed exclusively on held-out validation or test data and were not used during training or band selection stages.
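The metric computation described in this section can be sketched with scikit-learn as follows. The helper name `evaluate` and the dictionary keys are illustrative. Kappa is left on its native scale, which is an assumption based on the Kappa values above 0.95 quoted in the results.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score, precision_score,
                             recall_score)


def evaluate(y_true, y_pred):
    """Return OA, Kappa, macro-averaged metrics, and the confusion matrix."""
    macro = dict(average="macro", zero_division=0)
    return {
        "OA": 100.0 * accuracy_score(y_true, y_pred),
        "Kappa": cohen_kappa_score(y_true, y_pred),
        "Precision": 100.0 * precision_score(y_true, y_pred, **macro),
        "Recall": 100.0 * recall_score(y_true, y_pred, **macro),
        "F1": 100.0 * f1_score(y_true, y_pred, **macro),
        "confusion": confusion_matrix(y_true, y_pred),
    }
```

Setting `zero_division=0` means that a class with no predicted samples contributes a precision of 0 to the macro average instead of raising a warning.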
2.14. Experimental Setup
Experiments were conducted on a unified pipeline implemented in Python 3.12, utilizing NumPy for data handling, scikit-learn for classical models, and PyTorch for deep architectures. Datasets included benchmark HSI scenes (Indian Pines, Pavia University, Salinas, Botswana, and KSC) and a custom VNIR hyperspectral dataset from prescribed burns at Lubrecht Experimental Forest, Montana, USA. Data preprocessing involved loading hyperspectral cubes and ground-truth maps as NumPy arrays, extracting labeled pixels or center patches (patch_size = 5 for patch-based models), and applying a 70/30 train–validation split with stratification (random_state = 0). For band selection, each method (PCA, SSEP, SRPA, and DRL) was applied to generate Top-K band subsets (K ∈ {5, 10, 15, 20, 25, 30, 35, 40, 45, 50}), with the results compared against a full-band baseline. Training was performed on an NVIDIA GPU-enabled system where available, with CPU fallback for classical models. The results were logged to a CSV file tracking dataset, method, Top-K, model, and metrics. Cross-dataset evaluation ensured generalizability, particularly testing Montana data for real-world fire monitoring applicability [
67]. The setup emphasizes reproducibility, with all paths and hyperparameters defined in a central configuration file. All experiments were repeated over multiple random seeds, and the results were averaged to reduce variance.
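The Top-K subsetting and the stratified 70/30 split can be sketched for the pixel-based models as follows. The function name `split_and_subset` is hypothetical, and `order` is assumed to be a ranked array of band indices such as those returned by the ranking methods above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

TOP_K_VALUES = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]


def split_and_subset(X, y, order, k, seed=0):
    """Keep the k highest-ranked bands, then split 70/30 with stratification.

    X: (n_pixels, B) spectra; y: (n_pixels,) labels; order: ranked band indices.
    Returns X_train, X_val, y_train, y_val.
    """
    bands = np.asarray(order[:k])
    X_k = X[:, bands]
    return train_test_split(X_k, y, test_size=0.3, stratify=y, random_state=seed)
```

Looping over `TOP_K_VALUES` with the same `seed` yields the same pixel partition for every K, so differences in accuracy reflect the selected bands rather than the split.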
In addition to dataset-specific evaluations, we report a cross-dataset summary in which, for each band selection technique, the best-performing configuration is selected based on aggregated performance across all datasets. This analysis is intended to highlight general performance trends and preprocessing sensitivity rather than dataset-specific accuracy.
3. Results
3.1. Result Aggregation and Best-Configuration Identification
For each dataset, all evaluated configurations—defined by the combination of the band-selection technique (No Band Selection, PCA, SSEP, SRPA, and DRL), the number of selected bands (Top-K), and classifier (RF, SVM, KNN, and 3DCNN)—were executed using five independent runs with different random seeds. The performance is reported as the mean ± standard deviation for overall accuracy (OA), Cohen’s Kappa, and the macro-averaged F1 score.
This exhaustive evaluation produced approximately 400–415 configuration-level results per dataset. The complete per-configuration results for all datasets, band-selection methods, Top-K values, and classifiers, including all five independent runs, are provided in a supplementary repository on GitHub at https://github.com/BMW-lab-MSU/hyperspectral-feature-selection-prescribed-fires/ (accessed on 20 December 2025). The supplementary repository includes: (i) the raw result CSV files for all ten experimental runs (five EDA; five No-EDA) across all six datasets; (ii) the per-band selection score CSV files for all four methods under both EDA and No-EDA conditions for all datasets (40 files total); and (iii) an interactive HTML summary of Tables 2–8.
The best configurations were identified using a hierarchical selection criterion that prioritizes (i) the highest mean OA, (ii) the highest mean Kappa, (iii) the highest mean macro-F1, and (iv) the lowest OA standard deviation. Note that the best configurations are selected independently for EDA and No-EDA conditions; consequently, the optimal Top-K value may differ between the two settings for the same method, as each preprocessing pathway produces a distinct spectral input. The resulting best-per-technique summaries are reported in Tables 2–8.
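The hierarchical criterion amounts to a lexicographic comparison, which can be expressed as a tuple-valued sort key. The field names (`oa_mean`, `kappa_mean`, `f1_mean`, `oa_std`) are hypothetical and would need to match the columns of the result files.

```python
def best_configuration(rows):
    """Pick the best row by OA, then Kappa, then macro-F1, then lowest OA std."""
    return max(
        rows,
        key=lambda r: (r["oa_mean"], r["kappa_mean"], r["f1_mean"], -r["oa_std"]),
    )
```

Because tuples compare element by element, a later criterion only matters when all earlier ones are tied. Negating the standard deviation turns "lower is better" into "higher is better" within the same key.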
3.2. Dataset-Specific Results
3.2.1. Indian Pines (Table 2)
On Indian Pines, DRL achieves the highest OA under both EDA (%) and No-EDA (%) conditions, using compact Top-10 and Top-5 band subsets respectively with a 3D-CNN classifier. The no-band selection baseline ranks second under EDA (87.97%), while SRPA and PCA occupy intermediate positions. SSEP yields the lowest OA under EDA (72.26%), notably below PCA (77.59%), reflecting its sensitivity to spectral noise in this spectrally complex scene.
The columns reveal that EDA provides a minimal or slightly negative impact on OA for most methods on this dataset. DRL (), SRPA (), and PCA () all show marginally lower performance under EDA relative to No-EDA, while SSEP shows a small positive gain (%). These results suggest that on Indian Pines, EDA preprocessing does not confer a consistent OA advantage, though performance differences remain modest. DRL consistently achieves the highest macro-F1 (90.28% with EDA), indicating strong per-class discrimination.
3.2.2. Pavia University (Table 3)
Pavia University exhibits strong overall performance across all band-selection methods. DRL achieves the highest OA under both conditions (EDA: %; No-EDA: %), with compact Top-15 band subsets and a 3D-CNN classifier. SRPA ranks second under EDA (97.68%), followed closely by PCA (96.77%) and SSEP (96.52%), all using Top-30 3D-CNN configurations. The no-band selection baseline (94.00%) is substantially outperformed by all band-selection methods, highlighting the benefit of dimensionality reduction in this densely annotated urban scene.
EDA provides a moderate positive gain for SRPA () and SSEP (), while DRL and PCA show marginally lower OA under EDA ( and respectively). The consistently high Kappa values (>0.95 for DRL, SRPA, and PCA under EDA) confirm strong inter-class agreement beyond chance, with low standard deviations indicating stable performance across runs.
3.2.3. Salinas (Table 4)
Salinas achieves a uniformly high OA across all methods, reflecting its relatively favourable class separability. DRL produces the highest EDA OA (%) using a Top-50 RF configuration, with the no-band selection baseline ranking second (94.98%). SRPA (94.19%) and PCA (94.10%) follow closely, while SSEP yields the lowest OA (93.22%) under EDA. The narrow spread across methods (less than 2% range) indicates that Salinas is a relatively easy dataset for all evaluated strategies.
The values are uniformly small (within %), indicating that EDA preprocessing has negligible impact on the best-achievable performance for this dataset. This behavior is consistent with Salinas having comparatively clean spectral characteristics and limited noisy band contamination, reducing the marginal benefit of explicit noise removal and normalization.
3.2.4. Botswana (Table 5)
Botswana demonstrates strong classification performance across all techniques. DRL achieves the highest OA under EDA (%) using a Top-10 3D-CNN configuration, representing a notable margin over the next best method, the no-band selection baseline (92.10%). SRPA, PCA, and SSEP are closely clustered between 90–91%, with classical SVM classifiers dominating for these methods.
The values for most methods are close to zero (within %), indicating that EDA has minimal impact on the best-achievable performance for Botswana. This behavior is consistent with the dataset’s relatively clean spectral signatures and strong class separability. DRL shows a positive EDA gain (%), suggesting that noise normalization modestly benefits the reinforcement learning agent’s band selection on this scene. SRPA shows a negative (%), attributable to run-to-run variance (std 4.31% under EDA vs. 2.93% without).
3.2.5. Kennedy Space Center (Table 6)
For the KSC dataset, DRL achieves the highest OA under EDA (%) using a Top-10 3D-CNN configuration, followed by SRPA (%) and the no-band selection baseline (92.20%). PCA (88.30%) and SSEP (85.81%) trail substantially, with SSEP yielding the lowest OA on this dataset.
The values are predominantly negative across methods, indicating that EDA does not improve and in some cases slightly reduces the best-achievable performance on KSC. The most notable case is PCA, which achieves a higher OA under No-EDA using a 3D-CNN configuration (%) than under EDA with RF (%), a difference attributable to the change in the optimal classifier rather than EDA alone. DRL shows a small positive (%), the only method to benefit from EDA on this dataset. These mixed results indicate that the impact of EDA on KSC is technique-dependent rather than uniformly beneficial.
3.2.6. Montana UAV VNIR (Table 7)
The Montana UAV VNIR dataset exhibits fundamentally different behavior compared to the benchmark datasets. All methods achieve substantially lower OA (51–58% under EDA versus 87–99% on benchmarks), and the standard deviations are markedly higher (up to %), reflecting the reduced stability across runs due to limited labeled samples, class imbalance, and increased spectral noise from UAS acquisition conditions.
Under EDA, DRL achieves the highest OA (%) using Top-15 3D-CNN, followed by SRPA (%), SSEP (%), and PCA (%). However, these margins are small relative to standard deviations and should be interpreted with caution. Unlike the benchmark datasets, EDA provides positive OA for four of five methods on Montana (DRL: %, SRPA: %, PCA: %, and SSEP: %), suggesting that noise removal and normalization are beneficial for this noisy real-world dataset. However, the high variability across runs indicates that these gains are not statistically robust, and EDA alone cannot compensate for insufficient labeled data or highly variable spectral distributions. Macro-F1 scores remain low (38–52% under EDA), confirming difficulty in discriminating among the five closely related forest species and condition classes.
3.3. Cross-Dataset Trends (Table 8)
Table 8 summarizes performance aggregated across all six datasets by averaging the per-dataset best-configuration mean OA, Kappa, and macro-F1 for each method. Although absolute performance differs substantially across datasets (reflected in the large standard deviations of ±15–17%), consistent method-level trends are apparent.
DRL achieves the highest aggregated OA under both EDA (%) and No-EDA (%) conditions, confirming its robustness across diverse hyperspectral scenes. SRPA and the no-band selection baseline perform similarly (≈85%) and rank second under EDA and No-EDA respectively, while PCA (≈84%) and SSEP (≈82%) trail. The large cross-dataset standard deviations reflect the substantial performance gap between the Montana UAV dataset (≈52–58%) and the benchmark datasets (≈87–99%), rather than high method instability within any single dataset.
Comparing EDA and No-EDA aggregated results, the differences are small for all methods (within 0.6% OA), suggesting that while EDA is beneficial for specific datasets and methods, its effect on aggregated cross-dataset performance is modest. DRL shows the largest positive EDA contribution (% aggregated OA), while EDA provides a marginally positive advantage for SSEP (% aggregated OA). These observations support the conclusion that EDA is a safe and generally non-degrading preprocessing step, with dataset-dependent rather than universal benefits.
3.4. Summary of Results
Across Tables 2–8, several consistent findings emerge. First, DRL-based band selection achieves the highest OA on all six datasets under at least one preprocessing condition, confirming its effectiveness for task-driven spectral subset selection. Second, SRPA ranks second under EDA on three of six datasets (Pavia University, KSC, and Montana), demonstrating strong and stable performance as an attention-guided alternative, while the no-band selection baseline ranks second on Indian Pines, Salinas, and Botswana. Third, SSEP and PCA yield lower and less stable performance, with SSEP in particular yielding the lowest OA on Indian Pines and KSC, and PCA performing poorly on Indian Pines relative to even the baseline. Fourth, the impact of EDA is strongly dataset-dependent: it provides consistent benefits on Pavia University and Montana, minimal impact on Salinas and Botswana, and mixed or negative effects on Indian Pines and KSC. Finally, the Montana dataset consistently underperforms relative to all benchmarks, with higher variance and a lower macro-F1, underscoring the challenges of real-world UAV hyperspectral classification under limited supervision.
3.5. Selected Wavelength Analysis for the Montana UAV Dataset
To address the spectral interpretability of the evaluated band-selection methods, Table 9 reports the wavelength ranges and dominant spectral regions selected by each method at its best-performing Top-K configuration for the Montana UAV VNIR dataset.
3.5.1. EDA Condition
Under EDA preprocessing, the four methods exhibit markedly different spectral selection strategies, reflecting their underlying algorithmic objectives. SSEP selects almost exclusively NIR bands (764–932 nm; 18 of 20 bands), consistent with its edge-preservation criterion: in this forest scene, class boundaries between live conifers, dead biomass, and dead Ponderosa Pine are sharpest in the NIR plateau, where canopy structural differences are most pronounced [
29]. PCA concentrates on two clusters: Blue-Green (488–504 nm; six bands) and NIR (760–824 nm; twelve bands), reflecting variance maximization rather than class discriminability; high-variance wavelengths dominate regardless of their utility for species separation. DRL selects a more diverse set spanning Red (four bands; 636–696 nm), NIR (four bands; 801–916 nm), Green (five bands), and Red Edge (two bands; 707–742 nm), consistent with its task-driven optimization: by explicitly maximizing downstream classification accuracy, DRL identifies complementary spectral regions rather than concentrating on a single high-variance zone. SRPA produces the broadest selection (488–929 nm; 30 bands), distributed across NIR (ten), Green (six), Red (four), Red Edge (three), Blue (five), and Far NIR (two), reflecting its redundancy-penalization design, which actively avoids correlated bands.
Across all four EDA methods, the NIR region of approximately 795–830 nm is consistently selected, representing the only wavelength range chosen by all four methods independently. This consensus band cluster aligns with the NIR reflectance plateau of vegetation, which is strongly modulated by leaf area index, canopy density, and cell structure—properties that differ substantially between live conifers (Douglas Fir, Ponderosa Pine, and Western Larch) and dead biomass classes in this post-fire environment [
29].
3.5.2. No-EDA Condition
Under No-EDA conditions, the selection patterns shift considerably for some methods. DRL No-EDA (Top-10) retains a similar Red Edge and NIR focus (526–879 nm), suggesting that the most discriminative spectral structure for forest species is preserved even without preprocessing. SSEP No-EDA remains exclusively NIR (762–896 nm), confirming that the edge-based criterion consistently identifies NIR boundaries regardless of normalization. PCA No-EDA (Top-50) shifts heavily toward Blue wavelengths (400–430 nm; 23 bands), selecting the highest-variance region of the raw uncorrected cube—these short-wavelength bands exhibit large radiometric variance due to atmospheric scattering and sensor noise rather than vegetation signal, illustrating the limitation of variance-based selection on uncorrected data.
The contrast between PCA EDA (NIR-dominant) and PCA No-EDA (Blue-dominant) directly demonstrates the importance of EDA preprocessing for variance-based methods: without normalization, PCA is misled by noise-dominated spectral regions at the wavelength extremes.
3.6. Clustering-Based Band Selection (KMCBS)
Table 10 presents the classification performance of the K-Means Clustering-based band-selection method across all six datasets. The 3D-CNN classifier consistently achieves the highest OA across all datasets, confirming the trend observed for the other four band-selection methods in Tables 2–7. On benchmark datasets, KMCBS achieves competitive performance:
OA on Salinas,
on Botswana, and
on KSC. These results are generally within 1–3% of the best-performing learning-based methods (DRL and SRPA), demonstrating that clustering-based selection provides a strong unsupervised baseline.
On Indian Pines and Pavia University, KMCBS achieves
and
OA respectively, trailing DRL by approximately 3% on both datasets. This gap is consistent with findings in the prior literature [
18,
19] suggesting that learning-based methods better capture non-linear spectral structure in heterogeneous scenes. On the Montana UAV VNIR dataset, KMCBS achieves
OA with 3D-CNN, comparable to DRL (
) and above the statistical classifiers, confirming that clustering-based selection retains discriminative spectral information even in challenging real-world conditions with class imbalance and sparse labels.
Across all datasets, RF achieves competitive OA relative to SVM and KNN, while KNN consistently underperforms—a trend consistent with the other band-selection methods. The optimal Top-K value varies by dataset and classifier, with smaller subsets (Top-5 to Top-15) performing best for 3D-CNN and larger subsets (Top-40 to Top-50) preferred by RF and SVM, reflecting the different spatial context exploitation of patch-based versus pixel-based classifiers.
Comparing KMCBS against the four band-selection strategies evaluated in Tables 2–7, several notable patterns emerge. On Salinas, Botswana, and KSC, KMCBS achieves the highest OA among all methods, with
,
, and
respectively—marginally outperforming DRL by
,
, and
. This demonstrates that centroid-based clustering is sufficient to identify the most discriminative band subsets in scenes with moderate spectral complexity. On Indian Pines and Pavia University, DRL outperforms KMCBS by
and
OA respectively, reflecting the advantage of sequential learning-based selection in spectrally heterogeneous scenes where joint band interaction matters more than individual band representativeness. On the Montana UAV VNIR dataset, KMCBS (
) trails DRL (
) by
, which is the largest gap across all datasets—consistent with the noisy, imbalanced nature of the UAV acquisition where spectral variability is high and labeled samples are sparse. Overall, KMCBS ranks first among all methods on three of six datasets and remains within
OA of the best-performing method on all others, confirming that clustering-based band selection constitutes a competitive and computationally lightweight alternative to learning-based strategies, particularly on well-structured benchmark datasets.
3.7. Patch Size Sensitivity Analysis
To assess the sensitivity of the 3D-CNN classifier to spatial context scale, we evaluated patch sizes of
,
, and
using the DRL band selector at its best-performing Top-
K configuration on three representative datasets: Indian Pines, Pavia University, and Montana UAV VNIR. The results are reported as the mean ± standard deviation over five independent runs in
Table 11.
On Indian Pines, performance increases modestly across patch sizes ( OA), though the larger patch exhibits notably higher run-to-run variance (), suggesting that offers a more stable trade-off for this spectrally complex scene. On Pavia University, a consistent and stable improvement is observed with increasing patch size ( OA), reflecting the benefit of larger spatial context in this densely annotated urban scene. The Montana UAV VNIR dataset shows the strongest sensitivity to patch size: OA improves from () to (), with macro-F1 increasing from to , indicating that larger patches capture more meaningful spatial structure in fire-affected forest environments where spectral contrast between classes is subtle and distributed. These results confirm that patch size is a dataset-dependent hyperparameter: remains a competitive and stable default across benchmarks, while larger patches () provide consistent gains on spatially complex or real-world datasets such as Montana.
3.8. Ablation Study
To validate the contribution of core algorithmic components in SRPA and DRL, we conducted an ablation study comparing each method against a degraded variant with a key module removed. For SRPA, we compared the full method (
) against an attention-only variant with the redundancy penalty removed (
). For DRL, we compared sequential band selection against random band selection of the same Top-K subset size, serving as a lower-bound baseline. All experiments used the best-performing classifier and Top-K configuration from Tables 2–7, EDA preprocessing, and five independent runs. The results are reported in Table 12.
3.8.1. DRL Sequential Selection vs. Random Baseline
Across all six datasets, sequential DRL selection consistently outperforms random band selection, confirming that the policy network learns meaningful selection strategies beyond chance. Gains are most pronounced on Indian Pines ( OA), Botswana ( OA), and KSC ( OA). On Pavia University, DRL achieves OA compared to for random selection. On the Montana UAV VNIR dataset, DRL provides a consistent gain ( OA) even under limited supervision and class imbalance, confirming that learned sequential selection retains value in real-world noisy conditions.
3.8.2. SRPA Redundancy Penalty
The effect of the redundancy penalty () is dataset-dependent. On Pavia University and Botswana, the full SRPA method outperforms the attention-only variant by and OA respectively, confirming that redundancy penalization is beneficial when spectral bands exhibit strong inter-correlation. On KSC, the difference is negligible (), indicating near-equivalent performance. However, on Indian Pines, removing the penalty yields notably higher OA ( vs. ), and on Salinas and Montana the attention-only variant performs marginally better. These results suggest that the SE block attention scores are strongly discriminative on most datasets, and the redundancy penalty may over-regularize in scenes where inter-band correlation is lower or spectral structure is more complex. This finding motivates future investigation into adaptive selection strategies tailored to dataset-specific spectral correlation structure.
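The interaction between attention scores and a redundancy penalty can be illustrated with a small greedy selector. This is a sketch under stated assumptions, not the SRPA implementation evaluated here: it assumes that each candidate band is scored as its attention weight minus `penalty` times its largest absolute Pearson correlation with any band already selected. The names `pearson` and `select_bands`, and the toy attention values, are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_bands(attention, bands, k, penalty):
    """Greedily pick k bands by attention minus a correlation-based redundancy term."""
    selected = []
    remaining = list(range(len(attention)))
    while remaining and len(selected) < k:
        def score(j):
            # Redundancy is the strongest correlation with any already selected band.
            redundancy = max(
                (abs(pearson(bands[j], bands[s])) for s in selected), default=0.0
            )
            return attention[j] - penalty * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: bands 0 and 1 are perfectly correlated, band 2 is uncorrelated with band 0.
toy_bands = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1]]
toy_attention = [0.9, 0.85, 0.5]
attention_only = select_bands(toy_attention, toy_bands, k=2, penalty=0.0)
with_penalty = select_bands(toy_attention, toy_bands, k=2, penalty=0.5)
```

With the penalty disabled, the two highly correlated bands are chosen; with the penalty enabled, the second pick switches to the lower-scoring but uncorrelated band. Whether that switch helps depends on whether the discarded band carried class-relevant information, which is consistent with the dataset-dependent behavior reported above.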
3.9. Hyperparameter Sensitivity Analysis
To assess whether the fixed hyperparameter choices in SSEP and SRPA are reasonable, we conducted a systematic sensitivity analysis, varying the Gaussian smoothing parameter in SSEP over {0.5, 1.0, 1.5, 2.0} and the redundancy penalty coefficient in SRPA over {0.1, 0.2, 0.3, 0.5, 0.7}. For each configuration, the best-performing classifier and Top-K setting from Tables 2–7 were used under EDA preprocessing, with five independent runs per configuration. The results are reported in Table 13 and Table 14. As discussed in Section 2.9, the DRL reward function was implemented without a compactness penalty term because the band budget K is controlled externally via the Top-K parameter; accordingly, a hyperparameter sensitivity analysis is not applicable to DRL.
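The bookkeeping behind such a sweep can be sketched as follows: for each hyperparameter value, the per-run OA scores are reduced to a mean and a standard deviation, and the default is compared with the best-performing value. The function name `summarize_sweep` is hypothetical, the use of the sample standard deviation is an assumption, and the numbers in the example are synthetic placeholders rather than results from this study.

```python
from statistics import mean, stdev


def summarize_sweep(runs_by_value, default):
    """Summarize a one-dimensional hyperparameter sweep.

    runs_by_value maps each hyperparameter value to a list of per-run OA scores.
    Returns (summary, best_value, gap), where summary maps value -> (mean, std)
    and gap is the mean OA lost by using the default instead of the best value.
    """
    summary = {
        value: (mean(scores), stdev(scores))  # sample standard deviation (assumption)
        for value, scores in runs_by_value.items()
    }
    best_value = max(summary, key=lambda value: summary[value][0])
    gap = summary[best_value][0] - summary[default][0]
    return summary, best_value, gap


# Synthetic placeholder scores (two runs per value), NOT results from the paper.
synthetic_runs = {0.5: [80.0, 82.0], 1.0: [85.0, 87.0], 2.0: [84.0, 84.0]}
summary, best_value, gap = summarize_sweep(synthetic_runs, default=0.5)
```

Reporting the gap together with the standard deviation at both settings helps distinguish a genuinely suboptimal default from a difference that lies within run-to-run noise.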
3.9.1. SSEP Sensitivity
The optimal value is dataset-dependent. On Pavia University, Salinas, and Botswana, yields the highest OA (, , and respectively), while performs best on Indian Pines () and KSC (), and is marginally optimal on Montana (). The chosen default is near-optimal on KSC () and Montana (), but is suboptimal on Indian Pines (), Pavia University (), Salinas (), and Botswana (). These results indicate that is a reasonable dataset-agnostic default for low-complexity datasets but that larger smoothing values better preserve class-boundary structure in spectrally complex or densely annotated scenes. The high variance observed at on Pavia University () further confirms that aggressive smoothing destabilizes edge detection in high-resolution urban imagery.
3.9.2. SRPA Sensitivity
The redundancy penalty coefficient
shows stable behavior on three of six datasets: KSC and Montana both achieve their best OA at
(
), and Salinas is near-optimal (
). However,
is suboptimal on Indian Pines (
; best at
), Pavia University (
; best at
), and Botswana (
; best at
). The monotonically decreasing trend on Indian Pines suggests that stronger redundancy penalization progressively degrades performance, consistent with the ablation study finding in Section 3.8.2. In contrast, Pavia University benefits from stronger penalization (
), reflecting higher inter-band correlation in this densely annotated urban scene.
3.9.3. Summary
Both defaults (the SSEP smoothing parameter and the SRPA redundancy penalty coefficient) were selected as dataset-agnostic values consistent with the prior literature [50, 51], and they perform competitively on lower-complexity datasets. The sensitivity analysis reveals that these values are suboptimal for spectrally complex benchmark scenes, motivating future work on adaptive parameter selection strategies that account for dataset-specific spectral correlation structure and edge characteristics.
3.10. Class Imbalance Analysis
The Montana UAV VNIR dataset exhibits severe class imbalance, with Dead Ponderosa Pine comprising only 30 labeled pixels and Ponderosa Pine 212 pixels out of 761 total labeled samples. To assess the impact of imbalance-handling strategies on minority class recognition, we compare four approaches: baseline classification with no imbalance handling, class-weighted loss (RF and 3D-CNN), SMOTE synthetic oversampling (RF), and focal loss (3D-CNN) [62]. All experiments use DRL Top-15 band selection with EDA preprocessing and five independent runs. Per-class F1 scores are reported alongside OA and Kappa to assess minority class recognition directly.
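The reason for reporting per-class F1 next to OA and Kappa can be seen from how the metrics are derived from a confusion matrix. The sketch below uses the standard definitions of overall accuracy, Cohen's kappa, and F1; the function name `classification_metrics` and the toy confusion matrix are illustrative assumptions and do not reproduce any table in this paper.

```python
def classification_metrics(cm):
    """Compute OA, Cohen's kappa, per-class F1, and macro-F1.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]

    oa = sum(cm[i][i] for i in range(n)) / total
    # Agreement expected by chance from the row and column marginals.
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    kappa = (oa - expected) / (1 - expected) if expected < 1 else 0.0

    f1 = []
    for i in range(n):
        tp = cm[i][i]
        fp = col_sums[i] - tp
        fn = row_sums[i] - tp
        denom = 2 * tp + fp + fn
        f1.append(2 * tp / denom if denom else 0.0)
    return oa, kappa, f1, sum(f1) / n


# Toy matrix: the third (minority) class is never predicted correctly.
toy_cm = [
    [50, 0, 0],
    [0, 40, 0],
    [8, 2, 0],
]
oa, kappa, per_class_f1, macro_f1 = classification_metrics(toy_cm)
```

In this toy case OA is 0.90 even though the minority class has an F1 of zero, and only the per-class and macro-averaged F1 expose the failure.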
SMOTE oversampling could not be applied in this experiment. After the 70/30 stratified split, Dead Ponderosa Pine yields only about 21 training samples, and the label encoding additionally produces a sixth background class with zero samples; together, these caused SMOTE to fail consistently. This outcome itself reflects a fundamental limitation of the Montana dataset for synthetic oversampling approaches.
The results are presented in
Table 15. Overall accuracy remains stable across all strategies, ranging from
to
, confirming that imbalance handling does not substantially alter global classification performance on this dataset. The 3D-CNN classifier consistently outperforms RF across all strategies due to its ability to exploit spatial context through patch-based learning. Among 3D-CNN variants, focal loss achieves the highest OA (
) and macro F1 (
), with a marginal improvement in Dead Ponderosa Pine F1 (
) over the baseline (
). Class-weighted 3D-CNN slightly degrades Dead Ponderosa Pine F1 (
) relative to the baseline, suggesting that inverse-frequency weighting over-suppresses majority class features that are also diagnostic for minority class boundaries in this spectrally complex scene.
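The difference between the two loss-based strategies can be sketched numerically. The example below uses the standard focal loss form, in which cross-entropy is scaled by (1 − p)^γ, and a simple inverse-frequency weighting normalized to an average weight of 1. The function names, the default γ = 2, and the normalization are assumptions for illustration; the settings used in the experiments are not given in this section. Only the two class counts stated in Section 3.10 are used, so the resulting weights are relative to each other and not the full weight vector.

```python
import math


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample, given the predicted probability of its true class."""
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must lie in (0, 1]")
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


def inverse_frequency_weights(counts):
    """Class weights proportional to 1 / count, scaled so that they average to 1."""
    inverse = {name: 1.0 / count for name, count in counts.items()}
    scale = len(inverse) / sum(inverse.values())
    return {name: weight * scale for name, weight in inverse.items()}


# With gamma = 0 and alpha = 1 the focal loss reduces to plain cross-entropy.
cross_entropy_easy = focal_loss(0.9, gamma=0.0)
focal_easy = focal_loss(0.9)   # confidently correct sample: strongly down-weighted
focal_hard = focal_loss(0.1)   # poorly classified sample: almost full loss retained

# Labeled-pixel counts stated in the text; the remaining classes are omitted here.
weights = inverse_frequency_weights({"Dead Ponderosa Pine": 30, "Ponderosa Pine": 212})
```

Inverse-frequency weighting rescales every sample of a class by the same factor, whereas focal loss rescales each sample according to how poorly it is currently classified, so the two strategies can shift the decision boundary in different ways.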
Notably, Ponderosa Pine achieves F1 across all strategies and all runs. With only 212 total pixels and high spectral similarity to Douglas Fir, this class cannot be reliably learned from the available labeled samples regardless of imbalance strategy. This finding underscores a fundamental data limitation rather than a methodological shortcoming, and highlights the need for additional field data collection targeting this species in future campaigns.
4. Discussion
The results presented in this study highlight several important insights regarding hyperspectral band selection, preprocessing strategies, and model behavior across both benchmark and real-world datasets. Rather than focusing solely on absolute performance, this discussion interprets the observed trends in the context of dataset characteristics, model complexity, and the role of exploratory data analysis (EDA).
4.1. Effectiveness of Learning-Based Band Selection
Across all six evaluated datasets, learning-based band-selection methods—particularly Deep Reinforcement Learning (DRL)—achieved the highest classification performance. This consistent dominance suggests that DRL is effective at jointly optimizing spectral relevance and downstream classification performance, especially when sufficient labeled data and relatively stable spectral signatures are available. Unlike classical techniques such as PCA, which prioritize variance preservation, DRL explicitly optimizes task-specific objectives, leading to improved class discrimination as reflected by higher macro-F1 and Kappa scores.
The attention-based SRPA method ranked second on three of six datasets under EDA (Pavia University, KSC, and Montana), indicating a good balance between performance and stability, while the no-band-selection baseline ranked second on the remaining three (Indian Pines, Salinas, and Botswana). Compared to DRL, SRPA appears less sensitive to dataset size and noise, which may explain its competitive behavior even in more challenging settings. These observations support the growing body of literature advocating task-aware and learning-based band-selection methods over purely statistical approaches.
4.2. Limitations of Variance-Based Selection
SSEP yielded the lowest OA on four of six datasets (Indian Pines, Salinas, Botswana, and KSC), while PCA consistently underperformed learning-based methods, confirming that variance maximization alone is insufficient for hyperspectral classification. High-variance bands do not necessarily correspond to class-discriminative features, particularly in datasets with class imbalance or overlapping spectral signatures. The consistently poor performance of PCA across both EDA and No-EDA settings reinforces the need for supervised or task-driven band-selection strategies in practical hyperspectral applications.
It should also be noted that PCA is, strictly speaking, a feature extraction method rather than a true band-selection method: its output components are linear combinations of all input bands and do not correspond to specific measurable wavelengths. This limits the physical interpretability of PCA-derived features in spectroscopic applications, where the identity of selected wavelengths carries domain-specific meaning, for example, the Red Edge (∼700–730 nm) for vegetation stress or the NIR plateau (∼750–900 nm) [38] for canopy structure. In contrast, SSEP, SRPA, and DRL all select original spectral bands, preserving this interpretability and enabling the wavelength analysis presented in Section 3.5. PCA is therefore included here as a variance-based baseline rather than a competing band-selection strategy, and its consistently lower classification performance relative to the supervised methods further underscores that variance maximization alone is an insufficient criterion for spectral subset selection in hyperspectral classification tasks.
4.3. Role of Exploratory Data Analysis
The impact of EDA varied across datasets and methods. On Pavia University, EDA provided consistent gains for SSEP (+1.96%) and SRPA (+0.73%), likely arising from improved feature normalization in this densely annotated urban scene. On Indian Pines, however, EDA effects were mixed or slightly negative for most methods (DRL: −0.46%; SRPA: −2.60%), suggesting that the spectral complexity of this scene does not benefit uniformly from the applied preprocessing. Notably, EDA provided the clearest and most consistent gains on the Montana UAV dataset, where noise removal and normalization improved OA for four of five methods (SRPA: +4.25%, DRL: +3.28%, PCA: +2.61%, and SSEP: +1.49%), though high run-to-run variance limits the statistical reliability of these gains.
In contrast, datasets with cleaner spectral characteristics, such as Botswana and Salinas, showed minimal performance differences between EDA and No-EDA. For most methods and datasets, EDA did not substantially degrade performance. The most notable exception is PCA on KSC, where EDA reduced OA by 5.23%. This drop is attributable to a change in the optimal classifier between conditions (No-EDA favors 3D-CNN at 93.53%, whereas EDA favors RF at 88.30%) rather than to a fundamental failure of preprocessing. Overall, EDA appears to be a generally safe preprocessing step, though its benefits may be marginal when data quality is already high.
4.4. Challenges of Real-World UAV Hyperspectral Data
The Montana UAV VNIR dataset exhibited fundamentally different behavior compared to benchmark datasets. The overall classification performance was substantially lower, and learning-based methods did not consistently outperform simpler baseline or classical models. This divergence can be attributed to several real-world challenges, including limited labeled samples, increased spectral noise, class imbalance, and domain-specific variability introduced by UAV acquisition conditions.
In this setting, EDA provided numerical gains for four of five methods (SRPA: +4.25%, DRL: +3.28%, PCA: +2.61%, and SSEP: +1.49%); however, the high run-to-run variability (standard deviations up to ±5.73%) indicates these gains are not statistically robust, suggesting that preprocessing alone cannot compensate for insufficient supervision or highly variable data distributions. These findings underscore an important limitation of deep and reinforcement-based band-selection methods: while powerful, they remain sensitive to data quality and training signal strength.
4.5. Implications for Practical Deployment
Taken together, the results suggest that the choice of band-selection strategy should be guided by dataset characteristics rather than by the assumption that any single method is universally optimal. Learning-based methods such as DRL and SRPA are highly effective on well-annotated benchmark datasets and structured scenes, but simpler methods may remain competitive in real-world scenarios with limited labels. The dataset-dependent impact of EDA further emphasizes the need for adaptive preprocessing pipelines rather than fixed workflows.
4.6. Clustering-Based Band Selection
The inclusion of KMCBS as a clustering-based baseline provides important context for interpreting the performance of the four primary band-selection strategies. The consistently small gap between KMCBS and DRL (1–
OA) suggests that spectral clustering captures a substantial portion of the discriminative information available in the selected band subsets. The advantage of learning-based methods such as DRL is most pronounced on spectrally heterogeneous datasets like Indian Pines, where the sequential selection policy learns to prioritize bands that jointly maximize classification accuracy rather than selecting individually representative bands. On simpler datasets such as Salinas and Botswana, the distinction between clustering- and learning-based selection diminishes, suggesting that spectral redundancy is the dominant factor in those scenes and that centroid-based selection is sufficient to capture it. These findings are consistent with prior comparative studies [18, 19], which report competitive clustering performance on benchmark HSI datasets.
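The idea of centroid-based selection can be sketched as follows: each band is treated as a vector of pixel values, the bands are grouped with K-Means, and the band closest to each cluster centroid is kept as that cluster's representative. This is a minimal illustration and not the KMCBS implementation evaluated in this study; the function names, the deterministic initialization from evenly spaced bands, and the fixed iteration count are all assumptions.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_band_selection(bands, k, n_iter=20):
    """Cluster bands with K-Means and return one representative band index per cluster.

    bands is a list of per-band vectors (one value per pixel).
    """
    n = len(bands)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of bands")
    # Deterministic initialization from evenly spaced bands (assumption).
    centroids = [list(bands[(i * n) // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: each band joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(band, centroids[c]))
            for band in bands
        ]
        # Update step: each centroid becomes the mean of its member bands.
        for c in range(k):
            members = [bands[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    selected = []
    for c in range(k):
        members = [i for i in range(n) if labels[i] == c]
        if members:
            selected.append(
                min(members, key=lambda i: squared_distance(bands[i], centroids[c]))
            )
    return sorted(selected)


# Toy example: two groups of three similar bands, each observed at two pixels.
toy_bands = [[0, 0], [1, 0], [2, 0], [10, 10], [11, 10], [12, 10]]
representatives = kmeans_band_selection(toy_bands, k=2)
```

Because the representative is an original band rather than the centroid itself, the selected subset keeps its physical wavelength meaning. The procedure also uses no class labels, which is consistent with the observation above that it works best where redundancy, rather than class-specific band interaction, dominates.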
4.7. Class Imbalance in Real-World UAV Datasets
The class imbalance analysis on the Montana UAV VNIR dataset reveals that standard imbalance-handling strategies provide only marginal improvements in overall accuracy, though focal loss consistently achieves the highest macro F1 and Dead Ponderosa Pine F1 among all strategies evaluated. The persistent failure to classify Ponderosa Pine—with F1 across all strategies—highlights a fundamental limitation beyond algorithmic design: with only 212 labeled pixels and high spectral overlap with Douglas Fir, no imbalance strategy can compensate for insufficient training data. These findings reinforce the need for targeted field data collection in future prescribed fire campaigns and suggest that active learning or semi-supervised approaches may be more appropriate than post hoc resampling when labeled samples are critically scarce.
4.8. Contextualisation Against Prior Work
To situate our results within the broader band selection literature, we compare the OA achieved by our best-performing method on Indian Pines and Pavia University against closely related prior work that uses the same datasets and similar supervised or learning-based band-selection strategies. On Indian Pines, our DRL implementation achieves 89.23% OA (Top-10, 3D-CNN, and a 70/30 split). For reference, Cai et al. [
51] report
% OA (BS-Net-Conv, 25 bands, SVM, and 5% training) on the same dataset, while Mou et al. [
25] report
% OA (DRL, 30 bands, SVM-RBF, and 10% training). On Pavia University, our DRL achieves
% OA (Top-15; 3D-CNN). Cai et al. [
51] report 89.29% OA (BS-Net-FC, 15 bands, SVM, and 5% training) on this dataset, while Mou et al. [
25] report
% OA (DRL, 30 bands, SVM-RBF, and 10% training). Direct numerical comparison across these studies should be interpreted cautiously: our 70/30 stratified split provides substantially more training supervision than the 5% and 10% training protocols used in the cited studies, which likely inflates our absolute OA relative to theirs. The consistent method-level trend (DRL outperforming statistical and attention-based methods) holds across both protocols, supporting the generalizability of the findings. To our knowledge, no prior band selection study has evaluated methods on a post-fire UAV VNIR dataset of the type introduced here; the Montana results therefore represent a novel contribution without a direct literature analogue.
4.9. Future Research Directions
Future work will focus on improving the robustness of learning-based band-selection methods for real-world UAV hyperspectral data. Promising directions include incorporating label-efficient or self-supervised learning strategies, leveraging spatial context more effectively, and designing hybrid approaches that combine the stability of classical methods with the adaptability of learning-based models. Extending the evaluation framework to additional sensing modalities and ecological monitoring tasks will further support the generalization of these findings.
5. Conclusions
This work presented a comprehensive and reproducible evaluation of hyperspectral band-selection techniques across six datasets, including widely used benchmark scenes and a real-world UAV-based VNIR dataset. Consistent with the objectives stated in the Abstract, we systematically compared classical, attention-based, and reinforcement learning-based band-selection methods under both EDA and No-EDA settings using a unified experimental protocol.
By aggregating the results over five independent runs and selecting representative configurations based on the mean performance and stability, we ensured a fair and transparent comparison across techniques. The results demonstrate that learning-based band-selection methods consistently outperform classical approaches on benchmark datasets. In particular, Deep Reinforcement Learning (DRL) achieved the highest overall accuracy on all six datasets under EDA conditions, and the highest or joint-highest on all six under No-EDA, confirming its robustness across diverse hyperspectral scenes. The attention-based SRPA method also showed strong and stable performance, frequently ranking second and consistently outperforming PCA-based selection.
The effect of exploratory data analysis (EDA) was found to be strongly dataset-dependent. EDA improved best-achievable performance most consistently on Pavia University, where SSEP and SRPA showed positive OA changes (+1.96% and +0.73%, respectively), and on Montana, where four of five methods benefited from noise removal and normalization. On Indian Pines, EDA effects were mixed, with DRL and SRPA showing small negative OA changes (−0.46% and −2.60%), while on KSC, PCA showed a notably larger negative OA change (−5.23%), attributable to a classifier change rather than preprocessing failure. On cleaner datasets such as Botswana and Salinas, the impact of EDA was marginal but non-degrading. These findings support the Abstract’s claim that preprocessing can enhance performance, but they also show that it should not be assumed to be universally beneficial.
In contrast to benchmark datasets, the Montana UAV VNIR dataset exhibited substantially lower performance across all methods, with higher variability across runs. In this real-world setting, learning-based band-selection methods did not consistently outperform simpler baseline or classical approaches. EDA provided numerical OA gains for four of five methods on Montana (SRPA: +4.25%, DRL: +3.28%, PCA: +2.61%, and SSEP: +1.49%), but the high run-to-run variability (standard deviations up to ±5.73%) indicates that these gains are not statistically robust, and preprocessing alone cannot compensate for insufficient labeled data or highly variable spectral distributions. This observation highlights the limitations of deep and reinforcement-based methods when applied to noisy data with limited labeled samples, and directly supports the Abstract’s emphasis on the challenges of real-world hyperspectral analysis.
Overall, this study confirms that learning-based band selection, particularly DRL, offers clear advantages for hyperspectral image classification when data quality and supervision are sufficient. At the same time, the observed variability across datasets underscores the importance of aligning band-selection strategies, preprocessing choices, and model complexity with dataset characteristics. The inclusion of a clustering-based band selection baseline (KMCBS) demonstrated that K-Means centroid selection achieves competitive performance within 1– OA of the best learning-based methods, confirming the value of both classical and modern approaches to spectral dimensionality reduction. Class imbalance analysis on the Montana dataset demonstrated that focal loss provides the most consistent minority class improvements, though the critically limited labeled samples for Ponderosa Pine highlight the need for expanded field data collection in future prescribed fire monitoring campaigns. Future work will focus on improving robustness for real-world UAV hyperspectral data, incorporating label-efficient and self-supervised learning strategies, and extending the proposed evaluation framework to additional sensing modalities and ecological applications.