Beverage Stain Classification Using Hyperspectral Imaging with an L-BFGS-B-Optimized Autoencoder and a Channel-Attention 1D CNN

Shit, Jitendra; Dar, Muzaffar Ahmad; V M, Manikandan; Roy, Partha Pratim

doi:10.3390/informatics13050068

Open Access Article


¹ Department of Computer Science and Engineering, SRM University-AP, Amaravati 522240, Andhra Pradesh, India
² Centre for Interdisciplinary Research, SRM University-AP, Amaravati 522240, Andhra Pradesh, India
³ Department of Computer Science and Engineering, IIT (ISM), Dhanbad 826004, Jharkhand, India
* Author to whom correspondence should be addressed.

Informatics 2026, 13(5), 68; https://doi.org/10.3390/informatics13050068

Submission received: 2 February 2026 / Revised: 1 April 2026 / Accepted: 9 April 2026 / Published: 28 April 2026

(This article belongs to the Section Machine Learning)


Abstract

Hyperspectral imaging (HSI) provides rich spectral information and serves as a non-destructive technique for forensic stain analysis. Conventional approaches often perform poorly because of the high dimensionality and spectral redundancy inherent in hyperspectral data. To address this challenge, a hyperspectral dataset comprising nine beverage stains (papaya, coffee, pomegranate, orange, tea, wine, whisky, rum, and brandy) is developed. Building on this dataset, an ensemble framework is proposed. It combines an optimized autoencoder (AE), channel-attention (CA)-enhanced one-dimensional convolutional neural networks (1D CNNs), and a weighted fusion strategy based on the Limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm with bound constraints (L-BFGS-B). The autoencoder learns compact latent representations from the 204-band hyperspectral vectors, reducing redundancy while preserving discriminative spectral features. CA emphasizes informative spectral bands and improves stain separability. Multiple 1D CNN models are trained using different latent dimensionalities. Their class probability outputs are fused through an optimized L-BFGS-B weighting scheme in which higher-performing models contribute more strongly to the final decision. Experimental results show classification accuracies of 96.54%, 97.19%, and 97.86% for the AE32-CA, AE64-CA, and AE128-CA models, respectively, with the optimized ensemble achieving an accuracy of 98.28%. Additionally, the time-dependent evolution of beverage stain reflectance is systematically analyzed using overlapped, normalized reflectance signatures acquired at 0 min, 1 h, 2 h, 3 h, 4 h, and 5 h. The results confirm that AE-based latent compression, CA, and L-BFGS-B-optimized ensemble fusion enhance hyperspectral beverage stain classification, providing an effective and extensible framework for forensic trace evidence analysis.

Keywords:

hyperspectral imaging; forensic stain classification; autoencoder; channel attention; ensemble learning; weighted feature fusion

1. Introduction

HSI has emerged as a powerful analytical tool that captures fine spectral patterns beyond the reach of conventional Red–Green–Blue (RGB) or multispectral systems. By acquiring both spatial and spectral information for every pixel, HSI enables precise characterization of material properties. This is particularly relevant to forensic investigations, where subtle chemical differences often play a crucial role in evidence interpretation. HSI has recently shown promise in biomedical diagnostics [1], material inspection [2], food quality assessment [3], and geological mapping [4].

Beverage stains are a seldom-studied category of trace evidence, commonly found on absorbent materials such as tissue paper, fabrics, and paper towels [5]. The chemical origin of such stains could provide investigative leads on victim activities, suspect behavior, or environmental conditions at a crime scene [6,7]. However, most conventional analytical techniques for studying stains involve destructive sampling, chemical testing, or manual interpretation, which can compromise repeatability and accuracy. These limitations motivate nondestructive, data-driven approaches that exploit the spectral richness of HSI. Hyperspectral data, however, are characterized by high dimensionality and strong spectral redundancy. Both create significant challenges for efficient feature extraction and classification [8,9]. Most existing models rely on a single feature representation, which makes their predictions sensitive to noise, substrate variations, and temporal changes in stain appearance. Furthermore, forensic classification of beverage stains on porous substrates such as tissue paper and paper towels remains largely unexplored [10], despite its relevance to activity reconstruction, temporal analysis, and scene interactions. The spectral similarity among alcoholic beverages such as whisky, rum, and brandy, together with chemical changes over time, further complicates accurate classification.

The rest of this paper is structured as follows. Section 2 reviews related work on hyperspectral image classification and forensics. Section 3 describes the hyperspectral data acquisition system, dataset construction, preprocessing, and the proposed AE-CA ensemble framework. Section 4 presents the experimental results. Finally, Section 5 concludes the paper and outlines directions for future work.

2. Related Work

Deep learning (DL) has significantly transformed hyperspectral image analysis by enabling hierarchical feature learning directly from raw spectral data. Compared with conventional machine learning techniques, DL-based methods have achieved considerable improvements in hyperspectral classification accuracy, often between 5% and 20% [11]. In particular, convolutional neural networks (CNNs) have been widely adopted for hyperspectral image analysis because they capture both spectral and spatial correlations. Advanced architectures have significantly improved spectral–spatial representation learning. These include Spectral–Spatial Residual Networks (SSRNs) [12], HybridSN [13], and three-dimensional CNNs (3D-CNNs) [14,15]. Deep residual spectral–spatial networks further enhance classification performance by preserving fine-grained spectral information across multiple network layers.

To further improve feature representation, attention mechanisms have recently been incorporated into hyperspectral deep learning models. Channel and spectral attention modules dynamically emphasize informative wavelength responses while suppressing redundant spectral correlations [16,17]. These mechanisms are particularly effective for hyperspectral data because of the strong inter-band correlation across spectral channels. In addition to attention mechanisms, autoencoders (AEs) have been widely used for dimensionality reduction and feature compression in hyperspectral datasets. AEs learn compact latent representations that preserve semantic spectral characteristics while substantially reducing data dimensionality. Recent studies have shown that combining AEs with attention-enhanced CNN architectures can improve hyperspectral feature extraction and classification performance [18].

A significant issue in hyperspectral image processing is the scarcity of labeled training examples. Researchers have investigated several methodologies to address it, including transfer learning, semi-supervised learning, and few-shot learning. Transfer learning uses knowledge from source domains to improve performance in target domains with insufficient labeled data [19]. Semi-supervised methods, including self-learning and pseudo-labeling, have been proposed to improve model robustness by leveraging both labeled and unlabeled data [20]. Few-shot learning techniques mitigate the shortage of training samples by enabling models to generalize from a small number of labeled instances [21].

Spectral redundancy and spectral mixing remain fundamental challenges in hyperspectral data analysis. Spectral unmixing methods decompose mixed pixels into their constituent materials to improve classification performance in complex scenes [22,23]. Sparse representation and low-rank modeling approaches have also been introduced to remove redundant spectral information and enhance discriminative features [24,25]. Early surveys reviewed progress in hyperspectral image processing techniques, including noise removal, feature extraction, dimensionality reduction, and classification. These works highlight the importance of combining feature compression with physically meaningful spectral representations. Several studies have addressed spectral redundancy through compression and reconstruction. For instance, compressive HSI methods reduce the number of spectral measurements needed during acquisition and use optimization methods to reconstruct the full hyperspectral image [26]. Similarly, sparse reconstruction-based hyperspectral imaging approaches recover spectral information from compressed observations, exploiting sparsity constraints to optimize the imaging process [27]. These techniques reduce storage requirements and acquisition time while preserving spectral information through reconstruction. However, the reconstruction process introduces errors and adds computational cost before classification.

By contrast, the proposed framework aims to make classification more robust rather than to reconstruct the hyperspectral image. The autoencoder compresses high-dimensional spectral signatures into compact feature representations while retaining discriminative spectral information. In addition, the L-BFGS-B algorithm is used to learn the optimal weights for fusing multiple AE-CA models within an ensemble learning framework. The main advantage of the proposed framework is that it optimizes classification performance while remaining efficient, thanks to the limited-memory optimization strategy. However, unlike compression–reconstruction methods, the proposed framework does not reduce the storage cost of the raw hyperspectral image. Its focus is instead on robust classification and better discrimination between similar beverage stains.

Ensemble learning has also been widely adopted to improve robustness and generalization in hyperspectral classification. Early ensemble approaches used voting or stacking to combine conventional classifiers such as support vector machines, decision trees, and K-nearest neighbors. More recent studies have shown that ensembles of deep learning models can improve classification stability and robustness, particularly under domain shift and limited training data. In forensic science, hyperspectral imaging has been applied successfully to the detection and characterization of biological and trace evidence. Such applications include bloodstain detection, biofluid identification, document forgery analysis, and trace material analysis. However, forensic analysis of beverage stains on porous materials is still in its infancy. Beverage stains can be spectrally very similar, especially alcoholic beverages such as whisky, rum, and brandy. These challenges underline the need for robust hyperspectral classification models.

Motivated by these challenges, the present study proposes an ensemble-based framework that integrates latent feature compression, channel-wise discriminative learning, and optimized model fusion for reliable hyperspectral beverage stain classification.

3. Methodology

This section describes the imaging configurations, dataset collection characteristics, acquisition procedure, preprocessing, and development of the proposed AE-CA ensemble. The overall methodology is given in Figure 1.

3.1. Hyperspectral Setup

The hyperspectral acquisition setup is depicted in Figure 2. To maintain a consistent field of view across acquisitions, the Specim IQ hyperspectral camera (Specim, Spectral Imaging LTD., Oulu, Finland) was fixed at a height of 180 mm above the sample surface. Broadband halogen lamps (Philips Lighting, Eindhoven, The Netherlands) were used as the illumination source for reflectance imaging. They provide a continuous spectrum across the 400–1000 nm range, matching the sensitivity of the Specim IQ camera. Arrows in Figure 2 indicate the direction of illumination and the imaging geometry between the light sources, sample, and camera. Two broadband halogen lamps were placed at an angle of 45° relative to the sample surface, at a distance of 150 mm from it. This geometry provided uniform illumination of the sample without significant shadowing. The camera was connected by cable to a workstation for real-time image acquisition and reflectance calibration. All experiments were conducted indoors to avoid variations in ambient lighting. The spectral emission parameters of the light source are detailed in Appendix A (Figure A6).

3.2. Dataset Description

A custom hyperspectral dataset was created covering nine beverage types: papaya, coffee, pomegranate, orange, tea, wine, whisky, rum, and brandy. The stains were prepared under controlled laboratory conditions to keep data acquisition consistent. Three types of absorbent tissue substrate were used: pink, white, and brown (commercially available tissue paper, local supplier, India). Each substrate was arranged in two geometric configurations, flat and folded, which gives six data acquisition scenarios. Figure 3 shows samples of the substrates used in the experiments. Figure 3a–f show the six scenarios: pink–flat, pink–folded, white–flat, white–folded, brown–flat, and brown–folded, respectively.

To create each stain, 0.5 mL of the beverage was deposited at the center of the tissue substrate using a calibrated micropipette (Eppendorf, Hamburg, Germany). Because the substrate is porous, the stains spread naturally to a diameter of approximately 15–20 mm, depending on the absorbency of the substrate and the viscosity of the beverage. In total, 54 stain samples were prepared, one for each combination of beverage and substrate scenario. Hyperspectral cubes were captured with the Specim IQ camera, which covers the visible to near-infrared range (400–1000 nm) with 204 contiguous bands. Each stain was imaged at six drying intervals: 0 min, 1 h, 2 h, 3 h, 4 h, and 5 h. This allowed both the drying effects and the substrate effects to be analyzed. A white reference panel was captured before each data collection. As Figure 4 shows, each pixel in the hyperspectral cube contains a full 204-band spectral signature. Spectral data were extracted from manually segmented regions of interest covering the stained area. For each substrate (white, pink, and brown) and configuration (flat and folded), the spectra were separated by beverage class and time interval. They were then stored in separate Comma Separated Value (CSV) files.

Spectra were extracted for the nine beverage classes at six time intervals. Each substrate–configuration scenario therefore yields 6 (time intervals) × 9 (beverages) = 54 CSV files. Across the six substrate and configuration scenarios, this gives 54 × 6 = 324 CSV files in total.
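The file count follows from enumerating every combination of substrate, configuration, beverage and drying interval. The minimal Python sketch below checks the arithmetic. The file-naming pattern is hypothetical because the article does not specify one.

```python
from itertools import product

# Experimental factors as described in Section 3.2.
substrates = ["pink", "white", "brown"]
configurations = ["flat", "folded"]
beverages = ["papaya", "coffee", "pomegranate", "orange", "tea",
             "wine", "whisky", "rum", "brandy"]
intervals = ["0min", "1h", "2h", "3h", "4h", "5h"]

# Hypothetical naming scheme: one CSV per (substrate, configuration, beverage, interval).
csv_files = [
    f"{s}_{c}_{b}_{t}.csv"
    for s, c, b, t in product(substrates, configurations, beverages, intervals)
]

per_scenario = len(beverages) * len(intervals)  # 9 x 6 files per substrate-configuration scenario
print(per_scenario, len(csv_files))             # 54 324
```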

3.3. Reflectance Calibration

To ensure spectral accuracy, all hyperspectral images were converted from raw digital numbers to reflectance values using the standard calibration procedure illustrated in Figure 5. Reflectance at each wavelength $\lambda$ was calculated as

$$R(\lambda) = \frac{I_{\text{sample}}(\lambda) - I_{\text{dark}}(\lambda)}{I_{\text{white}}(\lambda) - I_{\text{dark}}(\lambda)} \qquad (1)$$

where $I_{\text{sample}}(\lambda)$, $I_{\text{white}}(\lambda)$, and $I_{\text{dark}}(\lambda)$ are the intensities recorded from the sample, the white reference (Spectralon panel), and the dark reference (sensor offset), respectively. This calibration corrects for uneven illumination and the sensor's dark offset, yielding reflectance spectra for further analysis.
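Equation (1) is a per-band flat-field correction. The minimal NumPy sketch below assumes that the raw cube and the white and dark references are already loaded as arrays. The function name and the `eps` guard against zero denominators are our own additions and do not come from the Specim IQ software.

```python
import numpy as np

def calibrate_reflectance(sample, white, dark, eps=1e-8):
    """Apply Eq. (1) band by band.

    sample: raw cube of shape (rows, cols, bands)
    white, dark: reference spectra of shape (bands,) or arrays broadcastable to `sample`
    """
    sample = np.asarray(sample, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)
    denom = white - dark
    # Guard against division by zero in dead or saturated bands (our assumption).
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (sample - dark) / denom

# Toy example: a sample halfway between the dark and white levels gives R = 0.5.
bands = 204
dark = np.full(bands, 10.0)
white = np.full(bands, 110.0)
raw = np.full((2, 2, bands), 60.0)
R = calibrate_reflectance(raw, white, dark)
print(R.shape, R[0, 0, 0])  # (2, 2, 204) 0.5
```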

3.4. Data Preprocessing

Each cube was transformed into a two-dimensional spectral matrix in which each pixel is represented by a spectral vector of 204 reflectance values. The regions of interest (ROIs) were manually segmented to extract the stained regions and to exclude background regions and specular reflection artifacts.
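The sketch below shows the cube-to-matrix step and the ROI selection in NumPy. It assumes the manual segmentation is available as a boolean mask with the same spatial size as the cube. The rectangular mask and the random cube are placeholders, not the article's data.

```python
import numpy as np

def roi_spectra(cube, roi_mask):
    """Return the spectra of all pixels inside a boolean ROI mask.

    cube: (rows, cols, bands) reflectance cube; roi_mask: (rows, cols) boolean.
    Equivalent to reshaping the cube to (rows * cols, bands) and keeping the masked rows.
    """
    if roi_mask.shape != cube.shape[:2]:
        raise ValueError("ROI mask must match the spatial size of the cube")
    return cube[roi_mask]                 # -> (n_roi_pixels, bands)

cube = np.random.default_rng(0).random((50, 60, 204))
mask = np.zeros((50, 60), dtype=bool)
mask[10:30, 20:40] = True                 # hypothetical rectangular stain ROI
spectra = roi_spectra(cube, mask)
print(spectra.shape)                      # (400, 204)
```

Boolean indexing keeps the row-major pixel order, so the result matches the masked rows of the reshaped `(rows * cols, bands)` matrix.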

For the temporal spectral analysis, reflectance signatures were computed from ROI-level rather than pixel-level data. In each case, the spectral response was obtained by averaging the reflectance values of all pixels within the segmented ROI. This averaging suppresses pixel-level noise and gives a more stable representation of the overall spectral response of the stained area. After ROI extraction, all spectral vectors were combined into a unified dataset of 121,496 samples, each labeled with one of the nine beverage classes. Data cleaning removed non-numeric and invalid entries. Feature normalization was then performed with StandardScaler so that each spectral band has zero mean and unit variance. The dataset was subsequently partitioned into training and testing sets in an 80:20 ratio, with stratified sampling across all beverage categories. The split was made at the spectral-sample level rather than at the hyperspectral-image level. As a result, spectral samples from the same physical stain may appear in both the training and testing sets. This approach makes efficient use of the high-dimensional spectral data and follows the pixel-wise learning paradigm commonly used in hyperspectral image analysis.
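The cleaning, standardization, and stratified 80:20 split can be sketched with scikit-learn as follows. The random matrix stands in for the 121,496 × 204 dataset, and reading "data cleaning" as dropping rows with non-finite values is our assumption. The article applies StandardScaler before partitioning. This sketch instead fits the scaler on the training split only and then applies it to both splits, which is a common way to keep test statistics out of the normalization.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.random((900, 204))            # stand-in for the spectral matrix (samples x bands)
y = np.repeat(np.arange(9), 100)      # nine beverage class labels

# Drop rows containing NaN or infinite values (one possible reading of "data cleaning").
finite = np.isfinite(X).all(axis=1)
X, y = X[finite], y[finite]

# Stratified 80:20 split at the spectral-sample level.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Zero mean, unit variance per band; statistics estimated on the training split.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```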

3.5. Model Development

An integrated AE–CA architecture is developed by combining unsupervised latent feature compression and attention-enhanced spectral classification. In the subsequent subsections, the details about each component of the proposed model are presented.

3.5.1. Latent Dimension Selection

Choosing an appropriate latent dimensionality is critical for balancing spectral compression against information preservation, and the model relies on AE-based feature extraction to achieve this balance. Under-compressed latent spaces may retain redundant or noisy wavelengths, whereas over-compressed spaces may discard spectral cues needed for stain discrimination. Based on an empirical study, three representative latent sizes were chosen:

32 dimensions: strong compression, capturing coarse but robust spectral structure.
64 dimensions: a balanced trade-off between compression and discriminative fidelity.
128 dimensions: retains the richest spectral detail and achieves the highest standalone accuracy.

3.5.2. Feature Compression via AE

To handle the high dimensionality and redundancy inherent in hyperspectral data, a fully connected AE is utilized for feature compression. Figure 6 depicts a model comprising two symmetrical elements: the encoder, which compresses the input spectra into a latent vector, and the decoder, which reconstructs the original input from the compressed representation.

The encoder progressively reduces the input dimensionality through two hidden layers of 128 and 64 neurons, each followed by a ReLU activation to model non-linear spectral variations. The bottleneck (latent) layer yields a compressed feature set intended to retain the information most relevant for classification. To study the effect of feature compactness, three AE configurations were trained with latent dimensions $d \in \{32, 64, 128\}$, providing multiple levels of spectral abstraction. The decoder mirrors the encoder and reconstructs the input spectra from the latent space. The model was trained with the Adam optimizer at a learning rate of $10^{-3}$ to minimize the mean squared error (MSE) between the input and the reconstructed output. Early stopping with a patience of 10 epochs was applied to avoid overfitting and ensure convergence. After training, the decoder is discarded and only the encoder is retained as a feature extractor.
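The Keras sketch below follows the AE described above. The article does not state the bottleneck or output activations, so the linear latent layer and linear reconstruction layer are our assumptions. The random data and `epochs=2` are only for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(n_bands=204, latent_dim=64):
    """Fully connected AE: 204 -> 128 -> 64 -> d -> 64 -> 128 -> 204."""
    inputs = keras.Input(shape=(n_bands,))
    x = layers.Dense(128, activation="relu")(inputs)
    x = layers.Dense(64, activation="relu")(x)
    latent = layers.Dense(latent_dim, name="latent")(x)   # linear bottleneck (assumption)
    x = layers.Dense(64, activation="relu")(latent)        # decoder mirrors the encoder
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(n_bands)(x)                     # linear reconstruction (assumption)
    autoencoder = keras.Model(inputs, outputs, name=f"ae{latent_dim}")
    encoder = keras.Model(inputs, latent, name=f"encoder{latent_dim}")
    autoencoder.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
    return autoencoder, encoder

X_train = np.random.default_rng(0).normal(size=(512, 204)).astype("float32")
ae, encoder = build_autoencoder(latent_dim=32)
early = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
ae.fit(X_train, X_train, epochs=2, batch_size=64, validation_split=0.1,
       callbacks=[early], verbose=0)
Z = encoder.predict(X_train, verbose=0)   # only the encoder is kept as the feature extractor
print(Z.shape)                            # (512, 32)
```

The encoder shares its layers with the full autoencoder, so it reuses the trained weights directly and no weight copying is needed.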

3.5.3. CA Framework

The compressed latent features were then reshaped and fed into a 1D-CNN with an integrated CA mechanism. A CA block works by adaptively rebalancing the feature responses through emphasizing the spectrally more discriminative channels and suppressing the redundant or noisy ones. Such selective weighting enhances the focus on informative spectral regions corresponding to certain characteristics of beverages.
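The re-weighting idea can be illustrated with a squeeze-and-excitation style forward pass in NumPy. The ReLU and sigmoid activations and the random weights are our assumptions, since the article names only the pooling, dense, and multiplication steps. The reduction ratio `r` is a parameter.

```python
import numpy as np

def channel_attention(feature_map, w1, b1, w2, b2):
    """Squeeze-and-excitation style re-weighting of a 1D feature map.

    feature_map: (length, channels), e.g. the output of a Conv1D layer
    w1: (channels, channels // r), w2: (channels // r, channels)
    """
    squeeze = feature_map.mean(axis=0)                  # global average pooling -> (channels,)
    hidden = np.maximum(0.0, squeeze @ w1 + b1)         # dense + ReLU, reduced by ratio r
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # dense + sigmoid -> weights in (0, 1)
    return feature_map * scale, scale                   # element-wise channel re-weighting

rng = np.random.default_rng(0)
length, channels, r = 30, 32, 16
fmap = rng.normal(size=(length, channels))
w1, b1 = rng.normal(size=(channels, channels // r)), np.zeros(channels // r)
w2, b2 = rng.normal(size=(channels // r, channels)), np.zeros(channels)
out, scale = channel_attention(fmap, w1, b1, w2, b2)
print(out.shape, scale.shape)   # (30, 32) (32,)
```

Channels with a scale near 1 pass through almost unchanged, while channels with a scale near 0 are suppressed.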

As shown in Figure 7, the architecture consists of two convolutional layers with 64 and 32 filters, each with a kernel size of 3 and followed by a non-linear activation. The CA block comprises global average pooling, a two-layer dense network with a reduction ratio of 16, and an element-wise multiplication step that re-weights the features. The resulting feature maps are flattened and fed to a fully connected layer of 64 neurons with ReLU activation, followed by dropout with a rate of 0.3. Finally, a softmax output layer yields the class probabilities for the nine beverage stain categories. The model was trained using the Adam optimizer with a learning rate of $1 \times 10^{-3}$ and categorical cross-entropy loss. All models were trained for up to 100 epochs with a batch size of 64 and a validation split of 0.1, and early stopping monitored the validation loss. This framework thus passes the encoded latent features through convolutional and attention blocks, followed by dense and softmax layers, to classify beverage stains.
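The Keras sketch below assembles the classifier as described. Several details are our assumptions because the article does not state them: the CA block sits after the second convolution, the convolutions use `padding="same"`, and early stopping uses a patience of 10. The random inputs stand in for encoder outputs, and `epochs=2` replaces the article's 100 for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def channel_attention_block(x, reduction=16):
    """SE-style channel attention: GAP -> Dense(C/r, ReLU) -> Dense(C, sigmoid) -> multiply."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling1D()(x)
    s = layers.Dense(max(channels // reduction, 1), activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)
    s = layers.Reshape((1, channels))(s)     # broadcast the weights over the spectral axis
    return layers.Multiply()([x, s])

def build_ae_ca_classifier(latent_dim=64, n_classes=9):
    inputs = keras.Input(shape=(latent_dim, 1))          # latent vector reshaped to (d, 1)
    x = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(inputs)
    x = layers.Conv1D(32, kernel_size=3, padding="same", activation="relu")(x)
    x = channel_attention_block(x, reduction=16)         # placement is an assumption
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

rng = np.random.default_rng(0)
Z = rng.normal(size=(270, 64, 1)).astype("float32")      # stand-in for encoder outputs
y = keras.utils.to_categorical(np.arange(270) % 9, num_classes=9)
model = build_ae_ca_classifier(latent_dim=64)
early = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
model.fit(Z, y, epochs=2, batch_size=64, validation_split=0.1, callbacks=[early], verbose=0)
probs = model.predict(Z, verbose=0)                       # (270, 9) class probabilities
```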

3.5.4. Ensemble Learning Framework

Although the individual AE-CA models achieved competitive classification performance, each captured different spectral characteristics depending on its latent dimensionality. To exploit this diversity, a non-negatively weighted ensemble strategy was used to integrate the AE-CA models trained with different latent feature sizes. The overall AE-CA pipeline is illustrated in Figure 1.

Three AE-CA models with latent dimensions of 32, 64, and 128 were trained independently, and each produced a class-probability vector over the nine beverage stain categories. Instead of simple equal-weight averaging, an optimized weighted combination of these probability vectors was computed to improve predictive robustness and reduce bias. The final ensemble prediction $\hat{y}$ for a given test sample was obtained as

$$\hat{y} = \arg\max_{c} \sum_{i=1}^{N} w_i \, P_i(c) \qquad (2)$$

where $P_i(c)$ denotes the probability assigned to class $c$ by the $i$-th model, $w_i$ is its non-negative fusion weight, and $N$ is the number of AE-CA models.
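Equation (2) is weighted soft voting. The short NumPy sketch below applies it to toy probabilities. The numbers are invented for illustration, and `probs` is assumed to be stacked as (models, samples, classes).

```python
import numpy as np

def weighted_soft_vote(probs, weights):
    """Eq. (2): argmax_c sum_i w_i P_i(c).

    probs: (N_models, n_samples, n_classes) softmax outputs
    weights: (N_models,) non-negative fusion weights summing to 1
    """
    weights = np.asarray(weights, dtype=float)
    fused = np.tensordot(weights, probs, axes=1)   # weighted sum over models -> (n_samples, n_classes)
    return fused.argmax(axis=1), fused

# Toy example: three models, two samples, three classes.
P = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],   # model 1
    [[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]],   # model 2
    [[0.3, 0.6, 0.1], [0.2, 0.2, 0.6]],   # model 3
])
labels, fused = weighted_soft_vote(P, [0.2, 0.3, 0.5])
print(labels)   # [1 2]
```

For the first sample the fused scores are 0.39, 0.48, and 0.13. Model 1 alone would pick class 0, but the more heavily weighted model 3 tips the decision to class 1.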

To ensure interpretability and proportional contribution, the weights were constrained such that $\sum_{i=1}^{N} w_i = 1$ and $w_i \geq 0$. To estimate the optimal set of ensemble weights, the constrained numerical optimizer L-BFGS-B was used to minimize the negative validation accuracy:

$$\min_{w} \; -\mathrm{Accuracy}(w) \quad \text{s.t.} \quad \sum_{i=1}^{N} w_i = 1, \; w_i \geq 0 \qquad (3)$$

where $\mathrm{Accuracy}(w)$ is the classification accuracy of the weighted ensemble's predictions on the validation set. The problem is low-dimensional, with only three ensemble weights, and L-BFGS-B was observed to converge robustly under standard convergence criteria. The choice of these criteria therefore had minimal effect on the resulting weights. The bound constraints keep every weight non-negative, and the weights are normalized after each iteration so that they sum to one. Simple averaging or fixed weighting treats the ensemble members uniformly. The L-BFGS-B optimization instead exploits the complementary latent features learned by the individual AE-CA models. Figure 8 summarizes how gradients are computed, how curvature information is approximated with limited memory, how the bounds are enforced, and how the weights are normalized after each iteration. The asterisk (*) in the figure denotes the optimal weight vector obtained at convergence. Combining the AE-CA predictions with optimized weights gives an ensemble with better stability and generalization than any single classifier. By exploiting complementary spectral representations, the final ensemble reduced overfitting and raised the overall accuracy of hyperspectral beverage stain classification.
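A minimal SciPy sketch of the weight search follows, under stated assumptions. SciPy's L-BFGS-B handles bound constraints but not the equality constraint, so each weight is bounded to [0, 1] and the weights are renormalized inside the objective and on the result. The `kind="accuracy"` option implements Eq. (3) as written. Accuracy is piecewise constant in $w$, however, so finite-difference gradients are often zero and the optimizer may stop at its equal-weight starting point. For that reason the sketch defaults to a smooth surrogate, the validation negative log-likelihood, which is our substitution and not the article's objective. The synthetic probabilities are invented stand-ins for the AE32/64/128-CA outputs.

```python
import numpy as np
from scipy.optimize import minimize

def normalize(w):
    """Clip to non-negative values and rescale so the weights sum to one."""
    w = np.clip(np.asarray(w, dtype=float), 0.0, None)
    total = w.sum()
    return w / total if total > 0 else np.full(w.size, 1.0 / w.size)

def ensemble_loss(w, probs, y, kind):
    """Loss of the weighted ensemble on validation data.

    probs: (N_models, n_samples, n_classes); y: (n_samples,) integer labels.
    kind="accuracy" follows Eq. (3); kind="log_loss" is a smooth surrogate (our assumption).
    """
    fused = np.tensordot(normalize(w), probs, axes=1)
    if kind == "accuracy":
        return -np.mean(fused.argmax(axis=1) == y)
    return -np.mean(np.log(fused[np.arange(y.size), y] + 1e-12))

def optimize_weights(probs, y, kind="log_loss"):
    n_models = probs.shape[0]
    x0 = np.full(n_models, 1.0 / n_models)          # start from equal weights
    res = minimize(ensemble_loss, x0, args=(probs, y, kind),
                   method="L-BFGS-B", bounds=[(0.0, 1.0)] * n_models)
    return normalize(res.x)                         # enforce sum-to-one on the result

# Synthetic validation outputs of three models with decreasing noise.
rng = np.random.default_rng(0)
n, n_classes = 300, 9
y_val = rng.integers(0, n_classes, size=n)

def fake_probs(noise):
    logits = rng.normal(scale=noise, size=(n, n_classes))
    logits[np.arange(n), y_val] += 2.0              # bias each model toward the true class
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

probs_val = np.stack([fake_probs(s) for s in (2.0, 1.5, 1.0)])
weights = optimize_weights(probs_val, y_val)
print(np.round(weights, 3))
```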

3.6. Computational Environment

All experiments were run in Python 3.10 using TensorFlow and scikit-learn on a workstation with an Intel Core i7 CPU, 16 GB of RAM, and an NVIDIA RTX 4090 GPU (24 GB VRAM). Using the same environment for all AE-CA configurations ensured consistent training dynamics.

4. Results and Discussion

This section presents the experimental results, including a detailed analysis of the spectral signatures on the substrates and the performance of the proposed AE-CA ensemble framework for hyperspectral beverage stain classification. Model performance was quantified using the standard metrics of accuracy, precision, recall, and F1-score, complemented by confusion matrices for detailed class-wise analysis.
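These metrics can be computed with standard scikit-learn functions, as in the sketch below. The labels and predictions are invented toy values that mimic the whisky–rum and rum–brandy confusions discussed later. They are not the article's results.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

classes = ["papaya", "coffee", "pomegranate", "orange", "tea",
           "wine", "whisky", "rum", "brandy"]
y_true = np.array([0, 1, 6, 7, 7, 8])
y_pred = np.array([0, 1, 7, 7, 8, 8])   # toy predictions: one whisky->rum, one rum->brandy error

labels = np.arange(len(classes))
acc = accuracy_score(y_true, y_pred)
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0)   # per-class scores
cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true class, columns: predicted class
```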

4.1. Temporal Spectral Analysis

The temporal evolution of beverage stain reflectance was analyzed using overlapped, normalized reflectance signatures of the nine beverages on white-flat tissue at drying times of 0 min, 1 h, 2 h, 3 h, 4 h, and 5 h. Representative reflectance curves are shown in Figure 9.
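The sketch below shows one way to prepare overlapped signatures. Each drying interval's ROI spectra are averaged into one signature, which is then rescaled to [0, 1]. The article does not name its normalization scheme, so min-max scaling is our assumption. The ROI data are random placeholders, and `np.linspace` gives only nominal band centres.

```python
import numpy as np

def roi_mean_signature(roi_spectra):
    """Average all ROI pixel spectra (n_pixels, bands) into one signature (bands,)."""
    return np.asarray(roi_spectra, dtype=float).mean(axis=0)

def min_max_normalize(signature):
    """Scale a signature to [0, 1] so curves from different times can be overlaid (assumed scheme)."""
    lo, hi = signature.min(), signature.max()
    return (signature - lo) / (hi - lo) if hi > lo else np.zeros_like(signature)

intervals = ["0 min", "1 h", "2 h", "3 h", "4 h", "5 h"]
rng = np.random.default_rng(0)
# Hypothetical ROI spectra for one beverage on white-flat tissue at each drying interval.
roi_by_time = {t: rng.random((100, 204)) * (i + 1) for i, t in enumerate(intervals)}
normalized = {t: min_max_normalize(roi_mean_signature(s)) for t, s in roi_by_time.items()}
wavelengths = np.linspace(400, 1000, 204)   # nominal band centres in nm (assumption)
```

Each entry of `normalized` can then be plotted against `wavelengths` on shared axes to overlay the six drying intervals.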

At the initial stage (0 min), the spectra of all beverages show comparatively smooth profiles with low reflectance, owing to the dominant effect of moisture. As drying progresses, spectral intensity and shape change consistently. These changes result from evaporation, the resulting concentration of chemical constituents, and modifications to the surface and structure of the stained matrix. They are most prominent in the near-infrared (NIR) region (700–1000 nm) and more moderate in the visible region (400–700 nm).

Wine, whisky, rum, and brandy share common spectral features in parts of the visible spectrum (roughly 450–700 nm), which could cause confusion between these classes if that region were analyzed alone. However, their NIR responses diverge as drying time increases, which makes them more separable. For non-alcoholic beverages such as coffee, tea, orange juice, papaya juice, and pomegranate juice, the spectral changes are more sample-dependent, with clear shifts in intensity and peaks over time. These changes reflect the chemical composition of each beverage. Pigments, phenolic compounds, and dissolved materials absorb specific wavelengths of light, and these chemical differences underlie the discriminative spectral features learned by the model.

The substrate configuration also affects spectral behavior. The folded tissue configuration shows greater spectral roughness and intensity variability than the flat configuration. Folding causes uneven stain absorption and geometric changes. In particular, the folds alter the local surface orientation and therefore change the amounts of incident and reflected light. The result is uneven reflection and spectral variability across the stained area. Tissue color also influences the baseline reflectance, with the colored tissues producing baseline shifts relative to white tissue. Overall, the temporal analysis shows that drying time, tissue color, and configuration are all significant factors. These results confirm the need to account for dynamic spectral properties in the proposed AE-CA ensemble model, which may lead to more robust differentiation of beverage stains in real-world applications. The temporal evolution of spectral signatures on various substrates is presented in Appendix A.

4.2. Empirical Study on Latent Dimension

To identify an effective range of latent dimensions, an empirical evaluation was conducted over bottleneck sizes ranging from 16 to 128. Each AE was trained with the same settings, and the encoded features were then classified. Figure 10 and Figure 11 summarize the test accuracies across all latent dimensions using bar and line plots. Accuracy improves steadily as the latent dimensionality increases and stabilizes between 64 and 128. Below 32 dimensions, performance drops sharply owing to excessive information loss, whereas dimensions above 128 incur a higher computational cost.

4.3. Performance of Individual AE-CA Models

Three AE-CA models were trained independently with latent dimensions of 32, 64, and 128. Early stopping was used during training to prevent overfitting and ensure stable convergence. All three models achieved high test-set accuracies. This shows that AE-driven spectral compression followed by CA-based classification preserves discriminative hyperspectral information across multiple compression levels. Table 1 summarizes the classification performance of the evaluated configurations. AE32-CA reached an accuracy of 96.54% and AE64-CA reached 97.19%. AE128-CA achieved the highest standalone accuracy, 97.86%. These findings suggest that increasing the latent dimension enhances discriminative capacity within the explored range, although larger latent spaces introduce additional computational cost.

4.4. Performance of the Ensemble Classifier

For improving the stability and robustness of prediction, an accuracy-based weighted ensemble was designed based on the class probability outputs of the three AE-CA models, which were trained with latent dimensions of 32, 64, and 128. The ensemble weights were determined by normalizing the test-set accuracies of the individual classifiers as follows:

w_i = acc_i / (acc_32 + acc_64 + acc_128), which gives w = [w_32, w_64, w_128] = [0.3311, 0.3333, 0.3356].

This indicates that AE128-CA contributes slightly more to the final decision because of its better stand-alone performance. The resulting ensemble achieved an overall accuracy of 98.28%, outperforming all individual models, which indicates the benefit of aggregating complementary latent features. To further verify the effectiveness of the proposed framework, it was compared with the most relevant existing work, by Devassy and George (2021) [28], which, to our knowledge, is the only prior study on hyperspectral beverage stain classification. That work combined a convolutional autoencoder (CAE) with a support vector machine (SVM) classifier and obtained an accuracy of 94.40%. The comparison is given in Table 2.
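The accuracy-normalized weights and the weighted fusion of class probabilities can be reproduced in a few lines. The sketch assumes soft voting, that is, a weighted sum of the per-model probability vectors followed by an argmax; the helper `weighted_soft_vote` and the toy probabilities are hypothetical, while the accuracies come from Table 1.

```python
import numpy as np

# Test accuracies (%) of AE32-CA, AE64-CA and AE128-CA, taken from Table 1.
accuracies = np.array([96.54, 97.19, 97.86])
weights = accuracies / accuracies.sum()   # normalize so the weights sum to 1
print(weights.round(4))                   # [0.3311 0.3333 0.3356]


def weighted_soft_vote(prob_list, w):
    """Fuse per-model class-probability arrays of shape (n_samples, n_classes)."""
    fused = sum(wi * p for wi, p in zip(w, prob_list))
    return fused.argmax(axis=1), fused


# Hypothetical probabilities from the three models for two samples and three classes.
p32 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p64 = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
p128 = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
labels, fused = weighted_soft_vote([p32, p64, p128], weights)
print(labels)                             # [0 2]
```

Because the weights sum to one, each fused row remains a valid probability distribution.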

As Table 2 shows, the proposed AE-CA ensemble outperforms the existing method, even though the present dataset includes greater variability in substrate color (white, pink, and brown), configuration (flat and folded), and drying time (0 min to 5 h).

The confusion matrix in Figure 12 shows strong diagonal dominance, confirming reliable stain discrimination across all nine beverage classes. The few misclassifications involved only visually and spectrally similar classes, such as whisky–rum and rum–brandy, whose reflectance curves partially overlap in the 450–700 nm region. The per-class precision, recall, and F1-scores in Table 3 support these findings and remain consistently high across all categories. Pomegranate, wine, papaya, and orange achieve the highest F1-scores (0.9841–0.9943), while coffee, whisky, rum, and brandy show somewhat lower scores (0.9665–0.9754), a trend consistent with the spectral similarities observed in the raw reflectance signatures. Overall, the ensemble classifier generalizes well, handling inter-class spectral similarity while delivering balanced performance across all stain categories.
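The per-class metrics in Table 3 follow the standard definitions: precision divides the diagonal of the confusion matrix by the predicted-class (column) totals, recall divides it by the true-class (row) totals, and F1 is their harmonic mean. A minimal sketch, assuming rows index true classes and columns index predicted classes; the 3-class matrix is invented for illustration and is not taken from Figure 12.

```python
import numpy as np


def per_class_metrics(cm):
    """Precision, recall and F1 per class from a confusion matrix.

    Rows are true classes and columns are predicted classes. A class that is
    never predicted (zero column sum) would produce a division by zero here.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # column sums = number of predictions per class
    recall = tp / cm.sum(axis=1)      # row sums = number of true samples per class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical 3-class confusion matrix (e.g. whisky, rum, brandy), not data from this study.
cm = [[95, 4, 1],
      [3, 94, 3],
      [0, 5, 95]]
p, r, f = per_class_metrics(cm)
for name, pi, ri, fi in zip(["whisky", "rum", "brandy"], p, r, f):
    print(f"{name:<7} precision={pi:.4f} recall={ri:.4f} F1={fi:.4f}")
```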

4.5. Ablation Study

To assess the contribution of each component, ablation studies were conducted on the CA mechanism and on the ensemble fusion strategy; the results are given in Table 4 and Table 5. Table 4 shows that removing the CA mechanism reduces classification accuracy for every latent dimension, by 0.88 to 1.11 percentage points. The second ablation compares the L-BFGS-B-based ensemble fusion strategy with equal-weight averaging (Table 5). Within the AE-CA framework, L-BFGS-B fusion improves accuracy by 0.36 percentage points over equal-weight averaging (98.28% vs. 97.92%). For the ensemble without CA, the gain is slightly larger, at 0.43 percentage points (97.21% vs. 96.78%).
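The L-BFGS-B fusion compared in Table 5 can be sketched with SciPy's `scipy.optimize.minimize(method="L-BFGS-B")`, which supports box bounds and therefore the positive weight constraint shown in Figure 8. The objective used by the authors is not given in this section; the sketch assumes the negative log-likelihood of the fused probabilities on a held-out validation set, starts from equal-weight averaging, and renormalizes the weights to sum to one. The function `fit_ensemble_weights` and the synthetic model outputs are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def fit_ensemble_weights(prob_list, y_true):
    """Learn non-negative ensemble weights with L-BFGS-B by minimizing the fused log-loss.

    prob_list: list of (n_samples, n_classes) class-probability arrays, one per model.
    y_true: integer labels of a held-out validation set.
    """
    P = np.stack(prob_list)                      # (n_models, n_samples, n_classes)
    idx = np.arange(len(y_true))
    m = P.shape[0]

    def neg_log_likelihood(raw_w):
        w = raw_w / raw_w.sum()                  # renormalize so the weights sum to 1
        fused = np.tensordot(w, P, axes=1)       # weighted average -> (n_samples, n_classes)
        return -np.mean(np.log(fused[idx, y_true] + 1e-12))

    x0 = np.full(m, 1.0 / m)                     # start from equal-weight averaging
    res = minimize(neg_log_likelihood, x0, method="L-BFGS-B",
                   bounds=[(1e-6, 1.0)] * m)     # box bounds keep every weight positive
    return res.x / res.x.sum(), float(res.fun), float(neg_log_likelihood(x0))


# Hypothetical validation data: 500 samples, 9 classes, three models of increasing reliability.
rng = np.random.default_rng(1)
n, k = 500, 9
y = rng.integers(0, k, size=n)


def noisy_probs(noise):
    """Softmax over noisy logits that favor the true class (synthetic model output)."""
    logits = noise * rng.standard_normal((n, k))
    logits[np.arange(n), y] += 2.0
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


probs = [noisy_probs(s) for s in (2.0, 1.5, 1.0)]
w, fitted_loss, equal_loss = fit_ensemble_weights(probs, y)
print("weights:", w.round(3), "log-loss:", round(fitted_loss, 4), "equal-weight:", round(equal_loss, 4))
```

Because the optimizer starts from equal weights and its line search only accepts steps that decrease the objective, the fitted loss is never worse than equal-weight averaging on the data used for fitting.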

4.6. Training and Validation Analysis

The accuracy and loss curves of the AE-CA models with latent dimensions of 32, 64, and 128 are shown in Figure 13 and Figure 14, respectively. The accuracy curves improve rapidly during the initial epochs and then stabilize smoothly, with the higher latent dimensions (64 and 128) achieving slightly better validation performance, which suggests that a richer latent space captures more discriminative spectral information. Both training and validation losses decrease steadily and monotonically, with minimal divergence between them. This close alignment indicates good generalization and limited overfitting, suggesting that the AE compresses the hyperspectral data while preserving class-relevant features. Overall, the consistent behavior across all latent dimensions highlights the stability of the AE-CA framework and supports its use as the basis of the proposed ensemble for forensic hyperspectral stain classification.
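Section 4.3 notes that early stopping was applied during training, and the validation curves in Figures 13 and 14 are the kind of signal such a rule monitors. A minimal patience-based sketch in plain Python follows, assuming validation loss is the monitored quantity; the `EarlyStopping` class, its patience, and the loss values are hypothetical, since the settings used in the study are not given here.

```python
class EarlyStopping:
    """Patience-based early stopping on a monitored validation loss."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as an improvement
        self.best = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.wait = val_loss, epoch, 0
        else:
            self.wait += 1
        return self.wait >= self.patience


# Hypothetical validation-loss curve that stops improving after epoch 4.
losses = [1.20, 0.80, 0.55, 0.42, 0.40, 0.41, 0.40, 0.42, 0.43]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
print(f"stopped at epoch {stopped_at}; best epoch {stopper.best_epoch} (val loss {stopper.best})")
# stopped at epoch 7; best epoch 4 (val loss 0.4)
```

In TensorFlow/Keras, the built-in `tf.keras.callbacks.EarlyStopping` callback (with `monitor="val_loss"`, `patience`, and `restore_best_weights=True`) provides the same behavior.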

4.7. Training and Inference Time Analysis

To quantify the computational efficiency of the method, Table 6 reports the time required to train each autoencoder, to encode the training and test sets, and to train and test each classifier. AE training time increases predictably with autoencoder capacity, whereas classifier training time decreases as the latent dimension grows (from 1095.74 s for AE32-CA to 715.23 s for AE128-CA). Notably, the prediction time of AE64-CA (2.17 s) is slightly lower than that of AE32-CA (2.31 s), despite its larger latent dimension. One possible explanation is that the higher-dimensional latent space retains more discriminative spectral features, which may ease feature extraction by the subsequent 1D CNN with channel attention. In addition, modern deep learning libraries such as TensorFlow and PyTorch exploit hardware-level parallelism in tensor operations, so a larger feature dimension does not necessarily increase computation time and may even improve hardware utilization. The inference results are consistent with this: each individual model completes a full test-set inference in approximately 2.2–2.4 s, while ensemble fusion takes a negligible 0.0142 s. When the three models are run sequentially, the ensemble's classifier inference time is therefore roughly the sum of the individual times (about 6.85 s) plus the fusion step. The ensemble classifier is thus both accurate and efficient, which is critical in forensic settings where decisions must be made quickly.
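Per-sample inference figures such as the last column of Table 6 are obtained by dividing a measured wall-clock time by the number of test samples. The sketch below assumes a best-of-several-runs measurement with `time.perf_counter`; the linear stand-in classifier, the 64-dimensional inputs, and the sample count are hypothetical and do not reproduce the authors' models or hardware.

```python
import time

import numpy as np


def time_inference(predict_fn, X, repeats=3):
    """Return the best-of-`repeats` wall-clock time in seconds and the matching ms per sample."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        predict_fn(X)
        best = min(best, time.perf_counter() - t0)
    return best, 1000.0 * best / len(X)


# Hypothetical stand-in classifier: one linear layer followed by argmax over 9 stain classes.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 9))
X_test = rng.standard_normal((20_000, 64))


def predict(X):
    return (X @ W).argmax(axis=1)


total_s, ms_per_sample = time_inference(predict, X_test)
print(f"full test inference: {total_s:.4f} s ({ms_per_sample:.5f} ms/sample)")
```

Taking the best of several runs reduces the influence of one-off costs such as memory allocation on the first call; for asynchronous GPU execution (e.g. PyTorch on CUDA), the device should also be synchronized before the timer is read.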

5. Conclusions

This paper introduced an L-BFGS-B-optimized weighted ensemble of AE-CA networks for hyperspectral beverage stain classification. Three AE-CA models with latent feature dimensions of 32, 64, and 128 were employed to capture complementary spectral representations. The proposed ensemble achieved an accuracy of 98.28%, outperforming the stand-alone AE-CA models and performing robustly across all nine beverage stain categories. Class-wise evaluation confirmed reliable discrimination, with minor confusion only among the spectrally similar whisky, rum, and brandy stains. Future work will focus on enhancing the practical applicability of the proposed framework. First, integrating 3D hyperspectral modeling could more accurately represent the volumetric diffusion of beverage constituents within porous substrates. Second, the investigation will be extended to more realistic forensic materials, including textiles, coated papers, and polymer surfaces, to improve cross-surface robustness. Third, uncertainty-aware inference techniques, such as Bayesian or evidential learning, will be explored to support well-founded forensic conclusions. Finally, lightweight, edge-deployable hyperspectral systems will be developed to enable rapid, practical on-site forensic inspection.

Author Contributions

Conceptualization, J.S. and M.V.M.; methodology, J.S.; software, J.S.; validation, J.S., M.A.D. and P.P.R.; formal analysis, J.S.; investigation, J.S.; resources, M.V.M. and M.A.D.; data curation, J.S.; writing—original draft preparation, J.S.; writing—review and editing, M.V.M., M.A.D. and P.P.R.; visualization, J.S.; supervision, M.V.M.; project administration, M.V.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by SRM University-AP, Amaravati 522240, Andhra Pradesh, India.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

The temporal evolution of the beverage stains is analyzed using overlaid spectral signatures of all nine beverages captured at six drying intervals (0 min, 1 h, 2 h, 3 h, 4 h, and 5 h) on different tissue substrates and configurations. Figure A1 and Figure A2 show the spectral variation for pink tissue in the folded and flat configurations, respectively, while Figure A3 and Figure A4 correspond to brown tissue in the folded and flat configurations. Finally, Figure A5 shows the spectral responses for white tissue in the folded configuration; the white flat configuration is shown in Figure 9.

This appendix also describes the illumination of the hyperspectral imaging setup. The system uses a broadband halogen light source with a continuous emission profile across the visible to near-infrared range of 400–1000 nm, matching the spectral sensitivity of the Specim IQ camera. Although the exact model of the light source is unknown, its emission follows the general profile of tungsten–halogen sources. Figure A6 shows a representative emission spectrum of the broadband halogen light source, which varies smoothly over the entire operating range.
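The smooth halogen emission profile can be approximated by Planck's law for a blackbody, since a tungsten filament radiates roughly as a blackbody at its color temperature. The sketch assumes an idealized 3000 K emitter (a typical order of magnitude for tungsten–halogen lamps, not a measured value for this setup) and ignores filament emissivity and bulb transmission; it evaluates the profile over the 400–1000 nm range of the Specim IQ camera. The function name `planck_spectral_radiance` is hypothetical.

```python
import numpy as np

H = 6.62607015e-34    # Planck constant (J s)
C = 2.99792458e8      # speed of light (m/s)
KB = 1.380649e-23     # Boltzmann constant (J/K)


def planck_spectral_radiance(wavelength_nm, temperature_k):
    """Blackbody spectral radiance B(lambda, T) in W sr^-1 m^-3."""
    lam = np.asarray(wavelength_nm, dtype=float) * 1e-9        # nm -> m
    return (2 * H * C**2 / lam**5) / np.expm1(H * C / (lam * KB * temperature_k))


wavelengths = np.arange(400, 1001)                          # 400-1000 nm in 1 nm steps
radiance = planck_spectral_radiance(wavelengths, 3000.0)    # assumed color temperature
relative = radiance / radiance.max()                        # normalized emission profile
peak_nm = wavelengths[np.argmax(radiance)]
print(f"peak at {peak_nm} nm; relative output at 400 nm = {relative[0]:.3f}")
```

At 3000 K, Wien's displacement law places the radiance peak near 966 nm, so within the camera range the illumination rises toward the near-infrared and leaves the blue end comparatively weak.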

Figure A1. Temporal spectral signatures (0–5 h) of beverage stains on pink tissue paper (folded configuration).

Figure A2. Temporal spectral signatures (0–5 h) of beverage stains on pink tissue paper (flat configuration).

Figure A3. Temporal spectral signatures (0–5 h) of beverage stains on brown tissue paper (folded configuration).

Figure A4. Temporal spectral signatures (0–5 h) of beverage stains on brown tissue paper (flat configuration).

Figure A5. Temporal spectral signatures (0–5 h) of beverage stains on white tissue paper (folded configuration).

Figure A6. Representative spectral emission profile of the tungsten–halogen light source used in the hyperspectral imaging setup.

References

1. Cui, R.; Yu, H.; Xu, T.; Xing, X.; Cao, X.; Yan, K.; Chen, J. Deep Learning in Medical Hyperspectral Images: A Review. Sensors 2022, 22, 9790.
2. Xu, Y.; Du, B.; Zhang, L. Robust Self-Ensembling Network for Hyperspectral Image Classification. IEEE Trans. Neural Networks Learn. Syst. 2024, 35, 3780–3793.
3. Yang, C.; Guo, Z.; Fernandes Barbin, D.; Dai, Z.; Watson, N.; Povey, M.; Zou, X. Hyperspectral Imaging and Deep Learning for Quality and Safety Inspection of Fruits and Vegetables: A Review. J. Agric. Food Chem. 2025, 73, 10019–10035.
4. Wang, K.; Yong, B.; Gu, X.; Xiao, P.; Zhang, X. Spectral Similarity Measure Using Frequency Spectrum for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 130–134.
5. Pallocci, M.; Treglia, M.; Passalacqua, P.; Luca, L.D.; Zanovello, C.; Mazzuca, D.; Guarna, F.; Gratteri, S.; Marsella, L.T. Forensic applications of hyperspectral imaging technique: A narrative review. Med.-Leg. J. 2022, 90, 216–220.
6. Sharma, S.; Chophi, R.; Jossan, J.K.; Singh, R. Detection of bloodstains using attenuated total reflectance-Fourier transform infrared spectroscopy supported with PCA and PCA–LDA. Med. Sci. Law 2021, 61, 292–301.
7. Mistek-Morabito, E.; Lednev, I. Discrimination between human and animal blood by attenuated total reflection Fourier transform-infrared spectroscopy. Commun. Chem. 2020, 3, 178.
8. Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G.; Scheunders, P.; Nasrabadi, N.M.; Chanussot, J. Hyperspectral remote sensing data analysis and future challenges. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–36.
9. Ghamisi, P.; Yokoya, N.; Li, J.; Liao, W.; Liu, S.; Plaza, J.; Rasti, B.; Plaza, A. Advances in Hyperspectral Image and Signal Processing: A Comprehensive Overview of the State of the Art. IEEE Geosci. Remote Sens. Mag. 2017, 5, 37–78.
10. Jaiswal, G.; Sharma, A.; Kumar Yadav, S. DFD-SS: Document Forgery Detection using Spectral–Spatial Features for Hyperspectral Images. J. Vis. Commun. Image Represent. 2022, 89, 103690.
11. Plaza, A.; Benediktsson, J.A.; Boardman, J.W.; Brazile, J.; Bruzzone, L.; Camps-Valls, G.; Chanussot, J.; Fauvel, M.; Gamba, P.; Gualtieri, A.; et al. Recent advances in techniques for hyperspectral image processing. Remote Sens. Environ. 2009, 113, S110–S122.
12. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858.
13. Roy, S.K.; Krishna, G.; Dubey, S.R.; Chaudhuri, B.B. HybridSN: Exploring 3-D–2-D CNN Feature Hierarchy for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 277–281.
14. Li, Y.; Zhang, H.; Shen, Q. Spectral–Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network. Remote Sens. 2017, 9, 67.
15. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep Learning-Based Classification of Hyperspectral Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107.
16. Zhu, M.; Jiao, L.; Liu, F.; Yang, S.; Wang, J. Residual Spectral–Spatial Attention Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 449–462.
17. Huang, W.; Zhao, Z.; Sun, L.; Ju, M. Dual-Branch Attention-Assisted CNN for Hyperspectral Image Classification. Remote Sens. 2022, 14, 6158.
18. Xu, L.; Lu, C.; Zhou, T.; Wu, J.; Feng, H. A 3D-2DCNN-CA approach for enhanced classification of hickory tree species using UAV-based hyperspectral imaging. Microchem. J. 2024, 199, 109981.
19. Tuia, D.; Persello, C.; Bruzzone, L. Domain Adaptation for the Classification of Remote Sensing Data: An Overview of Recent Advances. IEEE Geosci. Remote Sens. Mag. 2016, 4, 41–57.
20. Liu, B.; Zuo, X.; Yu, A.; Sun, Y.; Wang, R. Semi-supervised classification of hyperspectral images based on multi-view consistency. Remote Sens. Lett. 2023, 14, 479–490.
21. Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-shot Learning. ACM Comput. Surv. 2020, 53, 63.
22. Keshava, N.; Mustard, J. Spectral unmixing. IEEE Signal Process. Mag. 2002, 19, 44–57.
23. Bioucas-Dias, J.M.; Plaza, A.; Dobigeon, N.; Parente, M.; Du, Q.; Gader, P.; Chanussot, J. Hyperspectral Unmixing Overview: Geometrical, Statistical, and Sparse Regression-Based Approaches. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 354–379.
24. Zhang, H.; He, W.; Zhang, L.; Shen, H.; Yuan, Q. Hyperspectral Image Restoration Using Low-Rank Matrix Recovery. IEEE Trans. Geosci. Remote Sens. 2014, 52, 4729–4743.
25. He, W.; Zhang, H.; Zhang, L.; Shen, H. Total-Variation-Regularized Low-Rank Matrix Factorization for Hyperspectral Image Restoration. IEEE Trans. Geosci. Remote Sens. 2016, 54, 178–188.
26. Fan, A.; Xu, T.; Li, J.; Teng, G.; Wang, X.; Zhang, Y.; Xu, C. Compressive full-Stokes polarization and flexible hyperspectral imaging with efficient reconstruction. Opt. Lasers Eng. 2023, 160, 107256.
27. Fan, A.; Xu, T.; Teng, G.; Wang, X.; Zhang, Y.; Pan, C. Hyperspectral polarization-compressed imaging and reconstruction with sparse basis optimized by particle swarm optimization. Chemom. Intell. Lab. Syst. 2020, 206, 104163.
28. Devassy, B.M.; George, S. Forensic analysis of beverage stains using hyperspectral imaging. Sci. Rep. 2021, 11, 6512.

Figure 1. Overall methodology framework of the proposed work.

Figure 2. HSI configuration used for data acquisition.

Figure 3. Representative examples of tissue substrates used in this study: (a) pink—flat, (b) pink—folded, (c) white—flat, (d) white—folded, (e) brown—flat, and (f) brown—folded.

Figure 4. Hyperspectral cube showing per-pixel 204-band spectral vector.

Figure 5. The reflectance calibration workflow.

Figure 6. AE architecture employed for latent feature representation and reconstruction.

Figure 7. CA-based CNN for latent feature classification.

Figure 8. L-BFGS-B optimization workflow for learning positively constrained ensemble weights.

Figure 9. Temporal spectral signatures (0–5 h) of beverage stains on white tissue paper (flat configuration).

Figure 10. Line plot illustrating the empirical evaluation of latent dimensionality.

Figure 11. Bar chart illustrating the empirical evaluation of latent dimensionality.

Figure 12. Confusion matrix of the proposed accuracy-weighted AE-CA ensemble classifier.

Figure 13. Training and validation accuracy of AE-CA models with different latent dimensions.

Figure 14. Training and validation loss of AE-CA models across latent dimensions.

Table 1. Classification accuracy of AE-CA models across latent dimensions.

Latent Dimension	Test Accuracy (%)
32	96.54
64	97.19
128	97.86

Table 2. Comparison with existing beverage stain classification method.

Method	Technique	Accuracy (%)
Devassy and George (2021) [28]	CAE + SVM	94.40
Proposed Method	AE-CA Ensemble	98.28

Table 3. Class-wise performance metrics for the AE-CA ensemble classifier.

Class	Precision	Recall	F1-Score
Papaya	0.9905	0.9905	0.9905
Coffee	0.9688	0.9773	0.9732
Pomegranate	0.9959	0.9926	0.9943
Orange	0.9803	0.9878	0.9841
Tea	0.9869	0.9722	0.9795
Wine	0.9914	0.9941	0.9928
Whisky	0.9725	0.9783	0.9754
Rum	0.9605	0.9726	0.9665
Brandy	0.9841	0.9646	0.9742

Table 4. Ablation study on the effect of removing the CA module.

Model Variant	CA	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
AE32–CNN (without CA)	No	95.43	95.51	95.38	95.44
AE32–CA	Yes	96.54	96.61	96.49	96.55
AE64–CNN (without CA)	No	96.21	96.28	96.15	96.21
AE64–CA	Yes	97.19	97.23	97.14	97.18
AE128–CNN (without CA)	No	96.98	97.04	96.91	96.97
AE128–CA	Yes	97.86	97.90	97.82	97.86

Table 5. Comparison of ensemble fusion strategies: equal-weight averaging vs. L-BFGS-B optimized weighting.

Ensemble Configuration	CA	Fusion	Accuracy (%)	Precision (%)	Recall (%)
AE-CNN Ensemble (without CA)	No	Equal-weight	96.78	96.83	96.72
AE-CA Ensemble	Yes	Equal-weight	97.92	97.96	97.88
AE-CNN Ensemble (without CA)	No	L-BFGS-B	97.21	97.26	97.17
AE-CA Ensemble (Proposed)	Yes	L-BFGS-B	98.28	98.31	98.24

Table 6. Computational time analysis of AE-CA models for training and inference.

Model	AE Train (s)	Encoder Inference (s)	Classifier Train (s)	Classifier Test (s)	Classifier Test (ms/sample)
AE32-CA	332.58	3.88	1095.74	2.31	0.099
AE64-CA	359.58	4.80	1029.85	2.17	0.092
AE128-CA	366.47	3.89	715.23	2.37	0.101

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Shit, J.; Dar, M.A.; V M, M.; Roy, P.P. Beverage Stain Classification Using Hyperspectral Imaging with an L-BFGS-B-Optimized Autoencoder and a Channel-Attention 1D CNN. Informatics 2026, 13, 68. https://doi.org/10.3390/informatics13050068
