1. Introduction
The escalating threat of antimicrobial resistance has sharpened the clinical imperative for rapid and accurate pathogen identification. However, routine diagnostic approaches remain dominated by culture-based workflows that, although familiar and widely accessible, require considerable time both to grow bacterial cultures and to isolate single colonies. Most species-level bacterial identifications are still based on enzymatic reactions, which are time-consuming to perform and interpret, even on automated ID platforms. Moreover, overlapping metabolic repertoires among bacteria occasionally limit species-level discrimination, leading to misidentification or failure to classify accurately. In addition to enzymatic methods, modern techniques such as FTIR spectroscopy are increasingly employed for microbial identification, and recent studies highlight their effectiveness in clinical and environmental microbiology [1,2].
The time from sample collection to a species call, covering growth in culture, isolation, and identification, varies significantly: from 1.81 days for blood cultures to 2 days for routine urine cultures and more than 3.5 days for respiratory infections. Of note, bacteria that are challenging to isolate, such as anaerobes, may require more than 4 days of incubation and identification [3,4]. Moreover, phenotypic and enzymatic identification methods remain imperfect, often achieving only around 90–94% concordance at the species level, which may lead to misidentification or failure to classify 1–4% of clinical isolates, particularly in the case of fastidious or atypical organisms [5,6]. In all cases, an additional day is usually required to evaluate the antimicrobial resistance patterns of the isolated species. Inevitably, these delays prompt clinicians to bridge the diagnostic gap with broad-spectrum antimicrobials, a practice that significantly compromises patient outcomes because of potential ineffectiveness [7,8] and that contributes to the gradual rise of antimicrobial resistance.
The recognized need to reduce identification time has triggered intensive global efforts to develop culture-independent diagnostic assays. Among these, single-cell Raman spectroscopy, often coupled with surface-engineered SERS substrates, has recently drawn considerable attention and is currently listed as one of the most promising. Raman-based bacterial identification relies primarily on unique biochemical fingerprints and can be deployed either directly on clinical specimens (such as blood, urine, or sputum) or on culture-derived material, with the choice balancing the required detection limit, turnaround time, and matrix complexity [9,10,11,12,13]. The assay can roughly be divided into two phases: the first entails ultrafast Raman spectral acquisition (1–100 ms shots on – cells with a 532–785 nm laser), while the second employs machine-learning algorithms to interpret the processed spectra and classify the bacterium at the species level. To date, the method has correctly identified >90% of common pathogens at the species level, while the overall assay is regularly completed in under five minutes, a 20-fold compression of the traditional timeline [14,15,16].
Despite its rapid development and maturation, widespread clinical adoption of Raman assays remains elusive owing to a confluence of technical, computational, and regulatory limitations. Technically, the intrinsically weak Raman scattering of bacterial cells necessitates SERS substrates whose batch-to-batch and even intra-chip variability undermines spectral consistency [17]. At the same time, high instrument and consumable costs restrict deployment in lower-throughput settings. From a machine-learning standpoint, the principal obstacles are data scarcity, domain shift, model opacity, and the absence of harmonized validation standards [18,19]. State-of-the-art classifiers (random forests, convolutional neural networks, and transformer architectures) demand thousands of balanced spectra per species. However, models trained on repositories such as Bacteria-ID still lose accuracy when they encounter spectra from different instruments or growth media, revealing acute sensitivity to class-imbalance bias and instrument-specific artifacts. In addition, domain-adaptation strategies (wavelength warping, piece-wise direct standardization, and calibration-transfer schemes) have not attained the ±2% error margin acceptable in clinical chemistry, broadly because Raman substrates introduce complex, nonlinear variance that resists simple parameterization [20,21,22].
To overcome the limitations inherent in current machine learning (ML) approaches to bacterial identification, recent studies have introduced a range of innovative methodologies. A seminal contribution by Ho et al. (2019) [14] demonstrated the potential of integrating Raman spectroscopy with deep learning to enable rapid, label-free, and culture-independent identification of pathogenic bacteria, as well as antibiotic susceptibility profiling. Their work highlighted the efficacy of convolutional neural networks (CNNs) in accurately classifying bacterial species, even from low signal-to-noise spectra, and in distinguishing between methicillin-resistant and methicillin-susceptible Staphylococcus aureus (MRSA/MSSA), thereby laying the groundwork for point-of-care, real-time diagnostic applications. Building upon this foundation, Sun et al. (2024) [23] proposed RamanCluster, a deep clustering-based framework capable of unsupervised classification of bacterial Raman spectra without reliance on annotated training data, rendering it particularly suitable for label-free applications. In parallel, Jeon et al. (2025) [24] demonstrated that the integration of advanced ML algorithms with optimized data preprocessing techniques, in conjunction with a hydrophobic surface-enhanced Raman scattering platform, can achieve near-perfect sensitivity in bacterial identification.
In light of recent advances in Raman-based microbial diagnostics, this study seeks to contribute further to the field by addressing both predictive performance and interpretability. The primary objective was to systematically evaluate and compare the performance of a diverse set of machine learning and deep learning models in classifying clinically relevant microbial species using Raman spectroscopy data. The analysis was structured around three biologically distinct classification tasks: (i) broad-spectrum multiclass identification of 30 bacterial and fungal taxa; (ii) group-level classification of Gram-positive versus Gram-negative bacteria; (iii) binary discrimination between Candida albicans and Candida glabrata.
In addition to performance benchmarking, the study aimed to develop and validate a reproducible analytical pipeline that integrates spectral preprocessing, deep learning-based classification, and post hoc explainability analysis. By applying SHAP (Shapley Additive exPlanations) directly on the Raman spectral domain, the framework enabled the identification of specific wavenumber regions most relevant to microbial discrimination. This combined focus on predictive accuracy and model interpretability was designed to enhance biological insight and support the clinical feasibility, transparency, and regulatory alignment of Raman-based AI tools for diagnostic microbiology.
The overarching goal is to establish a robust and interpretable Raman–AI pipeline that advances microbial diagnostics for human health applications.
3. Results
3.1. Comparative Performance of ML Models in Multiclass Pathogen Classification
The comparative evaluation of the ML models, based on accuracy and F1-score (Table 4 and Figure 3), showed that the SVM outperformed all other models, achieving scores of nearly 0.95 for both metrics. LightGBM and Neural Networks followed closely, indicating that ensemble and deep-learning methods are highly effective for Raman-based bacterial classification. CNNs, both with and without PCA, also performed strongly, scoring slightly below their non-CNN counterparts but still above 0.926 on both metrics.
XGBoost maintained competitive performance with PCA but performed significantly worse without it, highlighting the importance of dimensionality reduction in maintaining model generalization. K-Nearest Neighbors (k-NN) and Random Forest offered moderate accuracy but showed limitations in balancing sensitivity and precision.
Finally, Decision Trees (with PCA) and XGBoost without PCA ranked lowest in the evaluation, reaffirming that simpler models or those trained on unprocessed data may struggle in high-dimensional spectral domains. This analysis underscores the importance of both expressive model architectures and robust preprocessing techniques, such as principal component analysis (PCA), for achieving optimal performance in bacterial identification tasks based on Raman spectra.
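As an illustration of the two metrics used in this comparison, the following minimal sketch computes accuracy and a macro-averaged F1-score from true and predicted labels. The averaging scheme (macro rather than weighted) is an assumption, since the text above does not state which variant was reported, and the labels are hypothetical placeholders rather than data from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for three species (not data from the study).
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

When accuracy and F1 nearly coincide, as reported for the SVM, errors are spread fairly evenly across classes; a macro F1 well below accuracy would instead point to a few poorly recognized taxa.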
SHAP-Based Feature Importance Across All Pathogen Bacteria
The SHAP analysis performed on the normalized Raman spectra revealed class-specific patterns of spectral importance. As visualized in the SHAP heatmap (Figure 4), elevated SHAP values were observed primarily around 970–, 1450– and .
These contributions varied across classes, with Class 17 showing a sharp SHAP peak near , while Classes 25, 28, and 19 exhibited distinct activation in the 970– range. Notably, only a limited number of wavenumber regions contributed substantially to model predictions, suggesting a compressed spectral relevance space.
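To make the notion of a SHAP value concrete, the sketch below computes exact Shapley attributions for a toy scoring function over four spectral bins. It is not the SHAP library and not the CNN used in this study: the function `toy_score`, the input vector, and the all-zero baseline are hypothetical, and the exhaustive enumeration shown here is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this only works for very few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Probability that exactly this coalition precedes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(v):
    """Hypothetical class score: bin 1 acts alone, bins 2 and 3 interact."""
    return 2.0 * v[1] + v[2] * v[3]


x = [0.3, 0.9, 0.5, 0.4]         # hypothetical normalized intensities
baseline = [0.0, 0.0, 0.0, 0.0]  # assumed reference spectrum
phi = shapley_values(toy_score, x, baseline)
print([round(p, 3) for p in phi])
```

The attributions sum to the difference between the score for `x` and the score for the baseline, bin 0 receives zero because the toy function ignores it, and the interaction between bins 2 and 3 is shared equally between them. This is the sense in which only a limited number of wavenumber regions can carry most of the attribution.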
3.2. Machine Learning Models’ Performance on Gram-Positive Bacteria
All models tested achieved high identification performance on Gram-positive bacteria (Figure 5). The Support Vector Machine (SVM) with an RBF kernel produced the highest mean accuracy at 96.27%, with minimal performance variability across folds, suggesting strong generalization across different sample subsets. Neural networks and LightGBM followed closely, with accuracies of 95.41% and 95.31%, respectively, also demonstrating reliable behavior. Although the CNN model showed slightly more variation in performance, its accuracy remained within a high-performance range, highlighting its effectiveness in capturing nonlinear spectral patterns.
The consistently high scores across metrics—accuracy, precision, recall, and F1-score—indicate that these models reliably distinguished Gram-positive species based solely on their spectral profiles. This supports their potential role in clinical workflows, where fast, label-free species identification can improve diagnostic timelines and inform antimicrobial stewardship.
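The mean accuracy and fold-to-fold variability discussed above can be obtained with a stratified k-fold procedure, sketched below under stated assumptions. The nearest-centroid classifier is a deliberately simple stand-in (not the SVM, LightGBM, or CNN evaluated here), the choice of five folds is illustrative, and the spectra are synthetic.

```python
import random
from statistics import mean, stdev


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


def nearest_centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier: assign each spectrum to the closest class mean."""
    sums, counts = {}, {}
    for row, lab in zip(train_X, train_y):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[lab] = counts.get(lab, 0) + 1
    centroids = {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda lab: dist2(row, centroids[lab]))
            for row in test_X]


def cross_validate(X, y, k=5, seed=0):
    """Return one held-out accuracy per fold."""
    accs = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        preds = nearest_centroid_predict(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        accs.append(correct / len(held_out))
    return accs


# Synthetic two-species data set with 8 spectral bins (hypothetical values).
rng = random.Random(42)
X, y = [], []
for label, center in (("sp1", 0.0), ("sp2", 1.0)):
    for _ in range(50):
        X.append([center + rng.gauss(0, 0.3) for _ in range(8)])
        y.append(label)

fold_acc = cross_validate(X, y, k=5)
print(f"accuracy: {mean(fold_acc):.3f} +/- {stdev(fold_acc):.3f}")
```

A small standard deviation across folds is what the text above describes as minimal performance variability; it indicates that the result does not hinge on which isolates happen to fall into the held-out subset.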
SHAP-Based Feature Interpretation for Gram-Positive Classification
To investigate the decision-making patterns of the CNN model applied to Gram-positive bacterial spectra, SHAP values were computed directly across all Raman wavenumbers. The resulting heatmap (Table 5 and Figure 6) displays the average feature importance per class and wavenumber.
Elevated SHAP values were observed in a narrow spectral region, most prominently around . This region appeared recurrently across multiple classes, suggesting that certain wavenumber bands contributed disproportionately to class predictions. These findings likely reflect genuine biological differences between the taxa. Variations in the biochemical composition of bacterial cell walls, such as the degree of peptidoglycan cross-linking, teichoic acid concentration, or the presence of surface proteins and glycolipids, can all influence Raman scattering intensity and band structure.
For example, Streptococcus spp. are known to express group-specific carbohydrate antigens, whereas Enterococcus species exhibit diversity in membrane-associated lipoproteins and glycolipids [26,27]. The SHAP heatmaps revealed that specific Raman bands, such as those near , were differentially weighted across these taxa, suggesting that these biochemical traits are captured by the model [25,26]. These chemically informative features, although often undetectable by conventional staining or biochemical profiling techniques, are amplified and interpreted through Raman spectroscopy in combination with explainable AI. This supports the model’s capacity to detect both broad-spectrum spectral information and species-specific molecular traits, such as peptidoglycan composition or membrane-bound biomolecules [13,28], thereby enhancing diagnostic resolution and biological interpretability.
3.3. Machine Learning Models’ Performance on Gram-Negative Classification
The classification of Gram-negative bacteria (Figure 7) based on Raman spectral data yielded consistently strong results across all models tested. Accurate identification of these species is clinically important, given their association with multidrug resistance and involvement in severe infections such as urinary tract infections, pneumonia, and septicemia.
Among the models evaluated, the Support Vector Machine (SVM) with an RBF kernel achieved the highest overall performance. It recorded a mean accuracy of 98.59%, along with precision, recall, and F1-score values, all at 98.59%, indicating highly consistent and balanced predictions across the Gram-negative taxa. Neural Networks and LightGBM followed closely with accuracies of 98.41% and 98.34%, respectively. The CNN model also delivered robust results, attaining 98.18% accuracy. These patterns mirror those observed in the Gram-positive classification task, reinforcing the SVM’s reliability in handling complex microbial datasets.
SHAP-Based Feature Interpretation for Gram-Negative Classification
To interpret how the CNN model distinguished between Gram-negative bacterial species, SHAP values were computed directly across the Raman spectral domain (Figure 8 and Table 6). The analysis revealed distinct class-specific spectral dependencies, with Classes 2, 5, and 11 showing particularly strong SHAP activation in the low-wavenumber region around , and all classes displaying elevated relevance near and . These patterns suggest that different Gram-negative taxa exhibit characteristic Raman features that are differentially captured by the model.
The SHAP heatmap also highlighted common spectral regions, such as the bands near 1120–, which were consistently weighted across multiple classes. These regions likely correspond to conserved biochemical structures among Gram-negative species.
These spectral features likely reflect underlying biological differences in Gram-negative cell envelope architecture, including structural variation in lipopolysaccharide layers, differences in outer membrane protein content, and species-specific metabolite profiles. Such characteristics are often inaccessible using conventional microbiological tests, but become evident through vibrational spectroscopy enhanced by explainable AI, enabling interpretable, data-driven insights into microbial classification.
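The distinction drawn above between bands weighted consistently across classes and bands specific to individual classes can be made operational as follows. This is a minimal sketch, assuming a per-sample attribution matrix (samples by wavenumber bins) is already available; the threshold, the 80% class fraction, the bin centres, and the attribution values are all hypothetical.

```python
def mean_abs_by_class(attributions, labels):
    """Average |attribution| per class and wavenumber bin: {class: [values]}."""
    sums, counts = {}, {}
    for row, lab in zip(attributions, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += abs(v)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}


def split_bands(heatmap, wavenumbers, threshold, min_fraction=0.8):
    """Separate bins that matter for most classes from class-specific ones.

    A bin is 'conserved' if its mean |attribution| reaches the threshold in at
    least min_fraction of the classes; otherwise it is recorded as specific to
    each class in which it reaches the threshold.
    """
    conserved, specific = [], {}
    n_classes = len(heatmap)
    for j, wn in enumerate(wavenumbers):
        hits = [lab for lab, vals in heatmap.items() if vals[j] >= threshold]
        if len(hits) >= min_fraction * n_classes:
            conserved.append(wn)
        else:
            for lab in hits:
                specific.setdefault(lab, []).append(wn)
    return conserved, specific


# Hypothetical bin centres (cm^-1) and attribution values, not study data.
wavenumbers = [600, 800, 1000, 1200]
labels = ["c1", "c1", "c2", "c2", "c3"]
attributions = [
    [0.5, 0.0, 0.4, 0.0],
    [-0.3, 0.1, 0.6, 0.0],
    [0.0, 0.1, -0.5, 0.0],
    [0.1, 0.0, 0.3, 0.1],
    [0.0, 0.0, 0.45, 0.0],
]

heatmap = mean_abs_by_class(attributions, labels)
conserved, specific = split_bands(heatmap, wavenumbers, threshold=0.2)
print("conserved bins:", conserved)
print("class-specific bins:", specific)
```

Taking absolute values before averaging prevents positive and negative attributions from cancelling within a class, which would otherwise hide bands that push individual spectra in opposite directions.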
3.4. Machine Learning Models’ Performance on Candida spp. Classification
The ability to discriminate between Candida albicans and Candida glabrata was evaluated through a binary classification task using Raman spectral data. This analysis, performed using three different computational models, aimed to assess how well each approach could distinguish between the two species based on subtle biochemical differences captured in their spectra (Table 7 and Figure 9).
The convolutional neural network (CNN) trained directly on the whole, unprocessed Raman spectra showed the best overall performance. It achieved an accuracy of 92.8%, with similarly high values for precision, recall, and F1-score. These results suggest that the CNN was able to identify and use meaningful biochemical signals from the spectra to classify the species correctly. The model’s consistent performance across all metrics indicates both reliability and balance in detecting true positives while avoiding false classifications, an important characteristic in any diagnostic context. When dimensionality reduction was applied using Principal Component Analysis (PCA) before CNN training, a slight decrease in performance was observed. The model still performed well, although all metrics fell below 90%, and the drop suggests that compressing the spectral data may remove some of the finer biochemical information necessary for accurate classification. This highlights the trade-off between simplifying data and retaining diagnostically relevant detail.
The XGBoost model, a tree-based machine learning approach, also used PCA-transformed spectra but performed slightly less effectively than both CNN versions. Its accuracy and related metrics remained high (around 88.5%), but its slightly lower recall indicates a greater tendency to miss true-positive cases. This may reflect the model’s limited ability to recognize more complex biochemical patterns in the data.
SHAP-Based Spectral Feature Interpretation for Candida spp. Classification
To interpret how the CNN model distinguished between the two Candida species, SHAP values were computed directly across the Raman spectral domain. The analysis revealed distinct spectral dependencies, with elevated SHAP activation in the regions around 760–770 cm⁻¹, 980–990 cm⁻¹, , , , , and . These patterns suggest that both Candida species exhibit characteristic Raman features that are differentially captured by the model.
Figure 10 shows the SHAP values for each class across all wavenumbers. Striped vertical bands indicate spectral regions where the CNN places the greatest importance.
4. Discussion
Infectious diseases caused by the bacterial and fungal pathogens examined here remain a major global health burden, where delays in accurate identification directly compromise patient outcomes. Traditional diagnostics, though reliable, are hindered by slow culture-based processes that cannot keep pace with the urgent need for rapid, targeted therapy. Raman spectroscopy has emerged as a fast, label-free biochemical fingerprinting method that captures pathogen-specific molecular signatures within minutes. However, the inherent complexity of spectral data has historically limited its clinical translation. By coupling Raman spectroscopy with artificial intelligence, this study demonstrates how machine learning not only resolves these analytical challenges but also delivers interpretable predictions across diverse taxa, highlighting the promise of Raman–AI pipelines as next-generation diagnostic tools.
Within this framework, our findings revealed distinct strengths across microbial groups, with particularly high performance in Gram-negative bacteria. This likely reflects the chemically diverse structures of their outer membranes, which produce discriminative Raman signatures well captured by AI models. Building on this observation, the outer-membrane architecture, rich in lipopolysaccharides (LPS), lipid A, porins, and outer-membrane proteins, creates heterogeneous, species-specific vibrational patterns that facilitate robust interspecies separation in Raman space [29,30]. Variability in LPS O-antigen chains and lipid A phosphorylation, in particular, is consistent with the distinct signatures our models exploited for accurate classification [29,30].
By comparison, Gram-positive organisms lack an outer membrane but possess thick peptidoglycan layers enriched with teichoic and lipoteichoic acids. Despite this more conserved envelope, our results indicate that species-level differences in wall-associated polysaccharides, surface proteins, and membrane enzymes provide sufficient biochemical diversity for reliable discrimination [31,32]. Prior work focused on clinically relevant Staphylococcus spp. further supports this potential: Tang et al. reported high performance using SERS-based classification across >100 strains, underscoring the discriminatory power of vibrational fingerprints even within closely related Gram-positive taxa [33].
For the fungal genus Candida, performance declined when distinguishing Candida albicans from Candida glabrata, a biologically unsurprising result given that both are yeasts with shared structural and metabolic features. Both species exhibit similar wall scaffolds (β-1,3/β-1,6-glucans, chitin, and mannan-rich glycoproteins) that can yield overlapping spectra [34]. Moreover, although C. glabrata is phylogenetically closer to Saccharomyces cerevisiae than to C. albicans, convergent adaptation to the human host has produced comparable pathogenic traits, further narrowing spectral separability [35]. Even so, near-term discrimination at clinically useful accuracy remains valuable given the time-critical nature of antifungal therapy in invasive candidiasis.
Across models, Support Vector Machines (SVMs) consistently led performance, aligning with prior reports that highlight their robustness on high-dimensional spectroscopic data [14,36]. Deep learning, particularly convolutional neural networks, also performed strongly and is attractive for capturing nonlinear spectral structure, including resistance-related patterns in focused tasks [15]. However, our results suggest CNNs are more sensitive to data scale and balance, reinforcing the practical advantage of kernel methods when datasets are limited or taxonomies are broad [14,15,36].
A key contribution of this work is the use of explainable AI to connect predictions to biochemistry. SHAP analyses on raw wavenumbers revealed conserved and class-specific regions—most prominently near 1120–, , and —that align with C–C/C–N stretching, bending, and protein-associated (Amide II) vibrations, respectively. This direct, spectrum-level attribution supports biological plausibility, clarifies which bands drive discrimination across taxa, and strengthens the translational credibility of Raman–AI outputs in clinical contexts.
Clinically, the compression of identification timelines from days to minutes has clear implications: earlier organism-level calls can enable timely, targeted therapy, improving outcomes while supporting antimicrobial stewardship and reducing costs. These advances are particularly salient for bloodstream and respiratory infections, where rapid pathogen resolution can alter management within the initial window of care.
Important challenges remain for translating Raman–AI diagnostics into routine clinical practice. Progress toward standardized, open-access spectral libraries for clinically relevant pathogens will be essential to improve reproducibility and enable cross-institutional validation. Algorithmically, adaptive methods that prioritize informative spectral regions and handle polymicrobial signals, class imbalance, and domain variability are needed to ensure robustness under real-world conditions. Several study-specific limitations should also be acknowledged. The dataset was derived from cultured isolates rather than direct patient specimens, which ensured high-quality spectra but did not capture the complexity of clinical matrices such as blood, urine, or sputum. The evaluation relied on cross-validation within a single dataset, without external validation to confirm generalizability across instruments and clinical settings. The study focused on species-level identification, while antimicrobial resistance profiling was not directly addressed. Addressing these constraints provides a clear agenda for future work, including validation directly on clinical specimens, the creation of large spectral repositories, integration of antimicrobial resistance prediction, and adoption of advanced model architectures. In parallel, combining Raman spectroscopy with complementary modalities such as MALDI-TOF MS or metagenomics, and developing lightweight models deployable on portable Raman devices, will further support translation from experimental validation to routine hospital and point-of-care workflows.