Article

An Intelligent Strategy for Colony De-Replication Using Raman Spectroscopy and Hybrid Clustering

1 School of Information Engineering, Suqian University, Suqian 223800, China
2 College of Instrumentation and Electrical Engineering, Jilin University, Changchun 130061, China
3 Allview Laboratory, Changchun 130118, China
* Authors to whom correspondence should be addressed.
Fermentation 2025, 11(12), 691; https://doi.org/10.3390/fermentation11120691
Submission received: 8 October 2025 / Revised: 9 December 2025 / Accepted: 11 December 2025 / Published: 12 December 2025
(This article belongs to the Section Fermentation Process Design)

Abstract

Efficient de-redundant colony picking is essential to accelerating strain screening in fermentation microbiology. Conventional random picking is inefficient, exhibits high redundancy, and often misses low-abundance but valuable strains. To address this, we present a high-efficiency de-redundant selection strategy based on colony Raman spectroscopy and a hybrid clustering algorithm. We directly acquire colony Raman spectra and combine the complementary strengths of k-means and hierarchical clustering (HCA) to achieve both balanced global partitioning and sensitivity to low-abundance taxa. Systematic application on pure colonies and complex plate settings shows that, by picking only 12–26% of colonies, the method attains 80–100% species coverage. Relative to manual random picking and image-based feature selection, picking efficiency increased by 116.8% and 44.5%, respectively, substantially shortening the screening cycle and reducing workload. Overall, Raman-guided hybrid clustering substantially reduces redundant picking and improves detection of low-abundance strains. It provides practical support for efficient strain discovery, library construction, and process optimization.

1. Introduction

Efficient discovery and utilization of fermentative microbial resources underpin the optimization of industrial fermentation and the discovery of novel strains [1]. Colony picking represents a pivotal step in this pipeline, as its efficiency and accuracy directly shape the quality, diversity, and cost of downstream analyses [2,3]. Traditionally, this process relies on manual random selection, which is not only time-consuming and labor-intensive but also frequently results in redundant isolates [4,5]. In complex environmental or fermentation samples, dominant microbial species often mask rare or low-abundance taxa that may harbor unique metabolic capabilities or industrial potential [6]. Consequently, such blind and inefficient selection strategies prolong the transition from initial colony screening to functional validation, thereby hindering the rapid construction of microbial resource libraries and iterative process optimization [7,8].
To overcome this throughput bottleneck, automated colony-picking systems based on visual recognition have been introduced [9]. These systems typically select colonies by extracting morphological features such as size, color, and shape. However, microbial colonies often display significant morphological plasticity—distinct phenotypes may correspond to genetically similar lineages, and conversely, morphologically indistinguishable colonies can exhibit divergent genotypes and metabolic traits [10,11]. Thus, phenotypic imaging alone provides insufficient taxonomic resolution, resulting in redundant isolation or missed detection of valuable but visually indistinct strains [12]. Therefore, a sensing technology capable of probing intrinsic biochemical signatures rather than surface morphology is critically needed for precise and intelligent colony selection [13].
Raman spectroscopy meets this requirement as a label-free, non-destructive technique that captures the biochemical fingerprints of living cells, encoding molecular information about nucleic acids, proteins, lipids, and carbohydrates [14,15]. Unlike surface imaging, Raman spectra reflect metabolic states and macromolecular composition, allowing discrimination among colonies and subpopulations down to the single-cell level. When combined with unsupervised machine learning, spectral patterns can be mined for objective classification and de-replication without prior labeling [16,17]. However, applying clustering algorithms to Raman-guided microbial screening faces a specific challenge: sample imbalance. Standard algorithms such as k-means provide stable global partitioning but tend to merge small, low-abundance clusters into larger ones, whereas hierarchical clustering is more sensitive to local structure but is susceptible to noise and computationally demanding [18,19].
Motivated by these limitations, the present study aims to develop and validate an intelligent de-replication framework that couples colony Raman spectroscopy with a hybrid clustering strategy. Our goal is to enhance species coverage, reduce selection redundancy, and improve the detection of low-abundance strains in both controlled pure-colony and complex plate systems. Specifically, this study (i) develops an intelligent colony de-replication framework integrating Raman spectroscopy and hybrid clustering, (ii) investigates how the integrated approach balances redundancy reduction with sensitivity to low-abundance taxa, and (iii) demonstrates its applicability and performance in complex mixed-plate experiments representative of industrial conditions. An overview of the proposed workflow—covering spectral acquisition, preprocessing, clustering integration, and representative colony selection—is illustrated in Figure 1. This framework establishes a practical and generalizable paradigm for intelligent, de-redundant colony selection, providing methodological support for efficient strain discovery, library construction, and industrial fermentation optimization.

2. Materials and Methods

2.1. Strains and Experimental Systems

We studied 10 representative strains commonly observed in industrial fermentation, spanning the genera Lactococcus, Enterococcus, and Weissella: Lactococcus lactis, Lactococcus garvieae, Enterococcus faecalis, E. faecium, E. durans, E. mundtii, E. hirae, E. casseliflavus, E. gallinarum, and Weissella cibaria (strain IDs in Figure 2, labeled L: 1–10). Several strains are phylogenetically close and spectrally similar, providing a rigorous testbed for assessing within-clade aggregation and between-clade separation [20,21].
Two systems were designed: pure-colony plates and mixed plates. Pure-colony cultures were used to simulate colonies grown under controlled yet varied environmental conditions, enabling reproducible evaluation of spectral stability and clustering behavior across distinct strain states [22]. By controlling inoculum density and culture time, colonies of comparable size and morphology were obtained, forming a balanced dataset for algorithm validation [23]. In contrast, the mixed-plate experiments incorporated all 10 strains at different abundance ratios to approximate the heterogeneity and skewed distributions found in real fermentation or environmental samples [24,25]. Because the mixed plates were prepared independently of the pure-colony cultures, they served as cross-condition validation sets, testing the method's robustness beyond the controlled testbeds. Together, the two systems support assessment of species coverage, redundancy reduction, and low-abundance detection.

2.2. Raman Acquisition Strategy and Instrument Parameters

To ensure high-quality, representative cellular Raman spectra from colonies, we optimized spectrometer settings and acquisition strategy.
Raman spectra were acquired using a confocal Raman spectrometer (Model P300, Hooke Instruments Ltd., Changchun, China) [16,26]. The system is equipped with a 532 nm solid-state excitation laser and a high-stability optical path based on confocal alignment. A spectrograph with a 1200 lines/mm holographic grating provided a spectral resolution of approximately 2 cm−1 per pixel. The detector consists of a thermoelectrically cooled CCD camera (1024 × 256 pixels), ensuring low dark noise during extended integration. A 50× long-working-distance objective (NA = 0.75) was used to focus the laser onto the colony surface with a spot size of approximately 1 µm. Raman signals were collected in the range of 400–1800 cm−1, covering major biochemical fingerprint regions of microbial cells [27]. Colonies were sampled directly on aluminum-coated chips (mirror-finished substrates) to minimize background fluorescence and scattering interference, thereby enhancing spectral signal-to-noise ratio (SNR) and ensuring consistent optical geometry across all measurements.
Laser power was set to 5 mW at the sample surface to balance signal strength and sample integrity, and the integration time for each spectrum was 5 s. For each colony, 20–40 single-cell spectra (the number adapted to colony size) were collected at evenly spaced positions around half the colony radius and at approximately half the depth from the surface to the plate bottom. This strategy yields representative and reproducible spectra for downstream clustering and selection. All acquisition parameters, including excitation wavelength, grating, detector type, optical substrate, and objective numerical aperture, were optimized to ensure inter-sample comparability and experimental reproducibility.

2.3. Spectral Denoising and Quality Assessment

We implemented baseline correction and noise suppression followed by quantitative quality evaluation. Baselines were corrected using adaptive iteratively reweighted penalized least squares (airPLS) [28]. This method iteratively reweights and penalizes the baseline model to remove fluorescence-induced drifts while preserving true Raman peaks. We then applied Savitzky–Golay smoothing to reduce high-frequency noise without distorting peak shape and intensity [29]. Finally, spectra were min-max normalized.
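The preprocessing chain described above can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not the authors' code from Ref. [30]: the airPLS routine follows the commonly published form of the algorithm with a first-order difference penalty (an assumption, since the penalty order is not stated), and the defaults mirror the parameter values reported in this section (lambda = 15, 10 iterations, Savitzky–Golay window 7, polynomial order 3). The function names `airpls_baseline` and `preprocess` and the synthetic spectrum are hypothetical.

```python
import numpy as np
from scipy import sparse
from scipy.signal import savgol_filter
from scipy.sparse.linalg import spsolve


def airpls_baseline(y, lam=15.0, itermax=10):
    """Estimate a baseline with airPLS (assumed first-order Whittaker penalty)."""
    y = np.asarray(y, dtype=float)
    m = y.size
    # First-difference operator D; lam * D^T D penalizes baseline roughness.
    D = sparse.diags([1.0, -1.0], [0, 1], shape=(m - 1, m))
    penalty = lam * (D.T @ D)
    w = np.ones(m)
    z = y.copy()
    for i in range(1, itermax + 1):
        W = sparse.diags(w, 0)
        z = spsolve((W + penalty).tocsc(), w * y)  # weighted penalized fit
        d = y - z
        neg = d < 0
        dssn = np.abs(d[neg].sum())
        # Stop when the negative residual is tiny relative to the signal.
        if dssn <= 1e-3 * np.abs(y).sum() or i == itermax:
            break
        w[~neg] = 0.0  # points above the baseline are treated as peaks
        w[neg] = np.exp(i * np.abs(d[neg]) / dssn)
        # Keep both end points weighted so the baseline stays anchored there.
        w[0] = w[-1] = np.exp(i * d[neg].max() / dssn)
    return z


def preprocess(y, lam=15.0, itermax=10, window=7, polyorder=3):
    """airPLS baseline removal, then Savitzky-Golay smoothing, then min-max scaling."""
    y = np.asarray(y, dtype=float)
    corrected = y - airpls_baseline(y, lam=lam, itermax=itermax)
    smoothed = savgol_filter(corrected, window_length=window, polyorder=polyorder)
    lo, hi = smoothed.min(), smoothed.max()
    if hi == lo:
        return np.zeros_like(smoothed)
    return (smoothed - lo) / (hi - lo)


# Synthetic example: one band at 1004 cm-1 on a sloping background.
wn = np.arange(400, 1802, 2.0)  # 400-1800 cm-1 at 2 cm-1 steps
raw = 0.5 + 5e-4 * (wn - 400) + np.exp(-0.5 * ((wn - 1004) / 5.0) ** 2)
clean = preprocess(raw)
```

Points above the current baseline receive zero weight, so after the first pass peaks no longer pull the baseline upward; a larger lambda gives a stiffer baseline.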
Specifically, baseline correction was performed using airPLS (lambda = 15, maximum iterations = 10), followed by Savitzky–Golay smoothing (window = 7, polynomial order = 3) and min–max normalization. A total of 400 spectra (40 per colony across 10 colonies) were processed for quality evaluation. Quality was quantified using a characteristic spectral signal-to-noise ratio (CS_SNR):
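$$\mathrm{CS\_SNR} = \frac{\max S_{[x_1,\,x_2]} - \min S_{[x_1,\,x_2]}}{\max S_{[x_3,\,x_4]} - \min S_{[x_3,\,x_4]}}$$
where $[x_1, x_2]$ denotes a characteristic band, $[x_3, x_4]$ a silent region (1750–1800 cm−1), and $S_{[a,\,b]}$ the spectral intensities within the interval $[a, b]$.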
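Read as a ratio of peak-to-peak ranges, the CS_SNR can be computed as below. This sketch assumes each spectrum is sampled on a shared wavenumber axis; the characteristic band in the example (990–1020 cm−1, around the 1004 cm−1 phenylalanine band) is a hypothetical choice, since the text does not fix $[x_1, x_2]$, and `cs_snr` is an illustrative name.

```python
import numpy as np


def cs_snr(wn, spectrum, band, silent=(1750.0, 1800.0)):
    """Peak-to-peak range in a characteristic band divided by that in a silent region."""
    wn = np.asarray(wn, dtype=float)
    s = np.asarray(spectrum, dtype=float)
    in_band = (wn >= band[0]) & (wn <= band[1])
    in_silent = (wn >= silent[0]) & (wn <= silent[1])
    signal = s[in_band].max() - s[in_band].min()      # numerator: band range
    noise = s[in_silent].max() - s[in_silent].min()   # denominator: silent range
    return signal / noise


# Toy spectrum: a unit-height band near 1004 cm-1 and a +/-0.05 ripple
# in the silent region, so the expected ratio is 1.0 / 0.1 = 10.
wn = np.arange(400, 1802, 2.0)
spec = np.zeros_like(wn)
spec[(wn >= 998) & (wn <= 1010)] = 1.0
silent_mask = wn >= 1750
spec[silent_mask] = 0.05 * (-1.0) ** np.arange(silent_mask.sum())
ratio = cs_snr(wn, spec, band=(990.0, 1020.0))
```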

2.4. Colony Selection Framework and Evaluation

We devised a hybrid unsupervised clustering framework combining k-means and single-linkage HCA to achieve balanced global partitioning and low-abundance sensitivity.
For labeled pure-colony data, we constructed balanced datasets using colonies of similar area to reduce abundance bias. After standardization, k-means partitioned spectra by minimizing within-cluster sum of squared Euclidean distances, favoring compact clusters and representative global partitioning. For k-means, Euclidean distance was used with cluster numbers k = 2–10, a fixed random seed (42), and 300 maximum iterations; details are provided in the open repository [30].
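A k-means sweep with the stated settings (Euclidean distance, k = 2–10, random seed 42, 300 maximum iterations) might look like the following scikit-learn sketch, which also records the silhouette coefficient and Calinski–Harabasz index used later for the WCI. The use of scikit-learn, `StandardScaler` for the standardization step, `n_init=10`, and the synthetic data are assumptions for illustration; the authors' implementation is in Ref. [30].

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, silhouette_score
from sklearn.preprocessing import StandardScaler


def kmeans_sweep(spectra, k_values=range(2, 11), seed=42, max_iter=300):
    """Run k-means for each candidate k; keep labels, SC and CH for later ranking."""
    X = StandardScaler().fit_transform(spectra)  # zero mean, unit variance per feature
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, random_state=seed, max_iter=max_iter, n_init=10)
        labels = model.fit_predict(X)  # minimizes within-cluster squared Euclidean distance
        results[k] = {
            "labels": labels,
            "silhouette": silhouette_score(X, labels, metric="euclidean"),
            "ch": calinski_harabasz_score(X, labels),
        }
    return results


# Synthetic stand-in for preprocessed spectra: 3 well-separated groups of 30.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 50))
spectra = np.vstack([c + rng.normal(scale=0.3, size=(30, 50)) for c in centers])
sweep = kmeans_sweep(spectra)
best_k_by_sc = max(sweep, key=lambda k: sweep[k]["silhouette"])
```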
For imbalanced datasets with skewed abundance, we applied single-linkage HCA. By progressively linking the nearest pairs of points or clusters, HCA captures subtle yet stable heterogeneity, aiding recognition of low-abundance colonies. HCA used Euclidean distance; Ward, centroid, weighted (WPGMA), and UPGMA linkages were also evaluated for comparison, and an empirical linkage threshold of 10 was applied to balance cluster compactness and separation [30].
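The HCA step can be sketched with SciPy's `linkage` and `fcluster`. Here the threshold of 10 is assumed to be a cut on the cophenetic (linkage) distance via `criterion="distance"`; the text does not state the cut criterion or the feature scaling, and the function name and toy data are hypothetical. The example shows the property the text relies on: a small, distant group survives as its own cluster rather than being absorbed by a dominant one.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# SciPy names for the linkages discussed in the text ("average" is UPGMA,
# "weighted" is WPGMA); "ward" and "centroid" assume Euclidean distances.
LINKAGES = ("single", "ward", "centroid", "weighted", "average")


def hca_labels(X, method="single", threshold=10.0):
    """Agglomerative clustering cut at a fixed linkage distance."""
    if method not in LINKAGES:
        raise ValueError(f"unsupported linkage: {method}")
    Z = linkage(X, method=method, metric="euclidean")
    # criterion="distance": observations in each flat cluster have a
    # cophenetic distance of at most `threshold`.
    return fcluster(Z, t=threshold, criterion="distance")


# Imbalanced toy data: 40 "dominant" spectra and 3 "rare" ones far away.
rng = np.random.default_rng(1)
dominant = rng.normal(0.0, 0.5, size=(40, 20))
rare = rng.normal(0.0, 0.5, size=(3, 20)) + 15.0
labels = hca_labels(np.vstack([dominant, rare]), method="single", threshold=10.0)
```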
Clustering quality and the “optimal number of picks” were jointly guided by the Silhouette Coefficient (SC) and the Calinski–Harabasz index (CH). We defined a Weighted Clustering Index (WCI) to integrate local boundary clarity and global separability:
$$\mathrm{WCI} = \alpha \bar{S} + (1 - \alpha)\,\frac{\mathrm{CH}}{\mathrm{CH}_{\max}}$$
where $\bar{S}$ is the mean silhouette coefficient, $\mathrm{CH}_{\max}$ is the maximum CH over the candidate cluster numbers, and $\alpha$ is a tunable weight. The number of clusters maximizing WCI determined the recommended number of colonies to pick.
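The WCI selection rule reduces to a few lines. The weight alpha is not reported in the text, so the default of 0.5 below is an assumption, and the silhouette and CH values are made-up numbers chosen only to show how alpha shifts the recommended k.

```python
def wci_scores(silhouette, ch, alpha=0.5):
    """WCI(k) = alpha * S_mean(k) + (1 - alpha) * CH(k) / CH_max."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ch_max = max(ch.values())  # maximum CH over the candidate cluster numbers
    return {k: alpha * silhouette[k] + (1 - alpha) * ch[k] / ch_max for k in silhouette}


def recommended_picks(silhouette, ch, alpha=0.5):
    """Number of clusters (and hence colonies to pick) that maximizes WCI."""
    scores = wci_scores(silhouette, ch, alpha)
    return max(scores, key=scores.get)


# Hypothetical index values for three candidate k.
sc = {2: 0.40, 3: 0.60, 4: 0.50}
ch = {2: 100.0, 3: 80.0, 4: 120.0}
k_balanced = recommended_picks(sc, ch, alpha=0.5)  # 4
k_boundary = recommended_picks(sc, ch, alpha=0.9)  # 3
```

With equal weights, the higher normalized CH of k = 4 decides; raising alpha to 0.9 favors the clearer cluster boundaries at k = 3.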
For representative selection within each cluster, we tallied the colony labels of the spectra in that cluster and chose the colony label with the highest proportion and a stable distribution as the representative. A label was confirmed only if more than half of its total acquired spectra fell within the cluster; otherwise, the next-ranked stable label was selected. All other colonies were considered redundant.
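One reading of the representative-selection rule is sketched below, assuming each spectrum carries a cluster ID and the ID of the colony it was measured on. Only the more-than-half criterion is modeled; the "stable distribution" check is not specified in the text and is omitted. The function name and example labels are hypothetical.

```python
from collections import Counter


def pick_representatives(cluster_ids, colony_ids, min_fraction=0.5):
    """Choose one representative colony per cluster from spectrum-level labels.

    A colony qualifies for a cluster only if more than `min_fraction` of all
    its spectra fall in that cluster; candidates are tried in order of how
    many spectra they contribute to the cluster.
    """
    totals = Counter(colony_ids)  # spectra acquired per colony
    reps = {}
    for cluster in sorted(set(cluster_ids)):
        counts = Counter(col for cl, col in zip(cluster_ids, colony_ids) if cl == cluster)
        reps[cluster] = None  # no qualifying colony -> nothing picked here
        for colony, n in counts.most_common():
            if n / totals[colony] > min_fraction:
                reps[cluster] = colony
                break
    return reps


# Colony D dominates cluster 2 by count but has only 4 of its 10 spectra there,
# so the rule falls through to colony E; F is outranked by D in cluster 3.
clusters = [2] * 6 + [3] * 7
colonies = ["D"] * 4 + ["E"] * 2 + ["D"] * 6 + ["F"]
picked = pick_representatives(clusters, colonies)
redundant = sorted(set(colonies) - set(picked.values()))
```

Because a colony must place more than half of its spectra in a cluster to qualify, it can represent at most one cluster, so the picking list contains no duplicates.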
The proposed colony de-replication framework integrates data preprocessing, hybrid clustering, and representative colony selection into a single analytical pipeline. Within it, the WCI-based evaluation combines two complementary views of clustering quality, local boundary clarity (silhouette) and global separability (CH), so that the number of picks is derived from the data without requiring labels. This design provides an internally consistent basis for evaluating colony-level de-replication performance and is a key methodological contribution of this study. The complete source code and an exemplar dataset are openly available in our repository (see Ref. [30]) to ensure full reproducibility of the workflow.

2.5. Validation on Complex Mixed Plates

We applied the framework to complex mixed plates containing all 10 strains to emulate industrial or environmental complexity. After hybrid clustering on preprocessed spectra, we generated a recommended picking list and performed 16S rRNA sequencing on the recommended colonies. We compared results with the known inoculation composition to assess species coverage, redundancy reduction, and low-abundance detection. For benchmarking, we compared against manual random picking and an image feature–based strategy.

3. Results and Discussion

3.1. Spectral Denoising Performance

Following the procedures described in Section 2.3, we evaluated the improvement in spectral quality of colony Raman data. Baseline correction effectively aligned noise levels with the silent region and improved the accuracy of SNR estimation.
Average CS_SNR increased from 5.9 ± 0.4 to 8.2 ± 0.5 (gain: 2.3), and the mean signal-to-noise ratio of the major characteristic bands (560–1655 cm−1) improved by 9.1 on average, demonstrating a substantial reduction in background noise and enhanced peak resolvability. Figure 3 shows a typical example for Lactococcus lactis, highlighting noise suppression, stability improvement (EB), and clearer resolution of characteristic biochemical peaks. After preprocessing, distinct Raman bands become evident at approximately 560 cm−1 (C–O–C stretching of lipopolysaccharides), 660 cm−1 (aromatic ring vibration of tyrosine and phenylalanine), 740 cm−1 (O–P–O stretching of nucleic acid bases, pyrimidine and uracil), 870 cm−1 (tyrosine vibration), 922 cm−1 (protein C–C band), 1004 cm−1 (phenylalanine symmetric ring breathing), 1240 cm−1 (nucleic acid bands of cytosine and thymine), 1320 cm−1 (protein amide III, C–N and CO–NH vibration), 1450 cm−1 (CH2 bending of proteins and lipids), and 1655 cm−1 (amide I CO–NH stretching of phenylalanine and other protein backbones). These bands represent the dominant biochemical signatures of cellular macromolecules in L. lactis, confirming that preprocessing restores the biological signals crucial for accurate clustering and strain discrimination [31,32,33].

3.2. Application of the Method on Pure Colonies

We first built a balanced dataset with similar colony areas to reduce abundance bias. As shown in Figure 4a, the t-SNE visualization of the preprocessed spectra exhibited distinct, well-separated clusters corresponding to the seven representative strains, indicating that the Raman features encode clear inter-strain variability and strong discriminatory power. For subsequent clustering, k = 7 was selected as the value maximizing the Weighted Clustering Index (WCI). To examine the biochemical basis of this separation, the Raman spectra of the seven strains (Lactococcus lactis, L. garvieae, Enterococcus faecalis, E. faecium, E. durans, E. casseliflavus, and E. hirae) were stack-plotted as mean ± standard deviation (Figure 4b). Although these lactic acid bacteria share similar overall fingerprint patterns, the stacked spectra show visible variations in intensity and band position within key biochemical regions (≈560–1655 cm−1), including protein amide bands, CH2 deformation bands, and nucleic acid vibrations. The average within-strain variability was low (mean SD ≈ 0.06), and the largest deviations were localized to the amide and CH-bond regions, indicating good spectral stability within each strain and reliable experimental reproducibility. These spectral differences support the method's strain-level specificity and its use for fine-grained species discrimination and colony de-replication.
As shown in Figure 4c,d, on the balanced dataset k-means achieved cluster distributions highly concordant with ground-truth labels and exhibited robust global separation. By contrast, HCA showed slightly more inter-cluster overlap and fuzzier boundaries, suggesting lower stability for balanced data relative to k-means. Together, these results demonstrate that Raman spectral features enable precise taxonomic discrimination among closely related strains while maintaining high intra-strain consistency, underpinning the specificity and reliability of the proposed method.
As shown in Figure 5a, when the dataset was intentionally unbalanced to reflect uneven colony abundance, the t-SNE projection still presented distinct cluster boundaries for most strains, although rare taxa tended to form smaller and less compact groups. The stacked Raman spectra of the seven species (Figure 5b) remained highly consistent with those of the balanced dataset (Figure 4b), exhibiting only slightly broader standard-deviation envelopes for low-abundance strains such as E. faecium and E. durans. The average within-strain variability was approximately 0.07 (range 0.03–0.12), with the most pronounced fluctuations around 1450 cm−1 attributable to CH2 deformation bands of proteins and lipids—representing expected spectral variations arising from sampling imbalance rather than measurement instability. As illustrated in Figure 5c,d, HCA more sensitively captured fine-scale differences among rare-strain clusters and achieved better species separation, while k-means exhibited a tendency toward cluster mixing when sample counts differed substantially, sometimes absorbing low-abundance profiles into dominant groups. Thus, under imbalance, HCA better preserves local heterogeneity and enhances rare-taxon recognition, providing complementary strength to k-means for comprehensive cluster coverage.
In summary, comparison of the balanced (Figure 4) and imbalanced (Figure 5) datasets reveals consistent Raman spectral patterns and high methodological robustness. While k-means is advantageous for global partitioning of balanced data, HCA is more sensitive to rare colonies and local spectral diversity under imbalance. This complementarity motivates the proposed hybrid approach, which optimizes global balance and rare-colony recognition simultaneously and provides the basis for accurate, efficient de-redundant picking in microbial libraries.

3.3. Application of the Method on Complex Mixed Plates

We collected Raman spectra from 113 colonies across two mixed plates (A: 46; B: 67), covering all 10 strains. The clustering results shown in Figure 6 are based on the combined dataset comprising colonies from plates A and B (113 in total), rather than a single plate. This design demonstrates the capability of the proposed Raman-guided hybrid clustering framework to perform cross-plate colony de-replication, allowing simultaneous analysis and representative selection over multiple cultivation surfaces. By jointly processing spectra from different plates, redundant isolation of identical strains appearing on separate plates can be avoided, thereby further improving the overall colony-picking efficiency under industrial screening conditions. We performed HCA and k-means independently (Figure 6a,b) and merged the complementary results to generate a unified picking list. Each color-coded cluster in Figure 6 corresponds to a spectrally distinct group that was validated by 16S rRNA sequencing to represent a unique strain. Therefore, the clusters displayed in the figure indeed reflect biologically different taxa rather than algorithmic artifacts, confirming the biological relevance and accuracy of the hybrid clustering results. The spatial distribution of recommended colonies across plates A and B is indicated in Figure 6c.
16S rRNA sequencing of recommended colonies showed that picking only 12.4% (14/113) achieved 80% species coverage (8/10). This demonstrates that, in complex communities, the Raman-hybrid strategy enables substantial redundancy reduction while maintaining high detection coverage.
We benchmarked against manual random picking and an image feature–based approach on plate A (46 colonies), plate B (67 colonies), and a combined plate C (113 colonies, A + B). WCI recommended 12, 13, and 14 picks, respectively. Species coverage, determined by 16S rRNA sequencing, is summarized in Table 1. The reported improvements represent average performance across the three plate settings rather than the maximum gain on a single plate. On plate B, the Raman-hybrid method recovered only one more species than the image feature–based method, but the baseline coverage there was already high (7 of 10 species identified), leaving limited headroom for further gains. In contrast, plate A and the combined dataset (plate C) showed substantial coverage increases, indicating that the Raman-hybrid strategy delivers its largest benefits in complex or highly imbalanced settings where low-abundance colonies are frequent. Unlike image feature–based clustering, which mainly captures morphological differences, Raman spectroscopy interrogates intrinsic biochemical signatures and can resolve strain-specific spectral features in rare or visually similar colonies.
Across all settings, Raman + clustering outperformed both baseline approaches. On plate A, it achieved full species coverage (10/10), whereas manual random and image-based picking reached only 40% and 70%, respectively. On plates B and C, the Raman + clustering strategy achieved 80% coverage, surpassing random (30–50%) and image-based (40–70%) picking. Although the numerical gain on plate B is modest because of its relatively high baseline (7 of 10 species already identified by the image approach), the Raman-guided method still detected low-abundance taxa that both other strategies missed. Aggregated across plates, these additional detections translate into a clear overall improvement in species coverage for mixed-culture de-replication. By picking only 12–26% of colonies, our strategy reached 80–100% species coverage, corresponding to statistically significant improvements in picking efficiency (116.8% over manual random picking and 44.5% over image-based selection; p < 0.05, two-tailed t-test across triplicate plates).
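The 12–26% range follows from the WCI-recommended pick counts and the colony totals reported for each plate. The short calculation below uses only figures from this section (plate C is the combined A + B dataset; coverage is counted out of the 10 inoculated strains) and prints each pick fraction alongside the Raman + clustering coverage.

```python
# (WCI-recommended picks, colonies analysed) per plate, as reported above.
plates = {"A": (12, 46), "B": (13, 67), "C": (14, 113)}
# Species recovered by Raman + clustering out of the 10 inoculated strains.
coverage = {"A": 10, "B": 8, "C": 8}

pick_fraction = {p: picks / total for p, (picks, total) in plates.items()}
for p in plates:
    print(f"plate {p}: picked {pick_fraction[p]:.1%} of colonies, coverage {coverage[p] / 10:.0%}")
# plate A: picked 26.1% of colonies, coverage 100%
# plate B: picked 19.4% of colonies, coverage 80%
# plate C: picked 12.4% of colonies, coverage 80%
```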
Overall, these results place the Raman-hybrid clustering framework within recent methodological advances in intelligent microbiological screening [16,18,27]. By linking intrinsic spectral biochemical features with data-driven clustering, this study confirms previous conclusions regarding Raman's discriminatory power and extends them with cross-plate analysis and low-abundance recognition, capabilities that are critical for industrial applications.

4. Conclusions

We developed and validated a Raman spectroscopy-guided hybrid clustering strategy for de-redundant colony selection in fermentation microbiology. On pure-colony datasets, we demonstrated the complementary strengths of k-means (global partitioning) and hierarchical clustering (low-abundance sensitivity). On complex mixed plates, the hybrid approach coupled with within-cluster representative selection achieved 80–100% species coverage by picking only 12–26% of colonies, substantially reducing redundancy and enhancing detection of low-abundance strains. Compared with manual random and image-based strategies, our method shortens screening cycles and reduces workload.
Rather than merely improving numeric performance, this Raman-hybrid clustering framework provides a generalizable paradigm for intelligent microbial screening and library construction. By leveraging intrinsic biochemical spectra rather than surface morphology, it enables more reliable identification of metabolically distinct but visually similar isolates, supporting broader applications in industrial strain discovery, environmental microbiome mining, and automated high-throughput fermentation processes. Furthermore, the modular architecture of the hybrid clustering workflow can be readily adapted to other label-free analytical modalities—such as hyperspectral imaging or mass spectrometric profiling—facilitating its integration into next-generation robotic colony-picking systems. While the proposed Raman-hybrid clustering framework achieves efficient and consistent performance, it remains constrained by a relatively small sample size, limited external validation conditions, and potential variability in Raman signal acquisition arising from instrument calibration and colony heterogeneity. Future work will expand colony diversity, perform cross-system validation, and refine spectral calibration to further enhance the generalizability and robustness of the method.
In summary, this study establishes a versatile and scalable strategy for efficient mixed-culture de-replication. Its methodological design and demonstrated robustness highlight promising avenues for accelerating microbial resource exploration and industrial bioprocess optimization.

Author Contributions

X.L. conceived and designed the study, developed the core algorithm, and conducted the main experiments. M.L. contributed to the overall study design and manuscript structure. J.S. performed the biological validation of the colony selection results. S.W. was responsible for data curation and integration. X.L. and M.L. jointly supervised the project and contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Suqian Sci & Tech Program (Grant No. K202540) and the Suqian University Interdisciplinary Integration Innovation Application Excellent Project (No. School 2024XQT009).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The hybrid clustering code is available at https://github.com/lixinli1230-commits/kmeans-HCA (accessed on 10 October 2025). The dataset is shared via Baidu Netdisk as data.zip at https://pan.baidu.com/s/1HFPMdP0ES10e9nm5N8YHCg? (accessed on 10 October 2025) (access code: 5j4a).

Acknowledgments

The authors thank Hooke Instruments Ltd. for providing technical support in Raman signal acquisition and data resources used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, Y.; Zhu, X.; Wang, N.; Liu, X.; Wang, L.; Ning, K. Synergy of traditional practices and modern technology: Advancing the understanding and applications of microbial resources and processes in fermented foods. Trends Food Sci. Technol. 2025, 157, 104891.
  2. Huang, Y.; Sheth, R.U.; Zhao, S.; Cohen, L.A.; Dabaghi, K.; Moody, T.; Sun, Y.; Ricaurte, D.; Richardson, M.; Velez-Cortes, F.; et al. High-throughput microbial culturomics using automation and machine learning. Nat. Biotechnol. 2023, 41, 1424–1433.
  3. Jiang, Y.; Luo, J.; Huang, D.; Liu, Y.; Li, D.-D. Machine learning advances in microbiology: A review of methods and applications. Front. Microbiol. 2022, 13, 925454.
  4. Heuser, E.; Becker, K.; Idelevich, E.A. Evaluation of an automated system for the counting of microbial colonies. Microbiol. Spectr. 2023, 11, e00673-23.
  5. Cai, X.; He, Y.; Yu, I.; Imani, A.; Scholl, D.; Miller, J.F.; Zhou, Z.H. Atomic structures of a bacteriocin targeting Gram-positive bacteria. Nat. Commun. 2024, 15, 7057.
  6. Dai, L.L.; Pan, Q.K.; Miao, Z.H.; Suganthan, P.N.; Gao, K.Z. Multi-objective multi-picking-robot task allocation: Mathematical model and discrete artificial bee colony algorithm. IEEE Trans. Intell. Transp. Syst. 2023, 25, 6061–6073.
  7. Hasan, S.; Marsafari, M.; Tolosa, M.; Andar, A.; Ramamurthy, S.S.; Ge, X.; Kostov, Y.; Rao, G. Rapid ultrasensitive and high-throughput bioburden detection: Microfluidics and instrumentation. Anal. Chem. 2022, 94, 8683–8692.
  8. Ombelet, S.; Natale, A.; Ronat, J.-B.; Kesteman, T.; Vandenberg, O.; Jacobs, J.; Hardy, L. Biphasic versus monophasic manual blood culture bottles for low-resource settings: An in-vitro study. Lancet Microbe 2022, 3, e124–e132.
  9. Coronnello, C.; Francipane, M.G. Moving towards induced pluripotent stem cell-based therapies with artificial intelligence and machine learning. Stem Cell Rev. Rep. 2022, 18, 559–569.
  10. Ceriotti, G.; Borisov, S.M.; Berg, J.S.; de Anna, P. Morphology and size of bacterial colonies control anoxic microenvironment formation in porous media. Environ. Sci. Technol. 2022, 56, 17471–17480.
  11. Binelli, M.R.; Kan, A.; Rozas, L.E.; Pisaturo, G.; Prakash, N.; Studart, A.R. Complex Living Materials Made by Light-Based Printing of Genetically Programmed Bacteria. Adv. Mater. 2023, 35, 2207483.
  12. East, A.; Campolongo, E.G.; Meyers, L.; Rayeed, S.M.; Stevens, S.; Zarubiieva, I.; Fluck, I.E.; Girón, J.C.; Jousse, M.; Lowe, S.; et al. Optimizing image capture for computer vision-powered taxonomic identification and trait recognition of biodiversity specimens. Methods Ecol. Evol. 2025, 16, 2260–2275.
  13. Rattray, J.B.; Lowhorn, R.J.; Walden, R.; Márquez-Zacarías, P.; Molotkova, E.; Perron, G.; Solis-Lemus, C.; Alarcon, D.P.; Brown, S.P. Machine learning identification of Pseudomonas aeruginosa strains from colony image data. PLoS Comput. Biol. 2023, 19, e1011699.
  14. Zhang, J.; Ren, L.; Zhang, L.; Gong, Y.; Xu, T.; Wang, X.; Guo, C.; Zhai, L.; Yu, X.; Li, Y.; et al. Single-cell rapid identification, in situ viability and vitality profiling, and genome-based source-tracking for probiotics products. Imeta 2023, 2, e117.
  15. Xu, L.; Lu, Z.-M.; Zhang, X.-J.; Chai, L.-J.; Wang, S.-T.; Zhang, S.-Y.; Shen, C.-H.; Li, B.; Shi, J.-S.; Xu, Z.-H. Metabolic Activity Profiling of High-Temperature Daqu Microbiota Using Single-Cell Raman Spectroscopy and Deuterium Isotope Probing. Anal. Chem. 2025, 97, 18199–18207.
  16. Li, X.; Li, S.; Wu, Q. Non-Invasive Detection of Biomolecular Abundance from Fermentative Microorganisms via Raman Spectra Combined with Target Extraction and Multimodel Fitting. Molecules 2023, 29, 157.
  17. Sun, Z.; Wang, Z.; Jiang, M. RamanCluster: A deep clustering-based framework for unsupervised Raman spectral identification of pathogenic bacteria. Talanta 2024, 275, 126076.
  18. Arslan, A.H.; Ciloglu, F.U.; Yilmaz, U.; Simsek, E.; Aydin, O. Discrimination of waterborne pathogens, Cryptosporidium parvum oocysts and bacteria using surface-enhanced Raman spectroscopy coupled with principal component analysis and hierarchical clustering. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 267, 120475.
  19. Golubewa, L.; Timoshchenko, I.; Kulahava, T. Specificity of carbon nanotube accumulation and distribution in cancer cells revealed by k-means clustering and principal component analysis of Raman spectra. Analyst 2024, 149, 2680–2696.
  20. Dapkevicius, M.D.L.E.; Sgardioli, B.; Câmara, S.P.; Poeta, P.; Malcata, F.X. Current trends of enterococci in dairy products: A comprehensive review of their multiple roles. Foods 2021, 10, 821.
  21. Amidi-Fazli, N.; Hanifian, S. Biodiversity, antibiotic resistance and virulence traits of Enterococcus species in artisanal dairy products. Int. Dairy J. 2022, 129, 105287.
  22. Wang, K.; Chen, J.; Martiniuk, J.; Ma, X.; Li, Q.; Measday, V.; Lu, X. Species identification and strain discrimination of fermentation yeasts Saccharomyces cerevisiae and Saccharomyces uvarum using Raman spectroscopy and convolutional neural networks. Appl. Environ. Microbiol. 2023, 89, e01673-23. [Google Scholar] [CrossRef] [PubMed]
  23. Nagy, S.Á.; Makrai, L.; Csabai, I.; Tőzsér, D.; Szita, G.; Solymosi, N. Bacterial colony size growth estimation by deep learning. BMC Microbiol. 2023, 23, 307. [Google Scholar] [CrossRef] [PubMed]
  24. Meng, H.; Jiang, Y.; Wang, L.; Wang, S.; Zhang, Z.; Tong, X.; Wang, S. Effects of different soybean and maize mixed proportions in a strip intercropping system on silage fermentation quality. Fermentation 2022, 8, 696. [Google Scholar] [CrossRef]
  25. Lan, T.; Lv, X.; Zhao, Q.; Lei, Y.; Gao, C.; Yuan, Q.; Sun, X.; Liu, X.; Ma, T. Optimization of strains for fermentation of kiwifruit juice and effects of mono- and mixed culture fermentation on its sensory and aroma profiles. Food Chem. X 2023, 17, 100595. [Google Scholar] [CrossRef]
  26. Liu, M.; Mu, J.; Gong, W.; Zhang, K.; Yuan, M.; Song, Y.; Li, B.; Jin, N.; Zhang, W.; Zhang, D. In vitro diagnosis and visualization of cerebral ischemia/reperfusion injury in rats and protective effects of ferulic acid by raman biospectroscopy and machine learning. ACS Chem. Neurosci. 2022, 14, 159–169. [Google Scholar] [CrossRef]
  27. Li, S.; Li, X.; Yu, M. A Subpixel Calibration Strategy for Micro-Raman Spectrometer in Subvisible Particles Traceable Detection. IEEE Trans. Instrum. Meas. 2024, 73, 3436109. [Google Scholar] [CrossRef]
  28. Jiang, X.; Li, F.; Wang, Q.; Luo, J.; Hao, J.; Xu, M. Baseline correction method based on improved adaptive iteratively reweighted penalized least squares for the X-ray fluorescence spectrum. Appl. Opt. 2021, 60, 5707–5715. [Google Scholar] [CrossRef]
  29. John, A.; Sadasivan, J.; Seelamantula, C.S. Adaptive Savitzky-Golay filtering in non-Gaussian noise. IEEE Trans. Signal Process. 2021, 69, 5021–5036. [Google Scholar] [CrossRef]
  30. Li, X. Colony De-Replication Framework (Version 1.0) [Python 3.6]. Available online: https://github.com/lixinli1230-commits/kmeans-HCA (accessed on 10 October 2025).
  31. Spedalieri, C.; Plaickner, J.; Speiser, E.; Esser, N.; Kneipp, J. Ultraviolet resonance raman spectra of serum albumins. Appl. Spectrosc. 2023, 77, 1044–1052. [Google Scholar] [CrossRef]
  32. Zhu, G.; Zhu, X.; Fan, Q.; Wan, X. Raman spectra of amino acids and their aqueous solutions. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2011, 78, 1187–1195. [Google Scholar] [CrossRef]
  33. Talari, A.C.S.; Movasaghi, Z.; Rehman, S.; Rehman, I.U. Raman spectroscopy of biological tissues. Appl. Spectrosc. Rev. 2015, 50, 46–111. [Google Scholar] [CrossRef]
Figure 1. Workflow for de-redundant colony selection using Raman spectroscopy and hybrid clustering.
Figure 2. Representative colony images of lactic acid bacteria and enterococci.
Figure 3. Preprocessing effects on Lactococcus lactis Raman spectra: (a) Spectral means and variance bands for raw vs. preprocessed spectra; transparent color bands mark the dominant Raman peak positions (560–1655 cm⁻¹) to highlight key biochemical features. The bands are labeled A (≈560 cm⁻¹, lipopolysaccharide C–O–C), B (≈660 cm⁻¹, tyrosine/phenylalanine), C (≈740 cm⁻¹, O–P–O stretching of pyrimidine and uracil), D (≈870 cm⁻¹, tyrosine vibration), E (≈922 cm⁻¹, protein C–C band), F (≈1004 cm⁻¹, phenylalanine), G (≈1240 cm⁻¹, cytosine and thymine), H (≈1320 cm⁻¹, protein amide III), I (≈1450 cm⁻¹, CH₂ groups of proteins/lipids), and J (≈1655 cm⁻¹, amide I CO–NH stretching). Each band (A–J) marks its assigned biochemical mode and a region where the signal-to-noise ratio improved notably after preprocessing. (b) Distributions of the characteristic spectral signal-to-noise ratio (CS_SNR) before and after preprocessing, showing a consistent shift toward higher values and an average improvement of 9.1 across the dominant bands.
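The reference list cites an adaptive iteratively reweighted penalized least-squares baseline correction [28] and adaptive Savitzky–Golay filtering [29], which suggests that preprocessing combines baseline removal with smoothing. The sketch below illustrates that kind of pipeline on a synthetic spectrum. It makes several assumptions. It substitutes a plain asymmetric least-squares (AsLS) baseline and a fixed-window Savitzky–Golay filter for the adaptive variants. The parameter values (`lam`, `p`, `window`, `polyorder`) are illustrative choices. Finally, `band_snr` is a hypothetical signal-to-noise measure and not necessarily the paper's CS_SNR definition.

```python
import numpy as np


def savgol_smooth(y, window=11, polyorder=3):
    """Savitzky-Golay smoothing: least-squares polynomial fit in a sliding window."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    A = np.vander(offsets, polyorder + 1, increasing=True)  # columns 1, x, x^2, ...
    coeffs = np.linalg.pinv(A)[0]  # weights that return the fitted value at the centre
    padded = np.pad(y, half, mode="reflect")  # keeps output length equal to input
    return np.convolve(padded, coeffs[::-1], mode="valid")


def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least-squares baseline (a simpler stand-in for adaptive variants)."""
    n = len(y)
    D = np.diff(np.eye(n), 2, axis=0)  # second-difference operator, (n-2) x n
    penalty = lam * D.T @ D  # smoothness penalty on the baseline
    w = np.ones(n)
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        w = np.where(y > z, p, 1 - p)  # points above the fit (peaks) get little weight
    return z


def preprocess(raw):
    """Subtract the estimated baseline, then smooth the residual spectrum."""
    return savgol_smooth(raw - asls_baseline(raw))


def band_snr(y, x, centre, noise_lo, noise_hi, half_width=10):
    """Hypothetical band SNR: peak height above the noise-region median / noise SD."""
    band = np.abs(x - centre) <= half_width
    noise = (x >= noise_lo) & (x <= noise_hi)
    return (y[band].max() - np.median(y[noise])) / y[noise].std()


# Synthetic spectrum: two Gaussian bands on a sloping background with white noise
rng = np.random.default_rng(0)
x = np.linspace(400, 1800, 701)  # wavenumber axis, 2 cm^-1 spacing
peaks = np.exp(-((x - 1004) / 12) ** 2) + 0.6 * np.exp(-((x - 1655) / 15) ** 2)
drift = 0.5 + 4e-4 * (x - 400)  # slowly varying background, e.g. fluorescence
raw = peaks + drift + rng.normal(0.0, 0.05, x.size)
clean = preprocess(raw)
print(f"SNR at 1004 cm^-1: raw {band_snr(raw, x, 1004, 1100, 1500):.1f}, "
      f"preprocessed {band_snr(clean, x, 1004, 1100, 1500):.1f}")
```

Baseline removal is done before smoothing so that the smoothing step acts only on the peak-bearing residual. A dense solve is acceptable for a few hundred points. For longer spectra, a sparse solver would be the usual choice.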
Figure 4. Balanced pure-colony dataset: (a) t-SNE visualization of seven representative strains, showing clear separation between clusters; (b) stacked Raman spectra (mean ± SD) of the seven strains, highlighting species-specific intensity differences in the major biochemical bands (560–1655 cm⁻¹) and low within-strain variance (mean SD ≈ 0.06); (c) HCA clustering; (d) k-means clustering. Insets: WCI trends at w = 0.5. C1–C8: cluster IDs.
Figure 5. Imbalanced pure-colony dataset: (a) t-SNE visualization of seven strains under uneven abundance; (b) stacked Raman spectra (mean ± SD), broadly consistent with the balanced data (Figure 4b) but with moderately higher spectral variance for low-abundance strains; (c) HCA clustering; (d) k-means clustering. Insets: WCI trends at w = 0.5. C1–C9: cluster IDs.
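The abstract describes combining k-means for balanced global partitioning with HCA for sensitivity to low-abundance taxa. The authors' own implementation is the repository cited in [30]. The sketch below illustrates one plausible way to combine the two methods, under the following assumptions:

- Each method picks the colony nearest to each of its cluster means.
- The final pick list is the union of both methods' picks.
- HCA uses average linkage.
- The cluster count `k` is fixed by hand.
- Rows of `X` stand in for preprocessed spectra (5 features for brevity).

The synthetic data include a 3-member group that mimics a low-abundance strain.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns one integer label per row."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # k-means++: draw the next seed with probability proportional to squared distance
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def hca_average(X, k):
    """Naive average-linkage agglomerative clustering, cut at k clusters."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)
    members = {i: [i] for i in range(n)}
    active = list(range(n))
    while len(active) > k:
        sub = D[np.ix_(active, active)]
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        i, j = active[a], active[b]
        ni, nj = len(members[i]), len(members[j])
        # Lance-Williams update for average linkage; the merged cluster keeps index i
        D[i, :] = (ni * D[i, :] + nj * D[j, :]) / (ni + nj)
        D[:, i] = D[i, :]
        D[i, i] = np.inf
        members[i] += members.pop(j)
        active.remove(j)
    labels = np.empty(n, dtype=int)
    for c, idx in enumerate(active):
        labels[members[idx]] = c
    return labels


def representatives(X, labels):
    """One pick per cluster: the colony closest to that cluster's mean spectrum."""
    picks = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        centroid = X[idx].mean(0)
        picks.append(int(idx[np.argmin(((X[idx] - centroid) ** 2).sum(1))]))
    return picks


def hybrid_picks(X, k):
    """Union of k-means and HCA representatives (an assumed combination rule)."""
    km = representatives(X, kmeans(X, k))
    hc = representatives(X, hca_average(X, k))
    return sorted(set(km) | set(hc))


# Imbalanced synthetic plate: three abundant strains and one rare strain (3 colonies)
rng = np.random.default_rng(1)
centres = np.vstack([np.zeros(5), 8.0 * np.eye(5)[:3]])
sizes = [40, 40, 40, 3]
X = np.vstack([c + 0.3 * rng.standard_normal((n, 5)) for c, n in zip(centres, sizes)])
truth = np.repeat(np.arange(4), sizes)
picks = hybrid_picks(X, k=4)
print("picked rows:", picks, "strains covered:", sorted(set(truth[picks].tolist())))
```

In this sketch, the two partitions agree on well-separated strains, so taking the union adds at most `k` extra picks. The extra picks matter when k-means splits an abundant strain instead of isolating a rare one, because the HCA representative still covers the rare group.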
Figure 6. Mixed-plate results and recommended picks: (a) HCA clustering; (b) k-means clustering; (c) spatial visualization of recommended colonies across plates A and B (combined dataset, 113 colonies). Colors denote distinct strain clusters confirmed by 16S rRNA sequencing, while markers indicate selected representative colonies. C: cluster IDs.
Table 1. Species detection across three picking strategies.
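| Plate | Total Colonies | Species Covered | Colonies Picked | Species Identified: Random | Species Identified: Image + Clustering | Species Identified: Raman + Clustering |
|---|---|---|---|---|---|---|
| A | 46 | 10 | 12 | 4 | 7 | 10 |
| B | 67 | 10 | 13 | 5 | 7 | 8 |
| C | 113 | 10 | 14 | 3 | 4 | 8 |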
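The headline figures in the abstract can be checked against Table 1 with a few lines of arithmetic. This sketch reads the "Species Covered" column as the number of species present on each plate. It also assumes that all three strategies picked the same number of colonies per plate, as the single "Colonies Picked" column implies. The pooled gains come out at about 116.7% and 44.4%. The abstract reports 116.8% and 44.5%, so the paper presumably uses a slightly different aggregation or rounding.

```python
# Table 1 rows: plate -> (total colonies, species present, colonies picked,
# species found by random picking, image + clustering, Raman + clustering)
table1 = {
    "A": (46, 10, 12, 4, 7, 10),
    "B": (67, 10, 13, 5, 7, 8),
    "C": (113, 10, 14, 3, 4, 8),
}

for plate, (total, species, picked, rand, img, raman) in table1.items():
    # Fraction of colonies picked and species coverage of the Raman strategy
    print(f"{plate}: picked {picked / total:.1%} of colonies, "
          f"Raman coverage {raman / species:.0%}")
# A: picked 26.1% of colonies, Raman coverage 100%
# B: picked 19.4% of colonies, Raman coverage 80%
# C: picked 12.4% of colonies, Raman coverage 80%

# Pooled species found for the same picking budget, per strategy
found = {name: sum(row[i] for row in table1.values())
         for name, i in (("random", 3), ("image", 4), ("raman", 5))}
gain_vs_random = found["raman"] / found["random"] - 1
gain_vs_image = found["raman"] / found["image"] - 1
print(f"gain vs random: {gain_vs_random:.1%}, gain vs image: {gain_vs_image:.1%}")
# gain vs random: 116.7%, gain vs image: 44.4%
```

The per-plate pick fractions (12.4–26.1%) and coverages (80–100%) match the ranges quoted in the abstract.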
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, X.; Liu, M.; Sun, J.; Wang, S. An Intelligent Strategy for Colony De-Replication Using Raman Spectroscopy and Hybrid Clustering. Fermentation 2025, 11, 691. https://doi.org/10.3390/fermentation11120691

AMA Style

Li X, Liu M, Sun J, Wang S. An Intelligent Strategy for Colony De-Replication Using Raman Spectroscopy and Hybrid Clustering. Fermentation. 2025; 11(12):691. https://doi.org/10.3390/fermentation11120691

Chicago/Turabian Style

Li, Xinli, Mingyang Liu, Jiaqi Sun, and Su Wang. 2025. "An Intelligent Strategy for Colony De-Replication Using Raman Spectroscopy and Hybrid Clustering" Fermentation 11, no. 12: 691. https://doi.org/10.3390/fermentation11120691

APA Style

Li, X., Liu, M., Sun, J., & Wang, S. (2025). An Intelligent Strategy for Colony De-Replication Using Raman Spectroscopy and Hybrid Clustering. Fermentation, 11(12), 691. https://doi.org/10.3390/fermentation11120691

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
