Erimin: A Pipeline to Identify Bacterial Strain Specific Primers

Tsifintaris, Margaritis; Koutra, Paraskevi; Tsiartas, Pavlos; Repanas, Panagiotis; Touliopoulos, Sotirios; Nelios, Grigorios; Anastasiadou, Anastasia; Tamouridou, Georgia; Nikolaou, Anastasios; Tsochantaridis, Ilias

doi:10.3390/dna6010011

by Margaritis Tsifintaris ¹,*, Paraskevi Koutra ¹, Pavlos Tsiartas ¹, Panagiotis Repanas ¹, Sotirios Touliopoulos ¹, Grigorios Nelios ¹, Anastasia Anastasiadou ¹,², Georgia Tamouridou ³, Anastasios Nikolaou ¹,⁴ and Ilias Tsochantaridis ¹

¹ Department of Molecular Biology & Genetics, Democritus University of Thrace, 68100 Alexandroupolis, Greece

² Department of Medicine, Democritus University of Thrace, 68100 Alexandroupolis, Greece

³ School of Humanities, Hellenic Open University, 26335 Patras, Greece

⁴ Department of Agricultural Development, Democritus University of Thrace, 68200 Orestiada, Greece

* Author to whom correspondence should be addressed.

DNA 2026, 6(1), 11; https://doi.org/10.3390/dna6010011

Submission received: 29 October 2025 / Revised: 21 December 2025 / Accepted: 26 January 2026 / Published: 25 February 2026

Abstract

Background/Objectives: Strain-level detection of bacteria is essential for applications such as diagnostics, food safety, and microbial monitoring. While 16S rRNA gene sequencing provides genus- or species-level resolution, it cannot reliably discriminate closely related strains. Whole-genome sequencing (WGS) offers high-resolution strain differentiation but remains impractical for routine detection due to cost and analytical complexity. This study aims to enable the translation of WGS data into accurate and cost-effective strain-specific PCR assays. Methods: We developed Erimin, a modular, shell-based bioinformatics pipeline for the automated identification of strain-specific genomic regions from short-read WGS data. Erimin systematically analyzes all available reference genomes for a given bacterial species in combination with sequencing data from a target strain. The workflow integrates reference-based read alignment, extraction of unmapped reads, de novo assembly, contig filtering and validation, genome annotation, and in silico PCR primer design and specificity evaluation. Results: Erimin was applied to Lactiplantibacillus pentosus whole-genome sequencing data to identify genomic regions specific to strain L33 through comparative analysis against a comprehensive set of reference genome assemblies representing multiple Lactiplantibacillus species. These regions were used for in silico PCR primer design and computational specificity assessment against non-target bacterial genomes, supporting discrimination of closely related strains. Conclusions: Erimin provides a structured computational approach for identifying strain-specific genomic regions from WGS data and for supporting the in silico design of PCR primers. This framework facilitates strain-level discrimination using targeted molecular assays.

Keywords: strain-specific primers; comparative genomics; PCR assay development; Whole-Genome Sequencing (WGS); bioinformatics pipeline

1. Introduction

The precise identification of bacterial strains has become increasingly critical in clinical diagnostics, microbial ecology, and industrial microbiology [1]. In clinical diagnostics, accurate strain-level identification not only underpins the rapid and reliable diagnosis of infections but also guides effective treatment decisions, supporting both infection control and public health efforts [2]. Moreover, it enables the detection of emerging or drug-resistant pathogens, facilitates epidemiological surveillance and improves patient outcomes by ensuring timely and targeted interventions [3]. In addition, rapid microbial identification plays a crucial role in preventing the spread of infections, promoting appropriate antibiotic use and reducing healthcare costs by minimizing unnecessary treatments and the duration of hospitalization [4].

Beyond healthcare, accurate identification of specific bacterial strains is equally important across a wide range of industrial applications. In biotechnology and food production, this enables the engineering of bacterial strains to optimize fermentation processes, increase yield, and enhance the production of valuable compounds such as enzymes, biofuels, and pharmaceuticals, while also ensuring the safety and consistency of fermented products [5]. In addition, it supports the development of beneficial strains, such as lactic acid bacteria with improved health-promoting and industrial properties [6]. Among beneficial microorganisms, probiotic bacteria are particularly notable because they exhibit strain-specific biological properties, including immunoregulatory functions, antimicrobial effects, and antitumor potential [7]. Consequently, these characteristics underscore the critical importance of accurate strain-level identification for ensuring quality assurance in probiotic formulations and functional food products. Moreover, in this context, regulatory frameworks require the labeling and monitoring of specific strains as well as verification of their viability throughout the production and distribution chain [8,9]. Overall, these insights highlight that the ability to precisely target and distinguish microbial strains is fundamental, as it drives improvements in clinical diagnostics, infection control and food safety, making it a foundational tool in modern microbiology and biotechnology.

Historically, molecular approaches such as multilocus sequence typing (MLST), pulsed-field gel electrophoresis (PFGE), and random amplified polymorphic DNA (RAPD) analysis have been employed to characterize bacterial isolates [10,11,12]. Although these techniques have provided valuable insights, they often lack the resolution required to discriminate closely related strains, and they are labor-intensive and difficult to reproduce. These shortcomings can hinder the accurate tracking of transmission routes in outbreaks and the identification of microevolutionary changes within pathogen populations [13]. Furthermore, these traditional methods require extensive sample processing, gel electrophoresis, and manual interpretation of band patterns, which can introduce subjectivity. Results can also vary across laboratories because of differences in protocols, equipment, and operator expertise, which complicates data comparison [14,15].

The growing availability of high-quality whole-genome sequencing (WGS) data has transformed microbial taxonomy and enabled novel opportunities for high-resolution typing. By leveraging genome-based approaches, it is now possible to identify strain-specific genomic signatures, which can serve as targets for highly selective PCR assays. This shift toward comparative genomics has established the basis for the design of molecular tools with improved specificity and broader applicability across complex biological samples [16]. To address the growing need for automated and reproducible primer design workflows based on genome-wide data, we developed Erimin, a modular computational pipeline for the in silico identification of unique genomic regions and the design of strain-specific PCR primers. The Erimin pipeline integrates reference-based alignment, filtering of non-conserved reads, de novo assembly, annotation, and assessment of primer candidates to ensure specificity and functional performance. Erimin was applied to Lactiplantibacillus pentosus strains that possess well-documented probiotic attributes and available genome sequences. The results demonstrate that Erimin facilitates the development of reliable molecular assays for differentiating closely related bacterial strains and can be extended to other taxa with accessible WGS data.

2. Materials and Methods

2.1. Pipeline Implementation

The Erimin pipeline was developed as a shell-based workflow implemented in Bash for the identification of strain-specific genomic regions from short-read sequencing data and is designed to run in Unix/Linux environments. The pipeline integrates several widely used bioinformatics tools, including Bowtie2 2.5.4 [17] for short-read alignment, SAMtools 1.23 [18] for alignment processing, SPAdes 4.2.0 [19] for de novo genome assembly, BEDtools 2.31.1 [20] for genomic interval analysis and NCBI BLASTn 2.14.1 for sequence similarity searches against reference databases. The workflow is modular and all software dependencies must be installed prior to execution. Comprehensive documentation, installation instructions and source code are publicly available on GitHub (https://github.com/Mtsif/Erimin, accessed on 28 October 2025). A schematic representation of the Erimin pipeline, illustrating the sequential processing steps, input requirements, intermediate outputs, and final results, is depicted in Figure 1.

2.2. Reference Genome Indexing and Read Alignment

The pipeline requires as input a reference genome in FASTA format and paired-end short-read data in FASTQ format. The reference genome is first indexed to enable efficient mapping. Paired-end reads are then aligned to the reference using Bowtie2 v2.5.4, which allows identification of sequences that are absent from, or divergent relative to, the reference. Read pairs that fail to align are extracted from the alignment file using SAMtools v1.23, converted to FASTQ format, and separated into forward and reverse files to preserve pairing and orientation. These unmapped reads serve as input for de novo assembly.
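The selection of unaligned read pairs can be illustrated with a minimal Python sketch. It assumes the alignment is available as SAM text and relies on the standard SAM flag bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped, 0x40 first in pair, 0x80 second in pair). The function names are hypothetical, and this is not Erimin's own code, which calls SAMtools from Bash.

```python
from typing import Iterable, List, Tuple

# Standard SAM flag bits (from the SAM format specification).
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_unmapped_pair(flag: int) -> bool:
    """True when the read is paired and neither it nor its mate aligned."""
    required = PAIRED | READ_UNMAPPED | MATE_UNMAPPED
    excluded = SECONDARY | SUPPLEMENTARY
    return (flag & required) == required and (flag & excluded) == 0


def split_unmapped_pairs(sam_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return FASTQ records (forward, reverse) for fully unmapped pairs."""
    forward: List[str] = []
    reverse: List[str] = []
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if not is_unmapped_pair(flag):
            continue
        record = f"@{name}\n{seq}\n+\n{qual}\n"
        if flag & FIRST_IN_PAIR:
            forward.append(record)
        elif flag & SECOND_IN_PAIR:
            reverse.append(record)
    return forward, reverse
```

Requiring both the 0x4 and 0x8 bits keeps only pairs in which neither mate aligned, which corresponds to the filter commonly written as `samtools view -f 12` on the command line. Whether Erimin applies exactly this filter is an assumption of the sketch.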

2.3. De Novo Assembly and Contig Selection

Unmapped read pairs obtained from the previous step are assembled using SPAdes v4.2.0 in paired-end mode, generating contigs that correspond to genomic regions absent from the reference genome. To ensure suitability for downstream applications such as primer design, only contigs longer than 1000 bp are retained. This length threshold minimizes non-specific amplification and prioritizes high-confidence unique sequences. Contig filtering is performed through shell scripting and regular-expression parsing of SPAdes output headers.
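The length filter can be sketched in Python as follows. The sketch assumes the usual SPAdes contig naming scheme (`NODE_<id>_length_<bp>_cov_<coverage>`) and trusts the length stated in the header; the function names are hypothetical, and Erimin itself performs this step in shell.

```python
import re
from typing import Dict, Iterable

# Assumed SPAdes header format, e.g. ">NODE_1_length_2540_cov_35.2".
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def contig_length(header: str) -> int:
    """Extract the contig length (bp) encoded in a SPAdes header."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"Unrecognized SPAdes header: {header!r}")
    return int(match.group(2))


def filter_contigs(fasta_lines: Iterable[str], min_length: int = 1000) -> Dict[str, str]:
    """Keep contigs strictly longer than min_length, keyed by header."""
    kept: Dict[str, str] = {}
    current = None  # header of the contig being kept, or None when skipping
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:] if contig_length(line) > min_length else None
            if current is not None:
                kept[current] = ""
        elif current is not None:
            # Concatenate wrapped sequence lines of a retained contig.
            kept[current] += line
    return kept
```

The comparison is strict (`>`), matching the statement that only contigs longer than 1000 bp are retained; a contig of exactly 1000 bp is dropped.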

2.4. Design of Strain-Specific Primers

Following de novo assembly and contig selection, the resulting contigs were evaluated for sequence uniqueness, with the aim of developing strain-specific molecular markers. BLASTn 2.14.1 searches were run against both the RefSeq Genome and Nucleotide Collection (nt) databases hosted by NCBI, using the taxonomic filter “Bacteria (taxid:2)” and the “Somewhat similar sequences (blastn)” program setting. Contigs were prioritized for primer design based on their alignment characteristics: regions displaying low similarity scores (<40) to sequences from other bacterial species and high scores (≥200) for the target strains were considered putatively unique.
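The prioritization rule can be expressed as a short Python sketch. The `Hit` record and the function name are hypothetical, and the sketch assumes that each BLAST hit has already been labeled as belonging to the target strain or not; only the two thresholds (≥200 for the target, <40 for other bacteria) come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Hit:
    contig: str       # query contig identifier
    is_target: bool   # True if the subject belongs to the target strain
    score: float      # alignment score reported by BLAST


def putatively_unique(
    hits: Iterable[Hit],
    target_min: float = 200.0,
    off_target_max: float = 40.0,
) -> List[str]:
    """Contigs with a strong target hit and only weak off-target hits."""
    best_target: Dict[str, float] = {}
    best_other: Dict[str, float] = {}
    for hit in hits:
        bucket = best_target if hit.is_target else best_other
        bucket[hit.contig] = max(bucket.get(hit.contig, 0.0), hit.score)
    # A contig with no off-target hit at all counts as having score 0.
    return sorted(
        contig
        for contig, score in best_target.items()
        if score >= target_min and best_other.get(contig, 0.0) < off_target_max
    )
```

Using the best score per contig on each side means that a single strong off-target hit is enough to exclude a contig, which is the conservative reading of the criterion.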

To further confirm the genomic context of these candidate regions, contigs were annotated using Prokka v1.15.6 [15] for general genome feature identification and PHASTER [21] for prophage-related sequence detection. Primer design was subsequently performed in Primer-BLAST [17], using the unique genomic sequences as templates. The following parameters were applied: maximum melting temperature (Tm) difference of 1 °C, database set to RefSeq representative genomes, organism filter set to Bacteria (taxid:2) and primer specificity stringency requiring a minimum of six total mismatches to unintended targets, with at least five mismatches within the final five bases at the 3′ end. Potential off-targets with ≥9 mismatches were ignored. Primer size and GC content were constrained to 20–25 bp and 45–55%, respectively. Primer specificity was further validated by BLAST comparison against both the nr and RefSeq representative genomes databases.
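The size, GC-content, and Tm-difference constraints listed above can be checked with a small Python sketch. The function names are hypothetical, and melting temperatures are taken as inputs (for example, as reported by Primer-BLAST) rather than computed here.

```python
def gc_percent(primer: str) -> float:
    """GC content of a primer as a percentage of its length."""
    seq = primer.upper()
    if not seq:
        raise ValueError("Primer sequence is empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def primer_ok(
    primer: str,
    min_len: int = 20,
    max_len: int = 25,
    min_gc: float = 45.0,
    max_gc: float = 55.0,
) -> bool:
    """Check the 20-25 bp size and 45-55% GC constraints."""
    return min_len <= len(primer) <= max_len and min_gc <= gc_percent(primer) <= max_gc


def pair_ok(
    forward: str,
    reverse: str,
    tm_forward: float,
    tm_reverse: float,
    max_tm_diff: float = 1.0,
) -> bool:
    """Check both primers and the maximum Tm difference (degrees Celsius)."""
    return (
        primer_ok(forward)
        and primer_ok(reverse)
        and abs(tm_forward - tm_reverse) <= max_tm_diff
    )
```

A check of this kind only screens candidates against the stated design window; the mismatch-based specificity criteria still depend on the Primer-BLAST database search.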

3. Results

3.1. Genome Availability and Taxonomic Landscape of Studied Microorganisms

To contextualize the genomic data landscape relevant to strain-level analysis, we first summarized the taxonomic distribution of microorganisms that have been extensively investigated in the literature and for which phenotypic or functional studies are available. Figure 2 presents the top 20 microorganisms most frequently represented in public datasets at both the genus and species levels, based on entries retrieved from the Probio-Ichnos database [22]. At the genus level, organisms belonging to lactic acid bacteria–associated groups dominate the landscape, with Lactiplantibacillus, Lacticaseibacillus, Lactobacillus, and Limosilactobacillus accounting for the majority of reported entries. A similar pattern is observed at the species level, where Lactiplantibacillus plantarum, Lacticaseibacillus rhamnosus, and Limosilactobacillus fermentum emerge as the most frequently studied species. These distributions reflect both the historical focus and industrial relevance of these taxa in probiotic research. Building on this taxonomic overview, we next assessed the availability of corresponding whole-genome sequencing data for microorganisms studied for probiotic potential. A comprehensive catalog of 789 publicly available bacterial genome assemblies was compiled and is presented in Table S1, linking genome accessions to species annotations and associated PubMed identifiers. This table summarizes all currently available genome datasets corresponding to microorganisms represented in the literature. Importantly, while many frequently studied microorganisms are supported by publicly available genome assemblies, genome sequencing data are not yet available for all reported taxa. This discrepancy highlights existing gaps between experimental characterization and genomic resource availability. Nevertheless, the breadth of available genome datasets provides a robust and scalable foundation for systematic, genome-driven analyses using the Erimin pipeline.

3.2. Application of the Erimin Pipeline to a Lactiplantibacillus Strain

To establish a robust comparative framework for the identification of strain-specific genomic regions using the Erimin pipeline, a custom reference panel was constructed, comprising all available L. pentosus genomes from the NCBI GenBank database with an assembly level of “scaffold” or higher. WGS data of L. pentosus strain L33 were obtained from the European Nucleotide Archive (sample accession: SAMN19185297), consisting of paired-end Illumina reads with a total data volume of approximately 2.5 GB. The dataset provided high coverage suitable for downstream reference-guided and de novo comparative analyses. In total, 48 L. pentosus assemblies were retrieved and supplemented with 20 reference genomes representing other Lactiplantibacillus taxa, while the L33 sequence itself was explicitly excluded from the panel to avoid self-mapping bias (Table S2). The resulting reference dataset comprised 68 genome assemblies with a combined size of approximately 234 MB, covering a broad phylogenetic range within the Lactiplantibacillus clade. The pipeline was applied to the L. pentosus L33 whole-genome sequencing dataset using the composite reference database described above (Figure 3).

The Bowtie2 indexing of the reference dataset completed in approximately 5 min, and the subsequent read alignment required around 30 min to process the 8.8 million paired-end reads. The mapping achieved an overall alignment rate of 97.53%, indicating substantial genomic similarity between L. pentosus L33 and other Lactiplantibacillus members. After alignment, approximately 214 thousand unmapped reads were extracted for downstream analysis, representing potential strain-specific sequences absent from the reference assemblies. These unmapped reads were assembled de novo using SPAdes to reconstruct contiguous sequences unique to L. pentosus L33. After quality filtering and exclusion of contigs shorter than 1000 bp, 19 high-confidence contigs remained. These contigs were considered candidate genomic regions putatively specific to L. pentosus L33 and were selected for subsequent BLASTn-based validation and primer-design analysis.

To confirm the uniqueness of the de novo-assembled sequences, the 19 filtered contigs (>1 kb) were subjected to BLASTn 2.14.1 analysis against the NCBI RefSeq Genome database, restricted to Bacteria (taxid:2) with an expectation threshold of 1 × 10⁻⁵. The alignment output was examined for the presence of significant hits outside the L. pentosus taxon.

For all contigs, the top BLAST hit corresponded to L. pentosus strain L33, exhibiting nearly 100% query coverage and sequence identity, confirming that the reconstructed sequences faithfully represent authentic genomic regions rather than assembly artifacts. Subsequent hits showed progressively reduced percentages of identity and coverage, typically associated with other Lactiplantibacillus species, supporting the distinctiveness of the L33-specific contigs identified through the Erimin pipeline.

Functional annotation of the 19 validated contigs was performed using Prokka, which identified a total of 59 coding sequences (CDSs) and one tRNA-Ile (Anticodon TAT). Among these annotated features, two CDSs corresponded to transposase-related proteins—specifically an IS30 family transposase IS1252 and an ISL3 family transposase ISP1—indicating the presence of residual mobile genetic elements within the unique genomic regions. The remaining 57 CDSs were classified as hypothetical proteins, lacking functional assignments in existing databases and potentially representing novel or strain-specific loci.

To further evaluate the potential presence of viral or prophage sequences, PHASTER analysis was performed on a subset of 10 contigs exceeding 1.5 kb in length, as shorter fragments do not meet the tool’s minimum input requirement. Among these, 8 contigs showed no evidence of phage-related sequences, whereas 2 contained incomplete prophage regions, indicating a limited integration of viral elements within the identified strain-specific sequences. Based on the validated and annotated contigs, the identified strain-specific sequences provide suitable templates for in silico primer development. Considering their confirmed uniqueness and biological relevance, these regions can be directly used in Primer-BLAST to generate and evaluate primer pairs specific to L. pentosus L33. Such primers are expected to support future development of strain-level molecular assays following experimental validation.

3.3. Computational Benchmarking and Performance Evaluation

To assess the computational efficiency of the Erimin pipeline, all analyses were executed on a workstation equipped with an Intel® Core™ i3-10100 CPU @ 3.60 GHz (8 logical cores) and 64 GB RAM, operating under a standard Linux environment. The benchmarking was performed using the Lactiplantibacillus pentosus L33 whole-genome sequencing dataset (≈2.5 GB; 8.8 million paired-end Illumina reads) [23] and the composite reference database comprising 68 genome assemblies (≈234 MB total size). The overall runtime for the complete Erimin workflow was approximately 38 min. The Bowtie2 indexing of the reference dataset completed in ~5 min, while the read alignment phase required ~30 min, achieving an overall alignment rate of 97.53%. Extraction of unmapped reads and de novo assembly using SPAdes completed in ~3 min, reconstructing 19 high-confidence contigs longer than 1 kb after quality filtering.
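As a quick consistency check on these figures, the per-stage runtimes and the number of reads expected to remain unaligned can be recomputed with a few lines of Python. The stage labels and function names are chosen for this sketch, and the text does not state whether the 8.8 million figure counts individual reads or read pairs, so the second calculation is only an order-of-magnitude comparison.

```python
from typing import Mapping


def total_runtime(stage_minutes: Mapping[str, float]) -> float:
    """Sum per-stage wall-clock times (minutes)."""
    return sum(stage_minutes.values())


def expected_unaligned(n_reads: int, alignment_rate_percent: float) -> float:
    """Number of reads left unaligned at a given overall alignment rate."""
    return n_reads * (100.0 - alignment_rate_percent) / 100.0


# Values as reported in the text; the stage names are labels for this sketch.
STAGES = {"bowtie2_index": 5, "bowtie2_align": 30, "extract_and_assemble": 3}

if __name__ == "__main__":
    print(total_runtime(STAGES))                       # minutes
    print(round(expected_unaligned(8_800_000, 97.53)))  # reads
```

The stage times sum to the reported 38 min, and an alignment rate of 97.53% on 8.8 million reads leaves roughly 217 thousand unaligned, close to the approximately 214 thousand unmapped reads reported.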

4. Discussion

The Erimin pipeline was developed to address the increasing need for reproducible approaches to identify strain-specific genomic regions suitable for targeted primer design. By combining multiple comparative genomics steps into a single modular framework, Erimin provides a reliable strategy for distinguishing closely related microbial strains using WGS data. The workflow relies on reference-guided comparisons followed by successive filtering steps to identify genomic loci that are absent or highly divergent in related genomes. This design supports high specificity in downstream molecular assays while reducing user-dependent variability.

An important feature of Erimin is its modular architecture, which allows users to adjust parameters, modify individual steps, or integrate alternative tools depending on dataset characteristics and computational resources. This flexibility facilitates application across diverse bacterial taxa and experimental contexts, while maintaining transparency and reproducibility. In addition, the pipeline is computationally efficient and can be executed on standard laboratory workstations, with performance largely determined by dataset size, genome complexity, and the composition of the reference genome set.

The effectiveness of genome-driven approaches such as Erimin is closely linked to the availability of high-quality public genome resources. As shown in Section 3, a substantial and expanding collection of publicly available bacterial genome assemblies exists for microorganisms that have been extensively investigated in the literature. These resources enable systematic strain-level analyses and support the application of Erimin across diverse taxa. Conversely, the absence of genome assemblies for certain organisms highlights areas where additional sequencing efforts will further expand the scope and resolution of genome-informed assay development.

A methodological framework conceptually similar to the one implemented in Erimin has been applied in previous studies involving lactic acid bacteria, including strains from the genus Lactiplantibacillus. In these studies, genome-based comparative approaches were used to identify strain-specific genomic regions that supported the design of strain-specific PCR primer sets, reflecting both the robustness of the approach and its biological relevance. Although the current implementation focuses on bacterial genomes, the underlying approach is applicable to other microbial groups, provided that suitable reference genome data are available. More broadly, accurate strain discrimination is increasingly important for linking microbial diversity to functional outcomes in complex biological systems, including environmental and agricultural contexts where microbial population size, biomass, and enzymatic activity influence ecosystem productivity [24]. Overall, Erimin provides a practical and efficient approach for strain discrimination in both research and applied microbiology. Its results depend on the quality and completeness of available genome assemblies, so continued advances in sequencing technologies and the expansion of public genome databases are expected to improve its performance over time.

5. Conclusions

Accurate strain-level identification is essential for understanding microbial diversity and for supporting both research and industrial applications. However, despite their widespread use, existing molecular methods often lack the precision needed to distinguish closely related strains, while WGS, although providing high-resolution bacterial typing, remains constrained by cost, infrastructure requirements, and computational demands. The Erimin pipeline addresses these challenges by providing a cost-effective alternative for routine strain differentiation. By enabling the identification of strain-specific genomic regions and the rational design of targeted PCR primers, it facilitates rapid and precise detection of bacterial strains without the need for complex sequencing workflows, thereby making the approach accessible to laboratories with limited resources. In summary, by combining specificity, reproducibility, and computational efficiency, the Erimin pipeline provides a versatile framework for the development of affordable, genome-driven molecular assays. As public genome resources continue to expand, Erimin is well positioned to support reliable strain identification across diverse microbial systems in both research and applied settings.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/dna6010011/s1, Table S1: Complete catalog of publicly available bacterial genome datasets studied for probiotic potential; Table S2: Reference genome assemblies of Lactiplantibacillus species.

Author Contributions

Conceptualization, M.T. and I.T.; Methodology, M.T., P.K., P.T. and I.T.; Software, P.T., P.K. and M.T.; Validation, M.T. and I.T.; Formal Analysis, M.T., P.K., P.T., P.R., S.T., G.N., A.A., G.T., A.N. and I.T.; Investigation, M.T. and I.T.; Writing—Original Draft Preparation, M.T., P.K., P.T., P.R., S.T., G.N., A.A., G.T., A.N. and I.T.; Writing—Review and Editing, M.T., P.K., P.T., P.R., S.T., G.N., A.A., G.T., A.N. and I.T.; Project Administration, M.T. and I.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All materials supporting the findings of this study are available through GitHub at https://github.com/Mtsif/Erimin, accessed on 28 October 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Shropshire, W.C.; Hanson, B.M.; Shelburne, S.A. Genome-wide approaches to bacterial strain typing: A history and review of recent methodological advances. Curr. Opin. Infect. Dis. 2025, 38, 329–338.
2. Arbefeville, S.S.; Timbrook, T.T.; Garner, C.D. Evolving strategies in microbe identification—A comprehensive review of biochemical, MALDI-TOF MS and molecular testing methods. J. Antimicrob. Chemother. 2024, 79, i2–i8.
3. MacGowan, A.; Grier, S.; Stoddart, M.; Reynolds, R.; Rogers, C.; Pike, K.; Smartt, H.; Wilcox, M.; Wilson, P.; Kelsey, M.; et al. Impact of rapid microbial identification on clinical outcomes in bloodstream infection: The RAPIDO randomized trial. Clin. Microbiol. Infect. 2020, 26, 1347–1354.
4. Sexton, M.E.; Jacob, J.T. Optimal use of Rapid Diagnostics in Infection Control and Prevention. Clin. Microbiol. Newsl. 2017, 39, 83–89.
5. Mannan, A.A.; Darlington, A.P.S.; Tanaka, R.J.; Bates, D.G. Design principles for engineering bacteria to maximise chemical production from batch cultures. Nat. Commun. 2025, 16, 279.
6. Kumar, A.; Bisht, A.; Maqsood, S.; Amjad, S.; Baghel, S.; Jaiswal, S.G.; Wei, S. The role of Micro-biome engineering in enhancing Food safety and quality. Biotechnol. Notes 2025, 6, 67–78.
7. Sarita, B.; Samadhan, D.; Hassan, M.Z.; Kovaleva, E.G. A comprehensive review of probiotics and human health-current prospective and applications. Front. Microbiol. 2025, 15, 1487641.
8. Elnar, A.G.; Eum, B.; Kim, G.B. Genomic characterization and probiotic assessment of Bifidobacterium breve JKL2022 with strain-specific CLA-converting properties. Sci. Rep. 2025, 15, 15419.
9. Bubnov, R.V.; Babenko, L.P.; Lazarenko, L.M.; Mokrozub, V.V.; Spivak, M.Y. Specific properties of probiotic strains: Relevance and benefits for the host. EPMA J. 2018, 9, 205–223.
10. Larsen, M.V.; Cosentino, S.; Rasmussen, S.; Friis, C.; Hasman, H.; Marvig, R.L.; Jelsbak, L.; Sicheritz-Pontén, T.; Ussery, D.W.; Aarestrup, F.M.; et al. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria. J. Clin. Microbiol. 2012, 50, 1355–1361.
11. Stefańska, I.; Kwiecień, E.; Górzyńska, M.; Sałamaszyńska-Guz, A.; Rzewuska, M. RAPD-PCR-Based Fingerprinting Method as a Tool for Epidemiological Analysis of Trueperella pyogenes Infections. Pathogens 2022, 11, 562.
12. Neoh, H.M.; Tan, X.E.; Sapri, H.F.; Tan, T.L. Pulsed-field gel electrophoresis (PFGE): A review of the “gold standard” for bacteria typing and current alternatives. Infect. Genet. Evol. 2019, 74, 103935.
13. Smith, C.J.; Osborn, A.M. Advantages and limitations of quantitative PCR (Q-PCR)-based approaches in microbial ecology. FEMS Microbiol. Ecol. 2009, 67, 6–20.
14. Abdelmalek, S.; Shokry, K.; Hamed, W.; Abdelnaser, M.; Aboubakr, A.; Elenin, S.A.; Ali, M.; Mostafa, M.; Abou-Okada, M. The validity evaluation of different 16srRNA gene primers for helicobacter detection urgently requesting to design new specific primers. Sci. Rep. 2022, 12, 10737.
15. Hernández, I.; Sant, C.; Martínez, R.; Fernández, C. Design of Bacterial Strain-Specific qPCR Assays Using NGS Data and Publicly Available Resources and Its Application to Track Biocontrol Strains. Front. Microbiol. 2020, 11.
16. Pham, V.D.; Simpson, D.J.; Gänzle, M.G. Strain-level identification of bacteria persisting in food and in food processing facilities: When do two isolates represent the same strain and which tools identify a strain? Curr. Opin. Food Sci. 2025, 61, 101245.
17. Langmead, B.; Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 2012, 9, 357–359.
18. Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25, 2078–2079.
19. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.; Pham, S.; Prjibelski, A.D.; et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J. Comput. Biol. 2012, 19, 455–477.
20. Quinlan, A.R.; Hall, I.M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 2010, 26, 841–842.
21. Arndt, D.; Grant, J.R.; Marcu, A.; Sajed, T.; Pon, A.; Liang, Y.; Wishart, D.S. PHASTER: A better, faster version of the PHAST phage search tool. Nucleic Acids Res. 2016, 44, W16–W21.
22. Tsifintaris, M.; Kiousi, D.E.; Repanas, P.; Kamarinou, C.S.; Kavakiotis, I.; Galanis, A. Probio-Ichnos: A Database of Microorganisms with In Vitro Probiotic Properties. Microorganisms 2024, 12, 1955.
23. Stergiou, O.S.; Tegopoulos, K.; Kiousi, D.E.; Tsifintaris, M.; Papageorgiou, A.C.; Tassou, C.C.; Chorianopoulos, N.; Kolovos, P.; Galanis, A. Whole-Genome Sequencing, Phylogenetic and Genomic Analysis of Lactiplantibacillus pentosus L33, a Potential Probiotic Strain Isolated from Fermented Sausages. Front. Microbiol. 2021, 12, 746659.
24. Koura, E.; Pistikoudi, A.; Tsifintaris, M.; Tsiolas, G.; Mouchtaropoulou, E.; Noutsos, C.; Karantakis, T.; Kouras, A.; Karanikolas, A.; Argiriou, A.; et al. The Effect of Phosphorus Fertilization on Transcriptome Expression Profile during Lentil Pod and Seed Development. Appl. Sci. 2023, 13, 11403.

Figure 1. Overview of the Erimin workflow. (A) Input files: paired-end whole-genome sequencing reads and reference genome assemblies. (B) Read alignment and preprocessing steps used to separate mapped and unmapped reads. (C) De novo assembly of unmapped reads followed by contig length filtering. (D) Identification of candidate contigs through similarity searches against reference databases. (E) Extraction of contig intervals lacking coverage in reference genomes, highlighting genomic regions putatively unique to the target strain. (F) Downstream annotation and primer design steps, including contig annotation and integration of Primer-BLAST for the generation and evaluation of candidate PCR primer pairs.

Figure 2. Taxonomic distribution of microorganisms most frequently studied for probiotic potential at the genus and species levels.

Figure 3. Illustration of the Erimin workflow as applied to the L. pentosus L33 genome.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
