MDPI - Publisher of Open Access Journals

23 pages, 2368 KB

Open Access Article

MitoGEx: An Integrated Platform for Streamlined Human Mitochondrial Genome Analysis

by Kongpop Jeenkeawpiam, Pemikar Srifa, Natakorn Nokchan, Natthapon Khongcharoen, Anas Binkasem and Surasak Sangkhathat

Genes 2026, 17(3), 338; https://doi.org/10.3390/genes17030338 - 18 Mar 2026

Viewed by 566

Abstract

Background/Objectives: Mitochondrial DNA (mtDNA) is an important resource for understanding human ancestry, population diversity, and the molecular mechanisms of mitochondrial diseases. However, analyzing mtDNA thoroughly often requires advanced bioinformatics skills and command-line knowledge. To address this challenge, we created Mitochondrial Genome Explorer (MitoGEx), a user-friendly computational pipeline optimized for human mtDNA analysis that combines multiple mtDNA analysis modules within a single graphical user interface. Methods: The platform simplifies key analytical steps, such as quality control, sequence alignment, alignment quality assessment, variant detection, haplogroup classification, and phylogenetic reconstruction. Users can choose between Quick and Advanced modes, which offer default settings or customizable options based on their analysis needs. To demonstrate its effectiveness, we analyzed 15 whole-exome sequencing (WES) samples from Songklanagarind Hospital using MitoGEx. Results: The sequencing data were of high quality, with over 92 percent of bases scoring above a Phred score and consistent GC content across all samples. Variant detection using the GATK mitochondrial pipeline and annotation with ANNOVAR and the MitImpact database revealed multiple high-confidence variants. Haplogroup classification with Haplogrep 3 and phylogenetic analysis with IQ-TREE 2 confirmed diverse maternal lineages within the cohort. Conclusions: Taken together, MitoGEx facilitates mitochondrial genome analysis in a reproducible and accessible manner for both research and clinical bioinformatics applications. The analytical results produced by MitoGEx are concordant with those obtained using standalone bioinformatic tools, demonstrating analytical correctness. By integrating all analysis steps into a single automated workflow, MitoGEx reduces execution time and limits human error inherent to manual, multi-step pipelines. Full article
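
The quality-control figure quoted above (the share of bases above a Phred threshold) can be illustrated with a short sketch. The abstract does not state the threshold, so the value of 30 used here is only an assumption, as is the Phred+33 encoding of FASTQ quality strings; the function names are hypothetical and are not part of MitoGEx. The sketch also counts bases at or above the threshold, which may differ from the authors' definition.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def fraction_at_or_above(quality_strings, threshold=30):
    """Fraction of all bases whose Phred score is >= threshold.

    The threshold of 30 is an illustrative assumption, not a value
    taken from the abstract.
    """
    total = 0
    passing = 0
    for quality_string in quality_strings:
        for q in phred_scores(quality_string):
            total += 1
            if q >= threshold:
                passing += 1
    return passing / total if total else 0.0
```

Under this definition, a Phred score of 30 corresponds to an estimated error probability of 1 in 1000 per base.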

(This article belongs to the Special Issue Molecular Basis in Rare Genetic Disorders)

10 pages, 2295 KB

Open Access Article

Erimin: A Pipeline to Identify Bacterial Strain Specific Primers

by Margaritis Tsifintaris, Paraskevi Koutra, Pavlos Tsiartas, Panagiotis Repanas, Sotirios Touliopoulos, Grigorios Nelios, Anastasia Anastasiadou, Georgia Tamouridou, Anastasios Nikolaou and Ilias Tsochantaridis

DNA 2026, 6(1), 11; https://doi.org/10.3390/dna6010011 - 25 Feb 2026

Viewed by 817

Abstract

Background/Objectives: Strain-level detection of bacteria is essential for applications such as diagnostics, food safety, and microbial monitoring. While 16S rRNA gene sequencing provides genus- or species-level resolution, it cannot reliably discriminate closely related strains. Whole-genome sequencing (WGS) offers high-resolution strain differentiation but remains impractical for routine detection due to cost and analytical complexity. This study aims to enable the translation of WGS data into accurate and cost-effective strain-specific PCR assays. Methods: We developed Erimin, a modular, shell-based bioinformatics pipeline for the automated identification of strain-specific genomic regions from short-read WGS data. Erimin systematically analyzes all available reference genomes for a given bacterial species in combination with sequencing data from a target strain. The workflow integrates reference-based read alignment, extraction of unmapped reads, de novo assembly, contig filtering and validation, genome annotation, and in silico PCR primer design and specificity evaluation. Results: Erimin was applied to Lactiplantibacillus pentosus whole-genome sequencing data to identify genomic regions specific to strain L33 through comparative analysis against a comprehensive set of reference genome assemblies representing multiple Lactiplantibacillus species. These regions were used for in silico PCR primer design and computational specificity assessment against non-target bacterial genomes, supporting discrimination of closely related strains. Conclusions: Erimin provides a structured computational approach for identifying strain-specific genomic regions from WGS data and for supporting the in silico design of PCR primers. This framework facilitates strain-level discrimination using targeted molecular assays. Full article
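
One step of the workflow described above, the extraction of unmapped reads after reference-based alignment, can be sketched as follows. This is not Erimin's code (the pipeline is described as shell-based); it is a minimal Python illustration that relies only on the SAM convention that bit 0x4 of the FLAG field marks an unmapped segment. The function name is hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


def unmapped_read_names(sam_lines):
    """Return names of reads whose SAM FLAG has the unmapped bit set.

    Header lines (starting with '@') and empty lines are skipped.
    """
    names = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is FLAG
        if flag & UNMAPPED_FLAG:
            names.append(fields[0])  # column 1 is the read name
    return names
```

In a pipeline of this kind, reads selected this way would then be passed to de novo assembly, since they are candidates for sequence absent from the reference genomes.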

26 pages, 1591 KB

Open Access Review

Targeted Next-Generation Sequencing in Drug-Resistant Tuberculosis: WHO Guidance and Practical Implementation Priorities

by Sungwon Jung

Biomedicines 2026, 14(1), 93; https://doi.org/10.3390/biomedicines14010093 - 2 Jan 2026

Cited by 2 | Viewed by 1850

Abstract

Targeted next-generation sequencing (tNGS) closes the gap between point-of-care rapid tests and phenotypic drug susceptibility testing (pDST) in drug-resistant tuberculosis (DR-TB). The 2025 World Health Organization (WHO) consolidated guidelines and the operational handbook place tNGS after initial automated nucleic acid amplification tests (aNAATs) for the delivery of catalogue-linked molecular drug susceptibility testing (DST) for a broad drug panel, reserving whole-genome sequencing (WGS) and/or pDST for discordance resolution, confirmation, and surveillance. This review summarizes (i) the core tNGS principles and panel design; (ii) platform-specific workflows for Illumina and Nanopore, including direct-from-sample implementations and typical turnaround times; (iii) catalogue-based interpretation against the 2023 WHO Mycobacterium tuberculosis mutation catalogue, with emphasis on bedaquiline/clofazimine (BDQ/CFZ) resistance and management of uncertain variants; (iv) pooled accuracy and sources of genotype–phenotype discordance; and (v) practical requirements for bioinformatics, quality assurance/external quality assessment (QA/EQA), and standardized reporting. We summarize operational and economic considerations (throughput, batching, and network design) to clarify where tNGS adds value compared with alternative strategies and to outline priority research needs, including (i) performance standards for culture-free tNGS, (ii) robust heteroresistance detection, (iii) standardized variant curation, and (iv) data-sharing frameworks to refine genotype–phenotype links. When embedded within validated QA/EQA frameworks and catalogue-linked reporting systems, tNGS can shorten the time to effective therapy by rapidly informing fluoroquinolone (FQ) susceptibility and providing early, tiered resistance signals for newer agents (e.g., BDQ), with indeterminate findings prompting reflex pDST/WGS. Full article

(This article belongs to the Special Issue Mycobacterial Infections: Insights into Pathogenesis, Diagnosis, and Treatment)

23 pages, 2391 KB

Open Access Article

High-Accuracy Chicken Breed Identification Using Microsatellite Genotype Data and AutoGluon Framework

by Rajaonarison Faniriharisoa Maxime Toky, Sutthisak Sukhamsri, Sadeep Medhasi, Trifan Budi, Thitipong Panthum, Worapong Singchat and Kornsorn Srikulnath

Biology 2026, 15(1), 21; https://doi.org/10.3390/biology15010021 - 22 Dec 2025

Cited by 1 | Viewed by 1100

Abstract

The practical applications of breed identification are numerous and diverse, and they include breed conservation and breeding program design. However, distinguishing between breeds remains challenging and costly, especially for phenotypically similar chicken populations. Continued research is necessary to develop more accessible and optimized methodologies. To address these challenges, machine learning (ML) offers promising tools for analyzing complex genetic data. The capabilities of machine learning, especially the random forest (RF) model, to enhance various fields, including bioinformatics, have recently been demonstrated. In this study, microsatellite genotype data from 651 individuals across 30 chicken populations filtered from a larger initial dataset for consistency were used to classify breeds using an RF model. Cross-validation techniques, including 10-fold cross-validation and leave-one-out cross-validation, were employed to assess the performance of the model. The model performance was evaluated using metrics such as accuracy, Cohen’s Kappa, 95% confidence interval, and F1-score. Results showed that the RF model achieved a 95.38% accuracy on the testing dataset. Accuracies of 91.44% and 90.99% were observed for 10-fold cross-validation and leave-one-out cross-validation, respectively. It is believed that larger datasets will significantly improve outcomes for other breeds. Because of its generalizability, the trained model can serve as a straightforward and modern method for chicken breed determination using machine learning. This study demonstrates that ML, particularly automated approaches like AutoGluon, provides a robust and accessible framework for chicken breed identification using cost-effective microsatellite data. Full article
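
Two of the metrics named above, accuracy and Cohen's Kappa, can be computed from true and predicted breed labels as sketched below. This is a generic illustration of the metrics only; it does not reproduce the authors' random forest or AutoGluon workflow, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement corrected for chance agreement.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected from the label frequencies alone.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when only one label ever occurs
    return (p_o - p_e) / (1 - p_e)
```

Kappa is useful alongside accuracy in a multi-breed setting because it discounts agreement that would arise by chance when some populations are much larger than others.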

(This article belongs to the Section Bioinformatics)

20 pages, 2529 KB

Open Access Article

NeXus: An Automated Platform for Network Pharmacology and Multi-Method Enrichment Analysis

by Teh Bee Ping, Mohammad Alia, Bintang Annisa Bagustari and Salah A. Alshehade

Int. J. Mol. Sci. 2025, 26(22), 11147; https://doi.org/10.3390/ijms262211147 - 18 Nov 2025

Cited by 1 | Viewed by 1940

Abstract

Network pharmacology is a powerful approach for studying complex drug–target interactions and biological pathways. However, existing tools often require extensive manual intervention and lack integrated analysis capabilities. Here, we present NeXus v1.2, an automated platform for network pharmacology and multi-method enrichment analysis including Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA) that addresses these limitations. NeXus v1.2 enables the seamless integration of multi-layer biological relationships, handling complex interactions between genes, compounds, and plants while maintaining analytical rigor. The platform implements three enrichment methodologies: Over-Representation Analysis (ORA), GSEA, and GSVA, circumventing limitations associated with arbitrary threshold-based approaches. NeXus v1.2 was validated using multiple datasets spanning 111 to 10,847 genes, demonstrating robust scalability and performance across dataset sizes. The platform was initially tested using a representative dataset comprising 111 genes, 32 compounds, and 3 plants, showing consistent performance in processing various relationship patterns, including shared compounds between plants and multitargeted genes. The processing time for this dataset was 4.8 s with peak memory usage of 480 MB. Large-scale validation with datasets up to 10,847 genes confirmed scalability, with linear time complexity and completion times under 3 min. NeXus v1.2 automatically generates comprehensive visualizations, including network maps, enrichment analyses, and relationship patterns, while maintaining the biological context of interactions. The tool successfully processed and analyzed enrichment patterns across multiple functional domains, generating publication-quality visualization outputs at 300 DPI resolution. The platform demonstrated enhanced automation in handling incomplete relationship data and maintaining analytical integrity across different biological layers. 
Compared to manual workflows requiring 15–25 min, NeXus v1.2 reduced the analysis time to under 5 s (>95% reduction) while ensuring the comprehensive coverage of biological relationships. NeXus v1.2 provides improved automation and integration for network pharmacology analysis, offering an efficient and user-friendly platform for complex biological network analysis. Its modular architecture enables the future integration of AI technologies and expansion into various therapeutic applications. Full article
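
Of the three enrichment methods named above, Over-Representation Analysis (ORA) is the simplest to illustrate. The abstract does not describe how NeXus implements it, so the sketch below uses the textbook one-sided hypergeometric test that is commonly used for ORA; the function and parameter names are hypothetical.

```python
from math import comb


def ora_p_value(universe_size, set_size, query_size, overlap):
    """One-sided hypergeometric p-value for over-representation.

    universe_size: number of genes in the background
    set_size:      number of background genes annotated to the gene set
    query_size:    number of genes in the query list
    overlap:       number of query genes that fall in the gene set

    Returns P(X >= overlap) when query_size genes are drawn without
    replacement from the background.
    """
    max_overlap = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

ORA requires a fixed query list, which is typically obtained by applying a cutoff to the data; GSEA and GSVA instead work from ranked or per-sample scores, which is the usual motivation for offering them as threshold-free alternatives.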

(This article belongs to the Special Issue Machine Learning Applications in Bioinformatics and Biomedicine: 3rd Edition)

14 pages, 587 KB

Open Access Article

Detection of Clinically Significant BRCA Large Genomic Rearrangements in FFPE Ovarian Cancer Samples: A Comparative NGS Study

by Alessia Perrucci, Maria De Bonis, Giulia Maneri, Claudio Ricciardi Tenore, Paola Concolino, Matteo Corsi, Alessandra Conca, Jessica Evangelista, Alessia Piermattei, Camilla Nero, Luciano Giacò, Elisa De Paolis, Anna Fagotti and Angelo Minucci

Genes 2025, 16(9), 1052; https://doi.org/10.3390/genes16091052 - 8 Sep 2025

Cited by 1 | Viewed by 1600

Abstract

Background: Copy number variations (CNVs), also referred to as large genomic rearrangements (LGRs), represent a crucial component of BRCA1/2 (BRCA) testing. Next-generation sequencing (NGS) has become an established approach for detecting LGRs by combining sequencing data with dedicated bioinformatics pipelines. However, CNV detection in formalin-fixed paraffin-embedded (FFPE) samples remains technically challenging, and a robust, optimized analysis strategy is needed for routine clinical practice. Methods: This study evaluated 40 FFPE ovarian cancer (OC) samples from patients undergoing BRCA testing. The performance of the amplicon-based Diatech Myriapod® NGS BRCA1/2 panel (Diatech Pharmacogenetics, Jesi, Italy) was assessed for its ability to detect BRCA CNVs, and results were compared to two hybrid capture-based reference assays. Results: Among the 40 analyzed samples (17 CNV-positive and 23 CNV-negative for BRCA genes), the Diatech pipeline showed good concordance with the reference method: all CNVs were correctly identified in 16 cases with adequate sequencing quality. Only one result was inconclusive due to low sequencing quality. Conclusions: These findings support the clinical utility of NGS-based CNV analysis in FFPE samples when combined with appropriate bioinformatics tools. Integrating visual inspection of CNV plots with automated CNV calling improves the reliability of CNV detection and enhances the interpretation of results from tumor tissue. Accurate CNV detection directly from tumor tissue may reduce the need for additional germline testing, thus shortening turnaround times. Nevertheless, blood-based testing remains mandatory to determine whether detected BRCA CNVs are of hereditary or somatic origin, particularly in cases with a strong clinical suspicion of inherited predisposition due to young age and a personal and/or family history of OC. Full article

(This article belongs to the Section Human Genomics and Genetic Diseases)

11 pages, 974 KB

Open Access Editor’s Choice Article

Reversible Platelet Aggregation Induced by Low-Temperature Storage in Heparinized Whole Blood Samples

by Yuriko Hayashi, Manato Miyazaki, Ryusuke Kimura, Ririka Arai, Miu Takada, Ayuko Takahashi and Hirokazu Kimura

Hematol. Rep. 2025, 17(5), 42; https://doi.org/10.3390/hematolrep17050042 - 22 Aug 2025

Viewed by 2407

Abstract

Background/Objectives: Platelet counts can be affected by storage conditions, potentially leading to pseudothrombocytopenia. The present study aimed to investigate temperature-dependent changes in platelet counts and morphology in whole blood samples anticoagulated with heparin or EDTA. We also examined the molecular mechanism of cold-induced aggregation via integrin GPIIb/IIIa–fibrinogen interaction using established bioinformatics technologies (docking simulation). Methods: Peripheral blood was collected from healthy volunteers (n = 6) and treated with either heparin or EDTA. The samples were stored at 4 °C or room temperature, or incubated at 37 °C. Platelet counts were measured using an automated hematology analyzer. The morphology of various blood cells in smears was assessed using the May-Grünwald Giemsa staining method. Docking simulations using available software (HADDOCK 2.4) were performed to evaluate integrin–fibrinogen binding at different temperatures. Results: In automated blood cell counting, platelet counts in heparinized blood were significantly decreased under low-temperature conditions (4 °C), but returned to levels comparable to those at room temperature upon warming to 37 °C (p < 0.05). No significant changes were observed in EDTA-treated samples. Microscopic findings showed platelet aggregation only in heparinized samples at 4 °C, with normal morphology restored upon warming (37 °C). Docking simulations estimated stronger integrin GPIIb/IIIa–fibrinogen binding at 4 °C than at 37 °C (p = 0.0286), suggesting temperature-dependent enhancement of molecular interactions. Conclusions: These findings indicate that heparin can induce reversible platelet aggregation at low temperatures in whole blood samples, leading to pseudothrombocytopenia. This phenomenon may be mediated by increased integrin GPIIb/IIIa–fibrinogen binding. Full article

22 pages, 1703 KB

Open Access Article

Towards Personalized Precision Oncology: A Feasibility Study of NGS-Based Variant Analysis of FFPE CRC Samples in a Chilean Public Health System Laboratory

by Eduardo Durán-Jara, Iván Ponce, Marcelo Rojas-Herrera, Jessica Toro, Paulo Covarrubias, Evelin González, Natalia T. Santis-Alay, Mario E. Soto-Marchant, Katherine Marcelain, Bárbara Parra and Jorge Fernández

Curr. Issues Mol. Biol. 2025, 47(8), 599; https://doi.org/10.3390/cimb47080599 - 30 Jul 2025

Viewed by 2202

Abstract

Massively parallel or next-generation sequencing (NGS) has enabled the genetic characterization of cancer patients, allowing the identification of somatic and germline variants associated with their diagnosis, tumor classification, and therapy response. Despite its benefits, NGS testing is not yet available in the Chilean public health system, rendering it both costly and time-consuming for patients and clinicians. Using a retrospective cohort of 67 formalin-fixed, paraffin-embedded (FFPE) colorectal cancer (CRC) samples, we aimed to implement the identification, annotation, and prioritization of relevant actionable tumor somatic variants in our laboratory, as part of the public health system. We compared two library preparation methodologies (amplicon-based and capture-based) and different bioinformatics pipelines for sequencing analysis to assess the advantages and disadvantages of each. We obtained 80.5% concordance between actionable variants detected in our analysis and those obtained in the Cancer Genomics Laboratory from the Universidad de Chile (62 out of 77 variants), a validated laboratory for this methodology. Notably, 98.4% (61 out of 62) of variants detected previously by the validated laboratory were also identified in our analysis. Then, comparing the hybridization capture-based library preparation methodology with the amplicon-based strategy, we found ~94% concordance between identified actionable variants across the 15 shared genes, analyzed by the TumorSec™ bioinformatics pipeline, developed by the Cancer Genomics Laboratory. Our results demonstrate that it is entirely viable to implement NGS-based identification and prioritization of actionable variants in cancer samples in our laboratory, which is part of the Chilean public health system, paving the way to improved access to such analyses. Considering the economic realities of most Latin American countries, using a small NGS panel, such as TumorSec™, focused on relevant variants of the Chilean and Latin American population is a cost-effective alternative to extensive global NGS panels. Furthermore, the incorporation of automated bioinformatics analysis in this streamlined assay holds the potential of facilitating the implementation of precision medicine in this geographic region, which aims to greatly support personalized treatment of cancer patients in Chile. Full article
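
The concordance percentages quoted above follow directly from the stated counts, as the short check below shows. The helper name is hypothetical, and rounding to one decimal place is assumed to match the abstract.

```python
def percent(numerator, denominator, digits=1):
    """Express numerator / denominator as a percentage, rounded."""
    return round(100 * numerator / denominator, digits)


# Counts as given in the abstract.
overall_concordance = percent(62, 77)   # 62 out of 77 variants
recovered_reference = percent(61, 62)   # 61 out of 62 variants
```

With these inputs the two values come to 80.5 and 98.4, in line with the reported figures.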

(This article belongs to the Special Issue Linking Genomic Changes with Cancer in the NGS Era, 2nd Edition)

36 pages, 1807 KB

Open Access Review

Thriving or Withering? Plant Molecular Cytogenetics in the First Quarter of the 21st Century

by Elzbieta Wolny, Luis A. J. Mur, Nobuko Ohmido, Zujun Yin, Kai Wang and Robert Hasterok

Int. J. Mol. Sci. 2025, 26(14), 7013; https://doi.org/10.3390/ijms26147013 - 21 Jul 2025

Cited by 2 | Viewed by 3388

Abstract

Nearly four decades have passed since fluorescence in situ hybridisation was first applied in plants to support molecular cytogenetic analyses across a wide range of species. Subsequent advances in DNA sequencing, bioinformatic analysis, and microscopy, together with the immunolocalisation of various nuclear components, have provided unprecedented insights into the cytomolecular organisation of the nuclear genome in both model and non-model plants, with crop species being perhaps the most significant. The ready availability of sequenced genomes is now facilitating the application of state-of-the-art cytomolecular techniques across diverse plant species. However, these same advances in genomics also pose a challenge to the future of plant molecular cytogenetics, as DNA sequence analysis is increasingly perceived as offering comparable insights into genome organisation. This perception persists despite the continued relevance of FISH-based approaches for the physical anchoring of genome assemblies to chromosomes. Furthermore, cytogenetic approaches cannot currently rival purely genomic methods in terms of throughput, standardisation, and automation. This review highlights the latest key topics in plant cytomolecular research, with particular emphasis on chromosome identification and karyotype evolution, chromatin and interphase nuclear organisation, chromosome structure, hybridisation and polyploidy, and cytogenetics-assisted crop improvement. In doing so, it underscores the distinctive contributions that cytogenetic techniques continue to offer in genomic research. Additionally, we critically assess future directions and emerging opportunities in the field, including those related to CRISPR/Cas-based live-cell imaging and chromosome engineering, as well as AI-assisted image analysis and karyotyping. Full article

(This article belongs to the Collection Feature Papers in Molecular Plant Sciences)

18 pages, 1987 KB

Open Access Article

AI-HOPE-TGFbeta: A Conversational AI Agent for Integrative Clinical and Genomic Analysis of TGF-β Pathway Alterations in Colorectal Cancer to Advance Precision Medicine

by Ei-Wen Yang, Brigette Waldrup and Enrique Velazquez-Villarreal

AI 2025, 6(7), 137; https://doi.org/10.3390/ai6070137 - 24 Jun 2025

Cited by 8 | Viewed by 2463

Abstract

Introduction: Early-onset colorectal cancer (EOCRC) is rising rapidly, particularly among the Hispanic/Latino (H/L) populations, who face disproportionately poor outcomes. The transforming growth factor-beta (TGF-β) signaling pathway plays a critical role in colorectal cancer (CRC) progression by mediating epithelial-to-mesenchymal transition (EMT), immune evasion, and metastasis. However, integrative analyses linking TGF-β alterations to clinical features remain limited—particularly for diverse populations—hindering translational research and the development of precision therapies. To address this gap, we developed AI-HOPE-TGFbeta (Artificial Intelligence agent for High-Optimization and Precision Medicine focused on TGF-β), the first conversational artificial intelligence (AI) agent designed to explore TGF-β dysregulation in CRC by integrating harmonized clinical and genomic data via natural language queries. Methods: AI-HOPE-TGFbeta utilizes a large language model (LLM), Large Language Model Meta AI 3 (LLaMA 3), a natural language-to-code interpreter, and a bioinformatics backend to automate statistical workflows. Tailored for TGF-β pathway analysis, the platform enables real-time cohort stratification and hypothesis testing using harmonized datasets from the cBio Cancer Genomics Portal (cBioPortal). It supports mutation frequency comparisons, odds ratio testing, Kaplan–Meier survival analysis, and subgroup evaluations across race/ethnicity, microsatellite instability (MSI) status, tumor stage, treatment exposure, and age. The platform was validated by replicating findings on the SMAD4, TGFBR2, and BMPR1A mutations in EOCRC. Exploratory queries were conducted to examine novel associations with clinical outcomes in H/L populations. 
Results: AI-HOPE-TGFbeta successfully recapitulated established associations, including worse survival in SMAD4-mutant EOCRC patients treated with FOLFOX (fluorouracil, leucovorin and oxaliplatin) (p = 0.0001) and better outcomes in early-stage TGFBR2-mutated CRC patients (p = 0.00001). It revealed potential population-specific enrichment of BMPR1A mutations in H/L patients (OR = 2.63; p = 0.052) and uncovered MSI-specific survival benefits among SMAD4-mutated patients (p = 0.00001). Exploratory analysis showed better outcomes in SMAD2-mutant primary tumors vs. metastatic cases (p = 0.0010) and confirmed the feasibility of disaggregated ethnicity-based queries for TGFBR1 mutations, despite small sample sizes. These findings underscore the platform’s capacity to detect both known and emerging clinical–genomic patterns in CRC. Conclusions: AI-HOPE-TGFbeta introduces a new paradigm in cancer bioinformatics by enabling natural language-driven, real-time integration of genomic and clinical data specific to TGF-β pathway alterations in CRC. The platform democratizes complex analyses, supports disparity-focused investigation, and reveals clinically actionable insights in underserved populations, such as H/L EOCRC patients. As a first-of-its-kind system studying TGF-β, AI-HOPE-TGFbeta holds strong promise for advancing equitable precision oncology and accelerating translational discovery in the CRC TGF-β pathway. Full article
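
The odds ratio testing mentioned above can be illustrated with a generic 2x2 calculation. This sketch is not taken from AI-HOPE-TGFbeta: the function name is hypothetical, the Wald (Woolf) interval is one common choice among several, and any counts passed in are illustrative rather than data from the study.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% confidence interval for a 2x2 table.

    a: group 1, mutated      b: group 1, not mutated
    c: group 2, mutated      d: group 2, not mutated

    Returns (odds_ratio, ci_low, ci_high). All four cells must be
    positive, because the log-scale standard error uses 1/cell.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = exp(log(estimate) - z * standard_error)
    ci_high = exp(log(estimate) + z * standard_error)
    return estimate, ci_low, ci_high
```

An interval that includes 1 corresponds to a result that does not reach the conventional 0.05 significance level, which is how a borderline finding such as an odds ratio with p = 0.052 would typically be read.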

(This article belongs to the Section Medical & Healthcare AI)

26 pages, 916 KB

Open Access Review

Integrating Artificial Intelligence in Next-Generation Sequencing: Advances, Challenges, and Future Directions

by Konstantina Athanasopoulou, Vasiliki-Ioanna Michalopoulou, Andreas Scorilas and Panagiotis G. Adamopoulos

Curr. Issues Mol. Biol. 2025, 47(6), 470; https://doi.org/10.3390/cimb47060470 - 19 Jun 2025

Cited by 32 | Viewed by 7683

Abstract

The integration of artificial intelligence (AI) into next-generation sequencing (NGS) has revolutionized genomics, offering unprecedented advancements in data analysis, accuracy, and scalability. This review explores the synergistic relationship between AI and NGS, highlighting its transformative impact across genomic research and clinical applications. AI-driven tools, including machine learning and deep learning, enhance every aspect of NGS workflows—from experimental design and wet-lab automation to bioinformatics analysis of the generated raw data. Key applications of AI integration in NGS include variant calling, epigenomic profiling, transcriptomics, and single-cell sequencing, where AI models such as CNNs, RNNs, and hybrid architectures outperform traditional methods. In cancer research, AI enables precise tumor subtyping, biomarker discovery, and personalized therapy prediction, while in drug discovery, it accelerates target identification and repurposing. Despite these advancements, challenges persist, including data heterogeneity, model interpretability, and ethical concerns. This review also discusses the emerging role of AI in third-generation sequencing (TGS), addressing long-read-specific challenges, like fast and accurate basecalling, as well as epigenetic modification detection. Future directions should focus on implementing federated learning to address data privacy, advancing interpretable AI to improve clinical trust and developing unified frameworks for seamless integration of multi-modal omics data. By fostering interdisciplinary collaboration, AI promises to unlock new frontiers in precision medicine, making genomic insights more actionable and scalable. Full article

(This article belongs to the Special Issue Technological Advances Around Next-Generation Sequencing Application)

12 pages, 882 KB

Open AccessEditor’s ChoiceArticle

mbX: An R Package for Streamlined Microbiome Analysis

by Utsav Lamichhane and Jeferson Lourenco

Stats 2025, 8(2), 44; https://doi.org/10.3390/stats8020044 - 29 May 2025

Cited by 2 | Viewed by 2992

Abstract

Here, we introduce the mbX package: an R-based tool designed to streamline 16S rRNA gene microbiome data analysis following taxonomic classification. It automates key post-sequencing steps, including taxonomic data cleaning and visualization, addressing the need for reproducible and user-friendly microbiome workflows. mbX’s core functions, ezclean and ezviz, take raw taxonomic output (such as that from QIIME 2) and sample metadata to produce a cleaned relative abundance dataset and high-quality stacked bar plots with minimal manual intervention. We validated mbX on 14 real microbiome datasets, demonstrating significant improvements in efficiency and consistency of post-processing of DNA sequence data. The results show that mbX ensures uniform taxonomic formatting, eliminates common manual errors, and quickly generates publication-ready figures, greatly facilitating downstream analysis. For a dataset with 20 samples, both functions of mbX ran in less than 1 s and used less than 1 GB of memory. For a dataset with more than 1170 samples, the functions ran within 125 s and used less than 4.5 GB of memory. By integrating seamlessly with existing pipelines and emphasizing automation, mbX fills a critical gap between sequence classification and statistical analysis. An upcoming version will add a function that extends mbX to automated statistical comparisons, aiming for an end-to-end microbiome analysis solution by integrating mbX with currently available pipelines. This article presents the design of mbX, its workflow and features, and a comparative discussion positioning mbX relative to other microbiome bioinformatics tools. The contributions of mbX highlight its significance in accelerating microbiome research through reproducible and streamlined data analysis.

(This article belongs to the Section Statistical Software)
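
The abstract above mentions producing a cleaned relative abundance dataset from raw taxonomic output. As a rough illustration of that one step only, the following sketch converts per-sample taxon counts into fractions. It is not mbX's implementation (mbX is an R package, and its internals are not described here); Python is used purely for illustration, and the function name `relative_abundance`, the sample names, and the counts are all hypothetical.

```python
def relative_abundance(counts):
    """Convert raw per-sample taxon counts into fractions of the sample total.

    `counts` maps sample name -> {taxon name -> raw count}.
    This is an illustrative helper, not part of mbX.
    """
    result = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        if total == 0:
            # An empty sample has no meaningful proportions; report zeros.
            result[sample] = {taxon: 0.0 for taxon in taxa}
        else:
            result[sample] = {taxon: n / total for taxon, n in taxa.items()}
    return result


# Hypothetical counts for two samples (invented for this sketch).
counts = {
    "sample_A": {"Bacteroides": 60, "Prevotella": 30, "Ruminococcus": 10},
    "sample_B": {"Bacteroides": 5, "Prevotella": 15, "Ruminococcus": 0},
}
rel = relative_abundance(counts)
```

Each sample's fractions sum to 1, which is the form a stacked bar plot of community composition would typically consume.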

21 pages, 3387 KB

Open AccessArticle

Impact of DNA Extraction and 16S rRNA Gene Amplification Strategy on Microbiota Profiling of Faecal Samples

by Francesca Toto, Matteo Scanu, Maurizio Gramegna, Lorenza Putignani and Federica Del Chierico

Int. J. Mol. Sci. 2025, 26(11), 5226; https://doi.org/10.3390/ijms26115226 - 29 May 2025

Cited by 5 | Viewed by 3547

Abstract

High-throughput 16S rRNA metagenomic sequencing has advanced our understanding of the gut microbiome, but its reliability depends on upstream processes such as DNA extraction and bacterial library preparation. In this study, we evaluated the impact of three different DNA extraction methods (a manual method with an ad hoc-designed pre-extraction phase (PE-QIA), and two automated magnetic bead-based methods (T180H and TAT132H)) and two bacterial library preparation protocols (home brew and VeriFi) on the 16S rRNA-based metagenomic profiling of faecal samples. T180H and TAT132H produced significantly higher DNA concentrations than PE-QIA, whereas TAT132H yielded DNA of lower purity than the other two methods. In the taxonomic analysis, PE-QIA provided a balanced recovery of Gram-positive and Gram-negative bacteria, TAT132H was enriched in Gram-positive taxa, and T180H was enriched in Gram-negative taxa. An analysis of Microbial Community Standard (MOCK) samples showed that PE-QIA and T180H were more accurate than TAT132H. Finally, the VeriFi method yielded higher amplicon concentrations and sequence counts than the home brew protocol, despite a high level of chimeras. In conclusion, PE-QIA and T180H gave robust performance in terms of DNA yield, purity, and taxonomic representation. Furthermore, the impact of PCR-based steps on gut microbiota profiling can be minimized by an accurate bioinformatic pipeline.

(This article belongs to the Special Issue Molecular Progression of Gut Microbiota)

13 pages, 1791 KB

Open AccessArticle

An Automated Bioinformatic Pipeline to Analyze Biodiversity Data for Conservation Purposes: A Test Case for Colorado Macrofungi

by Scott T. Bates, James Chelin, Clark Hollenberg, Amy Honan, Andrew W. Wilson and David Anderson

Conservation 2025, 5(2), 24; https://doi.org/10.3390/conservation5020024 - 26 May 2025

Viewed by 2457

Abstract

Fungi are of critical importance in supporting biodiversity and the world’s ecosystems, yet their conservation status has only been assessed relatively recently as part of the IUCN’s Red List of threatened species. While there are several challenges to evaluating fungi for conservation purposes, there is an urgent need to bring fungi more broadly into the conservation framework. Here, we present an automated bioinformatic pipeline for processing data from one of the largest fungal biodiversity datasets to assess species conservation status using a test case of conspicuous macrofungi from the state of Colorado. This pipeline can rapidly process existing data from both specimen- and observation-based records available through MyCoPortal for making conservation status assessments, and the approach presented employs ‘fuzzy matching’ techniques for correcting commonly encountered misspelled taxonomic names in the data. Such assessments are required for integrating fungi into the NatureServe conservation status framework. The pipeline can easily be scaled to produce robust assessments, even at the national level, which can be valuable in focusing field activity for verification purposes. Of the available 117,006 biodiversity data records from Colorado, our processing test case produced a final processed dataset of 36,637 macrofungal records from the state. From this, a focus list of 1613 rarely documented Colorado species was produced for consideration, 30 of which also appear on the Red List. A more comprehensive conservation status assessment based on scoring in the NatureServe framework was then produced, providing status rankings for 2438 unique, valid, and current taxonomic names for Colorado macrofungi in the processed dataset.
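
The abstract does not say which fuzzy-matching method the pipeline uses. As one possible illustration of the idea, the sketch below corrects a misspelled taxonomic name against a list of accepted names using the similarity ratio from Python's standard-library `difflib`. The reference list, the `correct_name` helper, and the 0.85 cutoff are assumptions made for this example, not details from the article.

```python
import difflib

# Hypothetical list of accepted names; a real pipeline would load a full
# taxonomic reference rather than three hard-coded entries.
VALID_NAMES = ["Amanita muscaria", "Boletus edulis", "Cantharellus cibarius"]


def correct_name(raw, valid_names=VALID_NAMES, cutoff=0.85):
    """Return the closest accepted name, or None if nothing is similar enough.

    The cutoff is an assumed threshold: higher values reject more candidates
    and reduce the risk of mapping a name onto the wrong taxon.
    """
    matches = difflib.get_close_matches(raw, valid_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


fixed = correct_name("Amanita muscara")        # one letter missing
unknown = correct_name("Pleurotus ostreatus")  # valid name, but not in the list
```

Returning `None` for names without a close match, rather than forcing the nearest candidate, leaves those records available for manual review.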

22 pages, 1053 KB

Open AccessArticle

Wastewater Metavirome Diversity: Exploring Replicate Inconsistencies and Bioinformatic Tool Disparities

by André F. B. Santos, Mónica Nunes, Andreia Filipa-Silva, Victor Pimentel, Marta Pingarilho, Patrícia Abrantes, Mafalda N. S. Miranda, Maria Teresa Barreto Crespo, Ana B. Abecasis, Ricardo Parreira and Sofia G. Seabra

Int. J. Environ. Res. Public Health 2025, 22(5), 707; https://doi.org/10.3390/ijerph22050707 - 30 Apr 2025

Cited by 2 | Viewed by 2337

Abstract

This study investigates viral composition in wastewater through metagenomic analysis, evaluating the performance of four bioinformatic tools (Genome Detective, CZ.ID, INSaFLU-TELEVIR and Trimmomatic + Kraken2) on samples collected from four sites in each of two wastewater treatment plants (WWTPs) in Lisbon, Portugal in April 2019. From each site, we collected and separately processed three replicates and one pool of nucleic acids extracted from the replicates. A total of 32 samples were processed using sequence-independent single-primer amplification (SISPA) and sequenced on an Illumina MiSeq platform. Across the 128 sample–tool combinations, viral read counts varied widely, from 3 to 288,464. Replicates and their pools were inconsistent in viral abundance and diversity, revealing the heterogeneity of the wastewater matrix and the variability in sequencing effort. Results also differed between software tools, highlighting the impact of tool selection on community profiling. A positive correlation between crAssphage and human pathogens was found, supporting crAssphage as a proxy for public health surveillance. A custom Python pipeline automated viral identification report processing, taxonomic assignments and diversity calculations, streamlining analysis and ensuring reproducibility. These findings emphasize the importance of sequencing depth, software tool selection and standardized pipelines in advancing wastewater-based epidemiology.

(This article belongs to the Section Environmental Sciences)
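
The sample counts in the abstract above follow from the study design, and the arithmetic can be checked directly. The sketch below also shows a Shannon index as one common example of a diversity calculation; the abstract does not say which diversity measures the authors' Python pipeline computes, so the `shannon_index` function and its choice of metric are assumptions for illustration only.

```python
import math

# Study design as stated in the abstract.
plants = 2            # wastewater treatment plants
sites_per_plant = 4   # sampling sites in each plant
replicates = 3        # replicates per site
pools = 1             # one pooled extract per site
tools = 4             # bioinformatic tools compared

samples = plants * sites_per_plant * (replicates + pools)  # 2 * 4 * 4 = 32
combinations = samples * tools                             # 32 * 4 = 128


def shannon_index(read_counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero reads.

    Assumed here as an example metric; not confirmed as the authors' choice.
    """
    total = sum(read_counts)
    if total == 0:
        return 0.0
    h = 0.0
    for n in read_counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h
```

A sample dominated by a single taxon gives an index of 0, while reads spread evenly over more taxa give higher values, which is why unequal sequencing depth between replicates can shift the apparent diversity.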

Search Results (80)
