MDPI - Publisher of Open Access Journals

25 pages, 3712 KB

Open Access Article

An AI-Enabled Single-Cell Transcriptomic Analysis Pipeline for Gene Signature Discovery in Natural Killer Cells Linked to Remission Outcomes in Chronic Myeloid Leukemia

by Santoshi Borra, Da Yan, Robert S. Welner and Zongliang Yue

Biology 2026, 15(7), 588; https://doi.org/10.3390/biology15070588 - 6 Apr 2026

Viewed by 647

Abstract

Background: A major technical challenge in single-cell transcriptomics is the absence of an integrative analytic pipeline that can simultaneously leverage gene regulatory network (GRN) architecture, AI-assisted gene panel discovery, and functional relevance analyses to generate coherent biological insights. Existing approaches often treat these components independently, focusing on clusters, marker genes, or predictive features without integrating them into a mechanistically grounded framework. Consequently, comprehensive screening that links regulatory association, gene signature screening, and functional interpretation within single-cell datasets remains limited, underscoring the need for an integrated strategy. Methods: We developed an integrative bioinformatics pipeline based on Gene regulatory network–AI–Functional Analysis (GAFA), combining latent-space integration, unsupervised clustering, diffusion pseudotime analysis, lineage-resolved generalized additive modeling, GRN inference, and machine learning-based gene panel discovery. This framework enables systematic mapping of cell-state structure, reconstruction of differentiation and effector trajectories, and identification of transcriptional and regulatory features strongly associated with clinical outcomes. As a case study, we applied the pipeline to NK cell transcriptomes from six CML patients (two early relapse, two late relapse, two durable treatment-free remission—TFR; 15 samples) collected at TKI discontinuation and 6–12 months after therapy cessation. Results: We reanalyzed publicly available scRNA-seq data from a previously published CML cohort to evaluate NK-cell transcriptional programs associated with treatment-free remission and relapse. 
We resolved six transcriptionally distinct NK cell states spanning CD56^bright-like cytokine-responsive, early activated, terminally mature, cytotoxic, lymphoid trafficking, and HLA-DR⁺ immunoregulatory populations, each exhibiting outcome-specific compositional differences. Pseudotime analysis revealed two major NK cell lineages: a maturation trajectory and a cytotoxic effector trajectory. TFR samples displayed balanced occupancy of both lineages, whereas early relapse samples showed marked depletion of the maturation branch and preferential accumulation in cytotoxic end states. AI-guided feature selection and random forest modeling identified an 18-gene panel that distinguished NK cells from TFR and relapse samples in an exploratory manner. Among them, CST7, FCER1G, GNLY, GZMA, and HLA-C were conventional NK-associated genes, whereas ACTB, CYBA, IFITM2, IFITM3, LYZ, MALAT1, MT2A, MYOM2, NFKBIA, PIM1, S100A8, S100B, and TSC22D3 were novel. The GRN inference further uncovered outcome-specific regulatory modules, with RUNX3, EOMES, ELK4, and REL regulons enriched in TFR, whereas FOSL2 and MAF regulons were enriched in relapse, with downstream targets linked to IFN-γ signaling, metabolic reprogramming, and immunoregulatory feedback circuits. Conclusions: This AI-enabled single-cell analysis demonstrates how NK cell state composition, differentiation trajectories, and regulatory network rewiring collectively shape TFR versus relapse following TKI discontinuation in CML. The integrative pipeline provides a modular framework that could be extended to additional datasets for data-driven biomarker discovery and mechanistic stratification, and highlights candidate transcriptional regulators and NK cell programs that may be leveraged to improve remission durability, pending validation in larger patient cohorts.
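
The "outcome-specific compositional differences" mentioned in this abstract amount to comparing the fraction of cells in each state across outcome groups. The following is a minimal sketch of that bookkeeping only, assuming each cell already carries an outcome label and a state label; the function name `state_fractions` and the toy annotations are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def state_fractions(cells):
    """Return {group: {state: fraction}} from (group, state) pairs, one per cell."""
    counts = defaultdict(Counter)
    for group, state in cells:
        counts[group][state] += 1
    fractions = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        fractions[group] = {state: n / total for state, n in counter.items()}
    return fractions


# Toy annotations, invented purely for illustration (not data from the paper).
toy_cells = (
    [("group_a", "maturation")] * 5
    + [("group_a", "cytotoxic")] * 5
    + [("group_b", "maturation")] * 2
    + [("group_b", "cytotoxic")] * 8
)

fractions = state_fractions(toy_cells)
for group, per_state in sorted(fractions.items()):
    print(group, per_state)
```

In a real analysis the fractions would normally be computed per sample rather than per pooled group, so that between-patient variability stays visible.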

(This article belongs to the Special Issue Advancing Translational Science Using Bioinformatics and Big Data-Driven Approaches)

17 pages, 280 KB

Open Access Review

Software Applications in Biomedicine: A Narrative Review of Translational Pathways from Data to Decision

by Gabriela Georgieva Panayotova

BioMedInformatics 2026, 6(1), 9; https://doi.org/10.3390/biomedinformatics6010009 - 4 Feb 2026

Viewed by 1078

Abstract

Background/Objectives: Software is now core infrastructure in biomedical science, yet fragmented workflows across subfields hinder reproducibility and delay the translation of data into actionable decisions. There is a critical need for a cross-disciplinary synthesis to bridge these silos and establish a unified framework for software maturity. This narrative review addresses this gap by synthesizing representative software ecosystems across three major pillars: bioinformatics, molecular modeling/simulations, and epidemiology/public health. Methods: A narrative review of articles indexed in PubMed/NCBI, Web of Science, and Scopus between 2000 and 2025 was conducted. Domain-specific terms related to bioinformatics, molecular modeling, docking, molecular dynamics, epidemiology, public health, and workflow management were combined with software- and algorithm-focused keywords. Studies describing, validating, or applying documented tools with biomedical relevance were included. Results: Across domains, mature data standards and reference resources (e.g., FASTQ, BAM/CRAM, VCF, mzML), widely adopted platforms (e.g., BLAST+ (v2.16.0, NCBI, Bethesda, MD, USA), Bioconductor (v3.20, Bioconductor Foundation, Seattle, WA, USA), AutoDock Vina (v1.2.5, Scripps Research, La Jolla, CA, USA), GROMACS (v2024.3, GROMACS Team, Stockholm, Sweden), Epi Info (v7.2.6, CDC, Atlanta, GA, USA), and QGIS (v3.40, QGIS.org, Gossau, Switzerland)), and increasing use of workflow engines were identified. Software pipelines routinely transform molecular and surveillance data into interpretable features supporting hypothesis generation. Conclusions: Integrated, standards-based, and validated software pipelines can shorten the path from measurement to decision in biomedicine and public health. Future progress depends on reproducibility practices, benchmarking, user-centered design, portable implementations, and responsible deployment of machine learning.
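
As a small illustration of how a standard format feeds an interpretable feature, the sketch below reads four-line FASTQ records and computes a mean base quality per read. It assumes Phred+33 quality encoding and unwrapped (single-line) sequences; the helper name `parse_fastq` and the example reads are hypothetical and do not come from the review.

```python
def parse_fastq(text):
    """Parse four-line FASTQ records into (read_id, sequence, qualities) tuples.

    Assumes Phred+33 encoding and that sequence/quality strings are not wrapped.
    """
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of four lines")
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        scores = [ord(char) - 33 for char in quality]  # Phred+33 decoding
        records.append((header[1:], sequence, scores))
    return records


# Two invented reads: 'I' decodes to Q40 and '!' decodes to Q0.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII!!\n"
records = parse_fastq(example)
mean_quality = {rid: sum(q) / len(q) for rid, _, q in records}
print(mean_quality)
```

Production pipelines would typically rely on an established parser rather than hand-written code; the point here is only the shape of the transformation from raw records to a per-read feature.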

(This article belongs to the Section Computational Biology and Medicine)

19 pages, 2367 KB

Open Access Article

Effect of Non-Antibiotic Pollution in Farmland Soil on the Risk of Antibiotic Resistance Gene Transfer

by Jin Huang, Xiajiao Wang, Zhengyang Deng, Zhixing Ren and Yu Li

Sustainability 2026, 18(1), 447; https://doi.org/10.3390/su18010447 - 2 Jan 2026

Viewed by 469

Abstract

The widespread use of antibiotics, combined with pervasive exposure to diverse environmental media, has intensified the global challenge of antibiotic resistance. Accumulating evidence reveals that beyond direct antibiotic pressure, residual non-antibiotic chemicals, despite lacking intrinsic antibacterial activity, can significantly promote the enrichment and spread of antibiotic resistance genes (ARGs) in farmland soils through indirect mechanisms such as inducing oxidative stress, altering microbial community structure, and enhancing both vertical and horizontal gene transfer. To address this issue, the present study investigates the influence on ARG transmission dynamics of representative non-antibiotic contaminants commonly detected in agricultural environments, including pesticides (e.g., omethoate, imidacloprid, and atrazine), industrial pollutants (e.g., PCB138, BDE47, benzo[a]pyrene, 2,3,7,8-tetrachlorodibenzo-p-dioxin [TCDD], and benzene), plastic-associated compounds (e.g., polyethylene trimer, phthalates, and tributyl acetylcitrate), and ingredients from personal care products (e.g., triclosan and bisphenol A). Leveraging bioinformatics resources such as the CARD database, PDB, AlphaFold, and molecular sequence analysis tools, we identified relevant small-molecule ligands and macromolecular receptors to construct a simulation system modeling ARG transfer pathways. Molecular docking and molecular dynamics (MD) simulations were then implemented, guided by a Plackett–Burman experimental design, to systematically evaluate the impact of individual and co-occurring pollutants. The resulting data were processed using advanced analytical tools, and MD trajectories were interpreted at the molecular level across three scenarios: an unperturbed (blank) system, single-pollutant exposures, and dual-pollutant combinations. 
By integrating computational simulations with machine learning approaches, this work uncovers the “co-selection” effect exerted by non-antibiotic chemical residues in shaping the environmental resistome, thereby providing a mechanistic and scientific basis for comprehensive risk assessment of agricultural non-point source pollution and the development of effective soil health management and antimicrobial resistance containment strategies.
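
For readers unfamiliar with Plackett–Burman screening, the sketch below builds the standard 12-run, 11-factor two-level design by cyclically shifting a generator row and appending a row of low levels, then estimates main effects from a response vector. The abstract does not state which run size or factor assignment the authors used, so the 12-run choice, the function names, and the synthetic response are all assumptions made for illustration.

```python
# Commonly tabulated generator row for the 12-run Plackett-Burman design.
GENERATOR_12 = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12():
    """Return a 12 x 11 design matrix with entries +1 (high) and -1 (low)."""
    k = len(GENERATOR_12)
    # Eleven cyclic right-shifts of the generator, then one all-low run.
    rows = [GENERATOR_12[-shift:] + GENERATOR_12[:-shift] for shift in range(k)]
    rows.append([-1] * k)
    return rows


def main_effects(design, responses):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    n_runs = len(design)
    n_factors = len(design[0])
    return [
        2 * sum(row[j] * y for row, y in zip(design, responses)) / n_runs
        for j in range(n_factors)
    ]


design = plackett_burman_12()

# Synthetic response (not simulation output): only factors 0 and 4 matter.
responses = [10 + 3 * row[0] - 2 * row[4] for row in design]
effects = main_effects(design, responses)
print(effects)
```

Because the columns are balanced and mutually orthogonal, each main effect is estimated independently of the others, which is what makes the design economical for screening many candidate factors in few runs. Interactions are not separately estimable in such a design.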

(This article belongs to the Special Issue Sustainable Strategies for the Control of Persistent Toxic and Radioactive Contaminants)

8 pages, 218 KB

Open Access Opinion

The Era of Precision Psychiatry: Toward a New Paradigm in Diagnosis and Care

by Antonio Del Casale, Liliana Bronzatti, Jan Francesco Arena, Giovanna Gentile, Carlo Lai, Paolo Girardi, Maurizio Simmaco and Marina Borro

Psychiatry Int. 2025, 6(4), 146; https://doi.org/10.3390/psychiatryint6040146 - 1 Dec 2025

Cited by 1 | Viewed by 2373

Abstract

Mental disorders affect nearly one billion people worldwide and impose a substantial burden on individuals, families, and healthcare systems. Current diagnostic and therapeutic approaches can fail to reach optimal outcomes, highlighting the need for more effective and personalized interventions. Precision psychiatry aims to address this challenge by integrating multidimensional data, ranging from genomics and epigenomics to neuroimaging and psychometric assessments, through advanced computational tools such as machine learning and artificial intelligence. This transdisciplinary approach could allow the study of biologically informed endophenotypes, improve diagnostic accuracy, and support individualized treatment strategies. Emerging technologies, including pharmaco-neuroimaging, virtual histology, and large-scale consortia, are advancing the field by elucidating the molecular and circuit-level correlates of mental disorders. Although significant progress has been made, the translational gap between research and clinical practice remains a critical issue. Effective implementation will require the systematic integration of bioinformatic tools, big data analytics, and clinician-guided interpretation, in a context in which the evolving landscape of precision psychiatry continues to prioritize therapeutic alliance and individualized patient care.

(This article belongs to the Special Issue Precision Psychiatry and Advances in Patient Care: Innovations Transforming the Diagnosis and Treatment of Mental Disorders)

22 pages, 1178 KB

Open Access Article

Identification of Potential Biomarkers in Prostate Cancer Microarray Gene Expression Leveraging Explainable Machine Learning Classifiers

by Ahmed Al Marouf, Jon George Rokne and Reda Alhajj

Cancers 2025, 17(23), 3853; https://doi.org/10.3390/cancers17233853 - 30 Nov 2025

Cited by 1 | Viewed by 844

Abstract

Background and Objective: Prostate cancer remains one of the most prevalent and potentially lethal malignancies among men worldwide, and timely and accurate diagnosis, along with the stratification of patients by disease severity, is critical for personalized treatment and improved outcomes. One of the tools used for diagnosis is bioinformatics. However, traditional biomarker discovery methods often lack transparency and interpretability, which makes it difficult for clinicians to trust the resulting biomarkers in a clinical setting. Methods: This paper introduces a novel approach that leverages Explainable Machine Learning (XML) techniques to identify and prioritize biomarkers associated with different levels of severity of prostate cancer. The proposed XML approach combines traditional machine learning (ML) algorithms with transparent models to facilitate understanding of feature importance in bioinformatics analysis, allowing for more informed clinical decisions. The method implements several ML classifiers, namely Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), and Bagging (Bg), followed by Shapley values in the XML pipeline. For pre-processing, missing values were handled by imputation, and SMOTE (Synthetic Minority Oversampling Technique) and the Tomek link method were applied to handle the class imbalance problem. The ML models were evaluated with stratified k-fold validation, and SHAP (SHapley Additive exPlanations) values were used for explainability. Results: This study utilized a novel tissue microarray data set with data from 102 individuals, comprising prostate cancer patients and healthy individuals. The proposed model satisfactorily identifies genes as biomarkers, with the highest accuracy, 81.01%, obtained using RF. 
The top 10 potential biomarkers identified in this study are DEGS1, HPN, ERG, CFD, TMPRSS2, PDLIM5, XBP1, AJAP1, NPM1 and C7. Conclusions: As XML continues to unravel the complexities within prostate cancer datasets, the identification of severity-specific biomarkers is poised at the forefront of precision oncology. This integration paves the way for targeted interventions, improving patient outcomes, and heralding a new era of individualized care in the fight against prostate cancer.
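
SHAP values build on the Shapley value from cooperative game theory: a feature's attribution is its marginal contribution to the prediction, averaged over all orderings in which features can be switched from a baseline to their observed values. The sketch below computes exact Shapley values by brute-force enumeration for a three-feature toy model. It is not the SHAP library and not the authors' pipeline; the model, the feature values, and the all-zero baseline are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions of predict(x) relative to predict(baseline).

    Enumerates every feature ordering, so it is only practical for a handful
    of features.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]          # switch feature i to its observed value
            updated = predict(current)
            totals[i] += updated - previous
            previous = updated
    return [total / len(orderings) for total in totals]


def toy_model(features):
    """Hypothetical scoring function with one interaction term."""
    a, b, c = features
    return 2.0 * a - 1.0 * b + 0.5 * a * c


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions always sum to the difference between the prediction at `x` and at the baseline, and the interaction term is split equally between the two features involved. Practical explainers approximate this average rather than enumerating all orderings.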

(This article belongs to the Special Issue Innovative Biomarkers and Imaging Techniques for the Early Detection and Risk Stratification of Urological Tumours)

16 pages, 2924 KB

Open Access Review

Applications of Genome Sequencing in Infectious Diseases: From Pathogen Identification to Precision Medicine

by Gulam Mustafa Hasan, Taj Mohammad, Anas Shamsi, Sukhwinder Singh Sohal and Md. Imtaiyaz Hassan

Pharmaceuticals 2025, 18(11), 1687; https://doi.org/10.3390/ph18111687 - 7 Nov 2025

Cited by 4 | Viewed by 2981

Abstract

Background: Genome sequencing is transforming infectious-disease diagnostics, surveillance, and precision therapy by enabling rapid, high-resolution pathogen identification, transmission tracking, and genomic-informed antimicrobial stewardship. Methods: We review contemporary sequencing platforms (short- and long-read), targeted and metagenomic approaches, and operational workflows that connect laboratory outputs to clinical and public health decision-making. We highlight strengths and limitations of genomic AMR prediction, the role of plasmids and mobile elements in resistance and virulence, and practical steps for clinical translation, including validation, reporting standards, and integration with electronic health records. Results: Comparative and population genomics reveal virulence determinants and host–pathogen interactions that correlate with clinical outcomes, improving risk stratification for high-risk infections. Integrating sequencing with epidemiological and clinical metadata enhances surveillance, uncovers cryptic transmission pathways, and supports infection control policies. Despite these advances, clinical implementation faces technical and interpretative barriers, as well as challenges related to turnaround time, data quality, bioinformatic complexity, cost, and ethical considerations. These issues must be addressed to realize routine, point-of-care sequencing. Conclusions: Emerging solutions, including portable sequencing devices, standardized pipelines, and machine-learning models, promise faster, more actionable results and tighter integration with electronic health records. The widespread adoption of sequencing in clinical workflows has the potential to shift infectious disease management toward precision medicine, thereby improving diagnostics, treatment selection, and public health responses.

(This article belongs to the Special Issue Pharmacogenomics for Precision Medicine)

26 pages, 916 KB

Open Access Review

Integrating Artificial Intelligence in Next-Generation Sequencing: Advances, Challenges, and Future Directions

by Konstantina Athanasopoulou, Vasiliki-Ioanna Michalopoulou, Andreas Scorilas and Panagiotis G. Adamopoulos

Curr. Issues Mol. Biol. 2025, 47(6), 470; https://doi.org/10.3390/cimb47060470 - 19 Jun 2025

Cited by 31 | Viewed by 6996

Abstract

The integration of artificial intelligence (AI) into next-generation sequencing (NGS) has revolutionized genomics, offering unprecedented advancements in data analysis, accuracy, and scalability. This review explores the synergistic relationship between AI and NGS, highlighting its transformative impact across genomic research and clinical applications. AI-driven tools, including machine learning and deep learning, enhance every aspect of NGS workflows, from experimental design and wet-lab automation to bioinformatics analysis of the generated raw data. Key applications of AI integration in NGS include variant calling, epigenomic profiling, transcriptomics, and single-cell sequencing, where AI models such as CNNs, RNNs, and hybrid architectures outperform traditional methods. In cancer research, AI enables precise tumor subtyping, biomarker discovery, and personalized therapy prediction, while in drug discovery, it accelerates target identification and repurposing. Despite these advancements, challenges persist, including data heterogeneity, model interpretability, and ethical concerns. This review also discusses the emerging role of AI in third-generation sequencing (TGS), addressing long-read-specific challenges, like fast and accurate basecalling, as well as epigenetic modification detection. Future directions should focus on implementing federated learning to address data privacy, advancing interpretable AI to improve clinical trust and developing unified frameworks for seamless integration of multi-modal omics data. By fostering interdisciplinary collaboration, AI promises to unlock new frontiers in precision medicine, making genomic insights more actionable and scalable.

(This article belongs to the Special Issue Technological Advances Around Next-Generation Sequencing Application)

15 pages, 4381 KB

Open Access Article

Bioinformatics-Driven Multi-Factorial Insight into α-Galactosidase Mutations

by Bruno Hay Mele, Federica Rossetti, Giuseppina Andreotti, Maria Vittoria Cubellis, Simone Guerriero and Maria Monticelli

Int. J. Mol. Sci. 2025, 26(12), 5802; https://doi.org/10.3390/ijms26125802 - 17 Jun 2025

Viewed by 1413

Abstract

Fabry disease is a rare genetic disorder caused by deficient activity of the lysosomal enzyme alpha-galactosidase A (AGAL), resulting in the accumulation of globotriaosylceramides (Gb3) in tissues and organs. This buildup leads to progressive, multi-systemic complications that severely impact quality of life and can be life-threatening. Interpreting the functional consequences of missense variants in the GLA gene remains a significant challenge, especially in rare diseases where experimental evidence is scarce. In this study, we present an integrative computational framework that combines structural, interaction, pathogenicity, and stability data from both in silico tools and experimental sources, enriched through expert curation and structural analysis. Given the clinical relevance of pharmacological chaperones in Fabry disease, we focus in particular on the structural characteristics of variants classified as “amenable” to such treatments. Our multidimensional analysis, using tools such as AlphaMissense, EVE, FoldX, and ChimeraX, identifies key molecular features that distinguish amenable from non-amenable variants. We find that amenable mutations tend to preserve protein stability, while non-amenable ones are associated with structural destabilisation. By comparing AlphaMissense with alternative predictors rooted in evolutionary (EVE) and thermodynamic (FoldX) models, we explore the relative contribution of different biological paradigms to variant classification. Additionally, the investigation of outlier variants, where AlphaMissense predictions diverge from clinical annotations, highlights candidates for further experimental validation. These findings demonstrate how combining structural bioinformatics with machine learning–based predictions can improve missense variant interpretation and support precision medicine in rare genetic disorders.

(This article belongs to the Special Issue New Advances in Protein Structure, Function and Design)

24 pages, 3408 KB

Open Access Article

PON-P3: Accurate Prediction of Pathogenicity of Amino Acid Substitutions

by Muhammad Kabir, Saeed Ahmed, Haoyang Zhang, Ignacio Rodríguez-Rodríguez, Seyed Morteza Najibi and Mauno Vihinen

Int. J. Mol. Sci. 2025, 26(5), 2004; https://doi.org/10.3390/ijms26052004 - 25 Feb 2025

Cited by 3 | Viewed by 1711

Abstract

Different types of information are combined during variation interpretation. Computational predictors, most often pathogenicity predictors, provide one type of information for this purpose. These tools are based on various kinds of algorithms. Although the American College of Medical Genetics and Genomics and the Association for Molecular Pathology guidelines classify variants into five categories, practically all pathogenicity predictors provide binary pathogenic/benign predictions. We developed a novel artificial intelligence-based tool, PON-P3, on the basis of a carefully selected training dataset, meticulous feature selection, and optimization. We started with 1526 features describing variations, their sequence and structural context, and parameters for the affected genes and proteins. The final random boosting method was tested and compared with a total of 23 predictors. PON-P3 performed better than recently introduced predictors, which utilize large language models or structural predictions. PON-P3 was better than methods that use evolutionary data alone or in combination with different gene and protein properties. PON-P3 classifies cases into three categories: benign, pathogenic, and variants of uncertain significance (VUSs). When binary test data were used, some metapredictors performed slightly better than PON-P3; however, in real-life situations, with patient data, those methods overpredict both pathogenic and benign cases. We predicted with PON-P3 all possible amino acid substitutions in all human proteins encoded from MANE transcripts. The method was also used to predict all unambiguous VUSs (i.e., without conflicts) in ClinVar. A total of 12.9% were predicted to be pathogenic, and 49.9% were benign.

(This article belongs to the Section Molecular Genetics and Genomics)

22 pages, 437 KB

Open Access Review

Harnessing Unsupervised Ensemble Learning for Biomedical Applications: A Review of Methods and Advances

by Mehmet Eren Ahsen

Mathematics 2025, 13(3), 420; https://doi.org/10.3390/math13030420 - 27 Jan 2025

Cited by 4 | Viewed by 2082

Abstract

Advancements in data availability and computational techniques, including machine learning, have transformed the field of bioinformatics, enabling the robust analysis of complex, high-dimensional, and heterogeneous biomedical data. This paper explores how diverse bioinformatics tasks, including differential expression analysis, network inference, and somatic mutation calling, can be reframed as binary classification tasks, thereby providing a unifying framework for their analysis. Traditional single-method approaches often fail to generalize across datasets due to differences in data distributions, noise levels, and underlying biological contexts. Ensemble learning, particularly unsupervised ensemble approaches, emerges as a compelling solution by integrating predictions from multiple algorithms to leverage their strengths and mitigate weaknesses. This review focuses on the principles and recent advancements in ensemble learning, with a particular emphasis on unsupervised ensemble methods. These approaches demonstrate their ability to address critical challenges in bioinformatics, such as the lack of labeled data and the integration of predictions from algorithms operating on different scales. Overall, this paper highlights the transformative potential of ensemble learning in advancing predictive accuracy, robustness, and interpretability across diverse bioinformatics applications.
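
One of the simplest ways to combine label-free predictions from algorithms that score on different scales is rank averaging: each method's scores are converted to ranks, and the ranks are averaged per item. The sketch below illustrates that idea only; it is not claimed to be the specific method the review emphasizes, it assumes every method's scores are oriented so that higher means more likely positive, and the method scores are invented.

```python
def average_ranks(scores):
    """Rank scores from 1 (lowest) to n (highest); ties share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def rank_ensemble(score_lists):
    """Average normalized ranks across methods; output lies in (0, 1]."""
    n_items = len(score_lists[0])
    ranked = [average_ranks(scores) for scores in score_lists]
    return [
        sum(r[i] for r in ranked) / (len(ranked) * n_items)
        for i in range(n_items)
    ]


# Three hypothetical methods scoring the same four candidates on different scales.
method_a = [0.9, 0.2, 0.6, 0.1]      # probability-like
method_b = [120, 15, 80, 30]         # count-like
method_c = [3.1, -0.5, 2.0, 0.1]     # z-score-like
ensemble = rank_ensemble([method_a, method_b, method_c])
print(ensemble)
```

Rank averaging weights every method equally; more elaborate unsupervised ensembles instead try to estimate each method's reliability from the agreement structure among methods.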

(This article belongs to the Special Issue Mathematical Approaches to Advanced Applications in Biomedicine Using Machine Learning)

23 pages, 3840 KB

Open Access Article

Longitudinal Microbiome Investigations Reveal Core and Growth-Associated Bacteria During Early Life Stages of Scylla paramamosain

by Yin Fu, Yongxu Cheng, Lingbo Ma and Qicun Zhou

Microorganisms 2024, 12(12), 2457; https://doi.org/10.3390/microorganisms12122457 - 29 Nov 2024

Cited by 1 | Viewed by 2251

Abstract

In animals, growth and development are strongly correlated with the gut microbiota. The gut of the economically important marine crab (Scylla paramamosain) harbors a diverse microbial community, yet its associations with the surrounding environment, growth performance, and developmental stages remain obscure. In this study, we first characterized stage-specific microbiomes and shifts in the contributions of live feed and water via SourceTracker. We observed decreased microbial diversity and increased priority effects along zoea stages. Psychrobacter was identified as the core genus, whereas Lactobacillus was the hub genus connecting different stages. Second, microbial correlations with various stage-specific growth traits were observed under interventions generating enhanced (probiotic mixture enrichment), normal (control), and reduced (antibiotic treatment) microbiomes. By combining machine learning regression and bioinformatics analysis, we identified four candidate growth performance-associated probiotics belonging to Rhodobacterales, Sulfitobacter, Confluentimicrobium, and Lactobacillus, respectively. Our study interpreted the dynamics and origins of the Scylla paramamosain zoea microbiome and underscored the importance of optimizing potential probiotics to increase growth performance during early life stages in marine invertebrates for effective larviculture.

(This article belongs to the Special Issue Aquatic Microorganisms and Their Application in Aquaculture)

25 pages, 17790 KB

Open Access Review

Visualization Methods for DNA Sequences: A Review and Prospects

by Tan Li, Mengshan Li, Yan Wu and Yelin Li

Biomolecules 2024, 14(11), 1447; https://doi.org/10.3390/biom14111447 - 14 Nov 2024

Cited by 3 | Viewed by 3894

Abstract

The efficient analysis and interpretation of biological sequence data remain major challenges in bioinformatics. Graphical representation, as an emerging and effective visualization technique, offers a more intuitive method for analyzing DNA sequences. However, many visualization approaches are dispersed across research databases, requiring urgent organization, integration, and analysis. Additionally, no single visualization method excels in all aspects. To advance these methods, knowledge graphs and advanced machine learning techniques have become key areas of exploration. This paper reviews the current 2D and 3D DNA sequence visualization methods and proposes a new research direction focused on constructing knowledge graphs for biological sequence visualization, explaining the relevant theories, techniques, and models involved. Additionally, we summarize machine learning techniques applicable to sequence visualization, such as graph embedding methods and the use of convolutional neural networks (CNNs) for processing graphical representations. These machine learning techniques and knowledge graphs aim to provide valuable insights into computational biology, bioinformatics, genomic computing, and evolutionary analysis. The study serves as an important reference for improving intelligent search systems, enriching knowledge bases, and enhancing query systems related to biological sequence visualization, offering a comprehensive framework for future research.
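
A well-known example of a 2D graphical representation of DNA is the chaos game representation (CGR), in which each nucleotide moves the current point halfway toward a corner of the unit square assigned to that base. The sketch below is offered as one representative technique, not as a method singled out by the review; the corner assignment shown is a commonly used convention and should be treated as an assumption, as should the function name.

```python
# Assumed corner assignment for the unit square (a commonly used convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_points(sequence):
    """Return the list of CGR coordinates for a DNA sequence.

    Starts at the center of the unit square; symbols other than A, C, G, T
    (for example N) are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue
        corner_x, corner_y = CORNERS[base]
        x, y = (x + corner_x) / 2, (y + corner_y) / 2  # move halfway to the corner
        points.append((x, y))
    return points


points = chaos_game_points("ACGT")
for base, point in zip("ACGT", points):
    print(base, point)
```

Plotting the points for a long sequence yields an image whose density pattern reflects k-mer frequencies, which is one reason such representations are convenient inputs for image-based models such as CNNs.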

(This article belongs to the Section Bioinformatics and Systems Biology)

11 pages, 579 KB

Open AccessReview

Revolutionizing Cancer Research: The Impact of Artificial Intelligence in Digital Biobanking

by Chiara Frascarelli, Giuseppina Bonizzi, Camilla Rosella Musico, Eltjona Mane, Cristina Cassi, Elena Guerini Rocco, Annarosa Farina, Aldo Scarpa, Rita Lawlor, Luca Reggiani Bonetti, Stefania Caramaschi, Albino Eccher, Stefano Marletta and Nicola Fusco

J. Pers. Med. 2023, 13(9), 1390; https://doi.org/10.3390/jpm13091390 - 16 Sep 2023

Cited by 31 | Viewed by 5776

Abstract

Background. Biobanks are vital research infrastructures aiming to collect, process, store, and distribute biological specimens along with associated data in an organized and governed manner. Exploiting diverse datasets produced by the biobanks and the downstream research from various sources and integrating bioinformatics and “omics” data has proven instrumental in advancing research, including cancer research. Biobanks offer different types of biological samples matched with rich datasets comprising clinicopathologic information. As digital pathology and artificial intelligence (AI) have entered the precision medicine arena, biobanks are progressively transitioning from mere biorepositories to integrated computational databanks. Consequently, the application of AI and machine learning on these biobank datasets holds huge potential to profoundly impact cancer research. Methods. In this paper, we explore how AI and machine learning can respond to the digital evolution of biobanks with flexibility, solutions, and effective services. We look at the different types of data, ranging from specimen-related data, including digital images and patient health records, to downstream genetic/genomic data and the resulting “Big Data”, and at the analytic approaches used for their analysis. Results. These cutting-edge technologies can address the challenges faced by translational and clinical research, enhancing their capabilities in data management, analysis, and interpretation. By leveraging AI, biobanks can unlock valuable insights from their vast repositories, enabling the identification of novel biomarkers, prediction of treatment responses, and ultimately facilitating the development of personalized cancer therapies. Conclusions. The integration of biobanking with AI has the potential not only to expand the current understanding of cancer biology but also to pave the way for more precise, patient-centric healthcare strategies.

(This article belongs to the Section Methodology, Drug and Device Discovery)

18 pages, 7278 KB

Open AccessArticle

Nephrotoxicity Development of a Clinical Decision Support System Based on Tree-Based Machine Learning Methods to Detect Diagnostic Biomarkers from Genomic Data in Methotrexate-Induced Rats

by Ipek Balikci Cicek, Cemil Colak, Saim Yologlu, Zeynep Kucukakcali, Onural Ozhan, Elif Taslidere, Nefsun Danis, Ahmet Koc, Hakan Parlakpinar and Sami Akbulut

Appl. Sci. 2023, 13(15), 8870; https://doi.org/10.3390/app13158870 - 1 Aug 2023

Cited by 2 | Viewed by 2408

Abstract

Background: The purpose of this study was to carry out a bioinformatic analysis of lncRNA data obtained from the genomic analysis of kidney tissue samples taken from rats with methotrexate (MTX)-induced nephrotoxicity and from rats without pathology, and to model these data with a tree-based machine learning method. Another aim was to identify potential biomarkers for the diagnosis of nephrotoxicity and to improve understanding of how nephrotoxicity develops by making the model interpretable with explainable artificial intelligence methods. Methods: To identify potential indicators of drug-induced nephrotoxicity, 20 female Wistar albino rats were separated into two groups: MTX-treated and control. Kidney tissue samples were collected from the rats, and genomic, histological, and immunohistochemical analyses were performed. The dataset obtained from the genomic analysis was modeled with random forest (RF), a tree-based method. Modeling results were evaluated with sensitivity (Se), specificity (Sp), balanced accuracy (B-Acc), negative predictive value (Npv), accuracy (Acc), positive predictive value (Ppv), and F1-score performance metrics. The local interpretable model-agnostic explanations (LIME) method was used to identify the lncRNAs that could serve as biomarkers for nephrotoxicity by making the RF model interpretable. Results: The histological and immunohistochemical analyses support the conclusion that MTX use caused kidney injury. According to the bioinformatics analysis, 52 lncRNAs were differentially expressed between the groups. For the RF model built on lncRNAs selected with Boruta variable selection, the B-Acc, Acc, Sp, Se, Npv, Ppv, and F1-score were 88.9%, 90%, 90.9%, 88.9%, 90.9%, 88.9%, and 88.9%, respectively. The lncRNAs with the highest variable importance values from RF modeling (rnaXR_591534.3, rnaXR_005503408.1, rnaXR_005495645.1, rnaXR_001839007.2, rnaXR_005492056.1, and rna_XR_005492522.1) can be used as nephrotoxicity biomarker candidates. Furthermore, according to the LIME results, high levels of rnaXR_591534.3 and rnaXR_005503408.1 in particular increased the likelihood of nephrotoxicity. Conclusions: The candidate biomarkers identified in this study may allow drug-induced nephrotoxicity to be diagnosed easily, quickly, and effectively.
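
The performance metrics named in this abstract (Se, Sp, Ppv, Npv, Acc, B-Acc, and F1-score) all derive from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those standard definitions. The helper name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (each class is present and the
    model predicts each class at least once).
    """
    se = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    ppv = tp / (tp + fp)                   # positive predictive value (precision)
    npv = tn / (tn + fn)                   # negative predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)  # accuracy
    b_acc = (se + sp) / 2                  # balanced accuracy
    f1 = 2 * ppv * se / (ppv + se)         # harmonic mean of Ppv and Se
    return {"Se": se, "Sp": sp, "Ppv": ppv, "Npv": npv,
            "Acc": acc, "B-Acc": b_acc, "F1": f1}


# Hypothetical counts for illustration only; not the study's results.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages sensitivity and specificity, so it is less affected by class imbalance than plain accuracy.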

(This article belongs to the Section Applied Biosciences and Bioengineering)

20 pages, 6974 KB

Open AccessArticle

PromGER: Promoter Prediction Based on Graph Embedding and Ensemble Learning for Eukaryotic Sequence

by Yan Wang, Shiwen Tai, Shuangquan Zhang, Nan Sheng and Xuping Xie

Genes 2023, 14(7), 1441; https://doi.org/10.3390/genes14071441 - 13 Jul 2023

Cited by 19 | Viewed by 5289

Abstract

Promoters are non-coding DNA regions around the transcription start site and are responsible for regulating the gene transcription process. Due to their key role in gene function and transcriptional activity, the accurate prediction of promoter sequences and their core elements is a crucial research area in bioinformatics. At present, models based on machine learning and deep learning have been developed for promoter prediction. However, these models cannot mine the deeper biological information of promoter sequences or consider the complex relationships among them. In this work, we propose a novel prediction model called PromGER to predict eukaryotic promoter sequences. For a promoter sequence, firstly, PromGER utilizes four types of feature-encoding methods to extract local information within promoter sequences. Secondly, according to the potential relationships among promoter sequences, the promoter sequences as a whole are constructed as a graph. Furthermore, three different scales of graph-embedding methods are applied to obtain the global feature information in the graph more comprehensively. Finally, combining local features with global features of sequences, PromGER analyzes and predicts promoter sequences through a tree-based ensemble-learning framework. Compared with seven existing methods, PromGER improved average specificity by 13%, accuracy by 10%, Matthews correlation coefficient by 16%, precision by 4%, F1 score by 6%, and AUC by 9%. In addition, this study interpreted PromGER using the t-distributed stochastic neighbor embedding (t-SNE) method and SHapley Additive exPlanations (SHAP) value analysis, which demonstrates the interpretability of the model.

(This article belongs to the Section Bioinformatics)
