Cancer Biology: Machine Learning and Bioinformatics

A special issue of Biomolecules (ISSN 2218-273X). This special issue belongs to the section "Bioinformatics and Systems Biology".

Deadline for manuscript submissions: closed (15 May 2026) | Viewed by 3465

Special Issue Editors


Guest Editor
Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Via Aldo Moro, 2, 53100 Siena, Italy
Interests: biochemistry; circular bioeconomy; rare diseases; systems biology

Guest Editor
Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Via Aldo Moro, 2, 53100 Siena, Italy
Interests: bioinformatics; structural biology; big data analysis

Guest Editor
Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Via Aldo Moro, 2, 53100 Siena, Italy
Interests: biochemistry; circular bioeconomy; rare diseases; artificial intelligence; bioinformatics

Special Issue Information

Dear Colleagues,

Recent advances in machine learning and bioinformatics are significantly impacting the field of cancer biology, providing powerful tools to analyse complex molecular data and identify novel therapeutic targets. These approaches offer valuable insights into multi-omics layers, supporting the development of more accurate diagnostic models and personalised treatment strategies.

This Special Issue will highlight innovative research interconnecting computational science and oncology. We invite contributions that focus on algorithm development, multi-omics investigations, predictive modelling, and machine learning applications, all of which will enhance our understanding of cancer mechanisms and inform clinical decision-making.

Furthermore, particular emphasis will be placed on works that demonstrate interdisciplinary collaboration and clear translational potential. Collaborative efforts between computational scientists and clinicians can bridge the gap between algorithmic innovations and practical implementations, ensuring that these technological advancements translate into real benefits for cancer patients.

We welcome original research, comprehensive reviews, and forward-looking perspectives from investigators across different disciplines.

Prof. Dr. Annalisa Santucci
Dr. Ottavia Spiga
Dr. Anna Visibelli
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biomolecules is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • cancer biology
  • bioinformatics
  • machine learning
  • big data

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)

Research

20 pages, 2071 KB  
Article
A Global Assessment of the Transcription-Dependent Single Nucleotide Variants Relies on the Characteristics of RNA-Sequencing Technologies
by Xia Zhang, Jiawei Liu, Yabing Zhu, Guixue Hou, Mingzhou Bai, Yuxin Li, Wenbo Cui and Siqi Liu
Biomolecules 2026, 16(2), 211; https://doi.org/10.3390/biom16020211 - 29 Jan 2026
Cited by 1 | Viewed by 696
Abstract
Single nucleotide variants (SNVs) are crucial in cancer occurrence and development. SNVs at the transcriptomic level generally come from genomic variants (g-tSNVs) and RNA editing (e-tSNVs). The types and quantities of e-tSNVs remain a subject of debate due to a relatively poor understanding of RNA editing processes. Herein, we developed TSCS (Transcript SNVs Classifier relying on complementary sequencings), a machine learning classifier that integrates short-read (MGI) and long-read (PacBio) RNA-seq data to accurately distinguish true transcript SNVs using stringent criteria. Applied to five colorectal cancer cell lines (HCT15, LoVo, SW480, SW620, and HCT116), TSCS demonstrated superior accuracy and sensitivity, outperforming established tools (GATK, BCFtools, Longshot, RED_ML). It increased the total detected transcript SNVs by 31.83% on average, with g-tSNVs and e-tSNVs exceeding conventional methods by >1-fold and >2-fold, respectively. TSCS achieved mean recall rates of 75.3% for g-tSNVs and 77.2% for e-tSNVs. Notably, for the first time, e-tSNVs were found in a relatively large proportion of total transcript SNVs in cancer cell lines, approximately 40%. Of the identified e-tSNVs, 80% were attributed to the known RNA editing, but the other e-tSNVs did not fall into any known category. Importantly, the e-tSNVs uniquely detected in this study showed distinct patterns in SNV types and genomic locations. Additionally, the transcript SNVs called by TSCS were partially confirmed using experimental approaches, such as Sanger sequencing, RNC-seq, and mass spectrometry. This study lays the foundation for surveying and appraising the cancer-related e-tSNVs.
(This article belongs to the Special Issue Cancer Biology: Machine Learning and Bioinformatics)
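
Editorial note on the recall figures quoted in the abstract above: recall is the fraction of reference ("truth") variants that a caller recovers. The following is a minimal sketch of that calculation and is not the TSCS implementation. The function name `recall`, the tuple layout, and all variant records are hypothetical and chosen only for illustration.

```python
def recall(called, truth):
    """Return the fraction of truth-set variants present in the call set."""
    if not truth:
        raise ValueError("truth set must not be empty")
    return len(called & truth) / len(truth)


# Hypothetical variants encoded as (chromosome, position, ref, alt) tuples.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr3", 10, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr2", 75, "G", "A"),
    ("chr3", 10, "T", "C"),
    ("chr5", 5, "A", "T"),  # a call absent from the truth set (false positive)
}

print(recall(called, truth))  # 3 of 4 truth variants recovered -> 0.75
```

The false-positive call does not lower recall; it would instead lower precision, which is why the two metrics are usually reported together.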

27 pages, 1881 KB  
Article
From Latent Manifolds to Targeted Molecular Probes: An Interpretable, Kinome-Scale Generative Machine Learning Framework for Family-Based Kinase Ligand Design
by Gennady Verkhivker, Ryan Kassab and Keerthi Krishnan
Biomolecules 2026, 16(2), 209; https://doi.org/10.3390/biom16020209 - 29 Jan 2026
Viewed by 1002
Abstract
Scaffold-aware artificial intelligence (AI) models enable systematic exploration of chemical space conditioned on protein-interacting ligands, yet the representational principles governing their behavior remain poorly understood. The computational representation of structurally complex kinase small molecules remains a formidable challenge due to the high conservation of ATP active site architecture across the kinome and the topological complexity of structural scaffolds in current generative AI frameworks. In this study, we present a diagnostic, modular and chemistry-first generative framework for design of targeted SRC kinase ligands by integrating ChemVAE-based latent space modeling, a chemically interpretable structural similarity metric (Kinase Likelihood Score), Bayesian optimization, and cluster-guided local neighborhood sampling. Using a comprehensive dataset of protein kinase ligands, we examine scaffold topology, latent-space geometry, and model-driven generative trajectories. We show that chemically distinct scaffolds can converge toward overlapping latent representations, revealing intrinsic degeneracy in scaffold encoding, while specific topological motifs function as organizing anchors that constrain generative diversification. The results demonstrate that kinase scaffolds spanning 37 protein kinase families spontaneously organize into a coherent, low-dimensional manifold in latent space, with SRC-like scaffolds acting as a structural “hub” that enables rational scaffold transformation. Our local sampling approach successfully converts scaffolds from other kinase families (notably LCK) into novel SRC-like chemotypes, with LCK-derived molecules accounting for ~40% of high-similarity outputs. 
However, both generative strategies reveal a critical limitation: SMILES-based representations systematically fail to recover multi-ring aromatic systems—a topological hallmark of kinase chemotypes—despite ring count being a top feature in our structural similarity metric. This “representation gap” demonstrates that no amount of scoring refinement can compensate for a generative engine that cannot access topologically constrained regions. By diagnosing these constraints within a transparent pipeline and reframing scaffold-aware ligand design as a problem of molecular representation, our work provides a conceptual framework for interpreting generative model behavior and for guiding the incorporation of structural priors into future molecular AI architectures.
(This article belongs to the Special Issue Cancer Biology: Machine Learning and Bioinformatics)

25 pages, 5047 KB  
Article
Integrative Single-Cell and Machine Learning Analysis Develops a Glutamine Metabolism–Based Prognostic Model and Identifies MSMO1 as a Therapeutic Target in Osteosarcoma
by Hui Ma, Haiyang Zhang, Johny Bajgai, Md. Habibur Rahman, Thu Thao Pham, Chaodeng Mo, Buchan Cao, Yeong-eun Choi, Cheol-Su Kim and Kyu-Jae Lee
Biomolecules 2025, 15(12), 1664; https://doi.org/10.3390/biom15121664 - 28 Nov 2025
Cited by 1 | Viewed by 1084
Abstract
Although metabolic pathways profoundly influence disease behavior, osteosarcoma (OS) still lacks a glutamine metabolism–based framework for patient stratification. By integrating single-cell RNA sequencing with bulk cohorts, we delineated a glutamine-associated transcriptional program and translated it into an externally validated, clinically oriented risk model. After rigorous quality control and doublet removal, 19 clusters were annotated into 10 cell types. Glutamine metabolism–related gene (GRG) scores, quantified by five orthogonal algorithms (AUCell, UCell, singscore, ssGSEA, and AddModuleScore), revealed pronounced intratumoral heterogeneity, particularly within osteoblastic cells. A composite GRG score correlated with 641 genes, defining 188 differentially expressed genes; intersecting positively correlated and up-regulated genes yielded 91 candidates. Through a 10-fold cross-validated benchmark of 10 machine-learning algorithms and 101 combinations, Step-Cox [forward] + Ridge emerged as the optimal pipeline, producing a five-gene prognostic model (GPX7, COL11A2, CPE, MSMO1, SGMS2) with moderate yet reproducible performance in independent cohorts. Functionally, stable MSMO1 knockdown in U2OS cells suppressed proliferation, migration, and invasion; increased apoptosis; altered GS, GLS, and α-ketoglutarate; and dampened Wnt/β-catenin signaling. Clinically, the model stratifies OS patients into molecular risk subgroups with distinct outcomes, supporting identification of high-risk individuals and informing personalized glutamine-targeted or combination therapies. Mechanistically, glutamine metabolism shapes the OS tumor microenvironment by modulating immune-evasion and angiogenic cues, underscoring its dual role in metabolic adaptation and immune–metabolic crosstalk. 
Collectively, this study establishes a single-cell–anchored, glutamine-coupled state in OS, introduces an externally validated prognostic tool with translational promise but modest discriminative power, and positions MSMO1 as a metabolic–signaling node warranting further mechanistic and in vivo investigation.
(This article belongs to the Special Issue Cancer Biology: Machine Learning and Bioinformatics)
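
Editorial note on the 10-fold cross-validated benchmark mentioned in the abstract above: in k-fold cross-validation, the samples are partitioned into k folds, and each fold serves once as the held-out test set while the remaining folds are used for training. The sketch below only illustrates that partitioning and is not the authors' pipeline. The function name `k_fold_indices` and the sample count of 25 are hypothetical, and the folds are contiguous for simplicity (in practice samples are usually shuffled first).

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical cohort of 25 samples split into 10 folds.
for fold, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
    print(f"fold {fold}: {len(train)} training samples, test indices {test}")
```

Each candidate model would be fitted on the training indices and scored on the test indices of every fold, with the per-fold scores averaged to compare algorithms.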