Integrated Multi-Omics and Machine Learning Framework Identifies Diagnostic Signatures and Druggable Targets in Breast Cancer
Abstract
1. Introduction
2. Methods
2.1. Study Design
2.2. Data Collection and Analysis
2.3. Differentially Expressed Genes (DEGs) Between the Training Set and TCGA Dataset
2.4. Weighted Gene Co-Expression Network Analysis (WGCNA) of DEGs
2.5. PPI Analysis of DEGs
Feature Selection and Hub Gene Identification Based on Integrated ML and SHAP Algorithm
2.6. Functional Annotation and Gene Interaction Network Analysis
2.7. Immune Infiltration Assessment and microRNA Regulation Analysis
2.8. SMR-Based Causal Inference Analysis of Gene Expression–BC Risk
2.9. Single-Cell Spatial Transcriptome Data Analysis and Hub Gene Expression Validation
2.10. Drug Repurposing Analysis and Compound Prioritization
2.11. Molecular Docking and Protein–Ligand Interaction Analysis
3. Results
3.1. DEGs Analysis of BC Based on GEO and TCGA Data
3.2. WGCNA and Screening of Potential BC-Related Genes
3.3. PPI Analysis
3.4. Constructing a Diagnostic Model of BC by ML
3.5. External Validation of SHAP-Identified Hub Genes
3.6. Functional Characterization and Interaction Network Analysis of CHEK1 and KIF23
3.7. Immune Infiltration Analysis and miRNA Regulatory Network Construction of CHEK1 and KIF23
3.8. SMR-Based Mendelian Randomization Analysis of Hub Genes and BC Risk
3.9. Single-Cell RNA Sequencing Validation of Hub Gene CHEK1 Expression
3.10. Drug Enrichment and Virtual Screening Analysis of CHEK1
3.11. Molecular Docking Analysis of Candidate Compounds with CHEK1
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full name |
|---|---|
| BC | Breast Cancer |
| GEO | Gene Expression Omnibus |
| TCGA | The Cancer Genome Atlas |
| DEA | Differential Expression Analysis |
| DEGs | Differentially Expressed Genes |
| ML | Machine Learning |
| WGCNA | Weighted Gene Co-expression Network Analysis |
| PPI | Protein–Protein Interaction |
| SHAP | SHapley Additive exPlanations |
| PCA | Principal Component Analysis |
| FDR | False Discovery Rate |
| FC | Fold Change |
| ROC | Receiver Operating Characteristic |
| AUC | Area Under the Curve |
| GO | Gene Ontology |
| KEGG | Kyoto Encyclopedia of Genes and Genomes |
| SMR | Summary-based Mendelian Randomization |
| GWAS | Genome-Wide Association Study |
| eQTL | Expression Quantitative Trait Loci |
| HEIDI | Heterogeneity in Dependent Instruments |
| miRNA | MicroRNA |
| CIBERSORT | Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts |
| DSigDB | Drug Signatures Database |
| ADMET | Absorption, Distribution, Metabolism, Excretion, and Toxicity |
| QED | Quantitative Estimate of Drug-likeness |
| RMSD | Root-Mean-Square Deviation |
| PLIP | Protein–Ligand Interaction Profiler |
| CHEK1 | Checkpoint Kinase 1 |
| KIF23 | Kinesin Family Member 23 |

| Compound Name | SMILES | PDB ID | Pocket Center (x, y, z) | Pocket Size (x, y, z) | Affinity (kcal/mol) |
|---|---|---|---|---|---|
| Olaparib | C1CC1C(=O)N2CCN(CC2)C(=O)C3=C(C=CC(=C3)CC4=NNC(=O)C5=CC=CC=C54)F | 3D94 | [25.161499, 18.3265, −4.3955] | [10.0910015, 15.862999, 18.397] | −10.2 |
| Olaparib | C1CC1C(=O)N2CCN(CC2)C(=O)C3=C(C=CC(=C3)CC4=NNC(=O)C5=CC=CC=C54)F | 3O23 | [7.4795, 1.2415001, 21.691] | [11.247, 18.867, 9.73] | −10.1 |
| LY 294002 | C1COCCN1C2=CC(=O)C3=C(O2)C(=CC=C3)C4=CC=CC=C4 | 5FXS | [8.952001, −4.277, 51.9955] | [19.462002, 10.378, 10.753002] | −9.8 |
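
The pocket centers and sizes in the table above are the kind of parameters a grid-box docking run takes as input. As a minimal sketch, assuming an AutoDock Vina-style configuration file (the source does not state which docking engine was used, and the coordinates are assumed to be in ångströms), the rows can be turned into search-box settings as follows. The `DockingBox` class, the `vina_config` helper, and the `.pdbqt` file names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DockingBox:
    """One row of the docking table (hypothetical container)."""
    compound: str
    pdb_id: str
    center: Tuple[float, float, float]  # pocket center (x, y, z)
    size: Tuple[float, float, float]    # pocket size (x, y, z)
    affinity: float                     # reported affinity, kcal/mol


# Values copied from the table; nothing here is recomputed.
BOXES = [
    DockingBox("Olaparib", "3D94", (25.161499, 18.3265, -4.3955),
               (10.0910015, 15.862999, 18.397), -10.2),
    DockingBox("Olaparib", "3O23", (7.4795, 1.2415001, 21.691),
               (11.247, 18.867, 9.73), -10.1),
    DockingBox("LY 294002", "5FXS", (8.952001, -4.277, 51.9955),
               (19.462002, 10.378, 10.753002), -9.8),
]


def vina_config(box: DockingBox, receptor: str, ligand: str,
                exhaustiveness: int = 8) -> str:
    """Format a Vina-style config text for one search box.

    Assumption: the engine accepts the standard Vina keys
    (receptor, ligand, center_*, size_*, exhaustiveness).
    """
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, value in zip("xyz", box.center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis, value in zip("xyz", box.size):
        lines.append(f"size_{axis} = {value:.3f}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"


# The most favourable (most negative) reported affinity in the table.
best = min(BOXES, key=lambda b: b.affinity)

if __name__ == "__main__":
    # File names are placeholders, not paths from the study.
    print(vina_config(best, "3D94_receptor.pdbqt", "olaparib.pdbqt"))
```

Rounding the coordinates to three decimals in the config drops only the floating-point tail visible in the table (for example `10.0910015`), which is far below the resolution of a docking grid.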
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wang, Z.; Hou, J.; Chen, Y.; Li, J.; Vengusamy, S. Integrated Multi-Omics and Machine Learning Framework Identifies Diagnostic Signatures and Druggable Targets in Breast Cancer. Genes 2026, 17, 396. https://doi.org/10.3390/genes17040396

