Machine Learning for Multi-Omics Characterization of Blood Cancers: A Systematic Review
Abstract
1. Introduction
1.1. Rationale
- Genomic Alterations are features such as driver mutations, chromosomal aberrations, and copy number variations that define disease subtypes;
- Transcriptomic Signatures elucidate patterns associated with prognosis, treatment response, and biological pathways;
- Proteomic Profiles are patterns in protein expression, both for individual proteins and at the network level, including post-translational modifications and signaling functions;
- Metabolomic Patterns are metabolic features analyzed as biomarkers for disease monitoring;
- Epigenomic Landscapes entail the investigation of the impact of copy number variation, DNA methylation, histone modifications, and chromatin accessibility patterns [10].
- Explainability and Interpretability—how effectively models can elucidate molecular mechanisms and enable the discovery of actionable biomarkers from complex omics datasets;
- Performance, Reproducibility, and Validation—strategies for robust molecular discovery and validation across independent datasets;
- Ethical Considerations—including bias mitigation in molecular datasets, genomic privacy, and equitable access to precision medicine.
1.2. Objectives
- Evaluation of explainability and interpretability of AI/ML models in molecular discovery;
- Assessment of reproducibility and validation practices across studies;
- Identification of ethical considerations and bias mitigation strategies;
- Determination of gaps in current research and priority areas for future investigation;
- Assessment of the integration of multiple omics layers and their impact on performance.
1.3. Research Questions to Frame the Systematic Review
- What AI/ML methodologies have been applied to the molecular characterization of hematological malignancies?
- How do these methods perform in terms of diagnostic accuracy, prognostic capability, and molecular biomarker discovery?
- What validation strategies have been employed to ensure reproducibility and generalizability?
- To what extent do current approaches address explainability and clinical interpretability?
- What good practices are used to address ethical considerations and mitigate bias?
2. Systematic Review Methods
2.1. Protocol
2.2. Eligibility Criteria
2.2.1. Inclusion Criteria
- Studies applying AI/ML methods to the molecular characterization of hematological malignancies;
- Use of omics data (genomics, transcriptomics, proteomics, metabolomics, epigenomics);
- Peer-reviewed articles published between 1 January 2015 and 31 December 2024;
- Studies reporting quantitative performance metrics (sensitivity, specificity, AUC, accuracy);
- English language publications;
- Human studies with clearly defined hematological malignancy populations.
2.2.2. Exclusion Criteria
- Reviews, editorials, conference abstracts, and case reports;
- Studies focusing solely on medical imaging without molecular data;
- Studies without clear AI/ML methodology description;
- Non-hematological malignancies;
- Studies with insufficient data for quality assessment;
- Preclinical studies using only cell lines or animal models.
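The inclusion and exclusion criteria above can be read as a sequence of checks applied to each candidate record. The following is a minimal sketch of such a screen, not the review's actual screening tool: the `Record` class, its field names, and the reason strings are hypothetical, and only the date window and excluded publication types are taken from the lists above.

```python
from dataclasses import dataclass
from datetime import date

# Publication window taken from the inclusion criteria.
WINDOW_START = date(2015, 1, 1)
WINDOW_END = date(2024, 12, 31)

# Publication types named in the exclusion criteria.
EXCLUDED_TYPES = {"review", "editorial", "conference abstract", "case report"}


@dataclass
class Record:
    """Hypothetical screening record; field names are illustrative only."""
    published: date
    language: str
    pub_type: str
    uses_ml: bool
    uses_omics: bool
    human_cohort: bool
    hematological: bool
    reports_metrics: bool


def screen(record: Record) -> list[str]:
    """Return the reasons a record fails screening (empty list means eligible)."""
    reasons = []
    if not (WINDOW_START <= record.published <= WINDOW_END):
        reasons.append("outside publication window")
    if record.language.lower() != "english":
        reasons.append("non-English")
    if record.pub_type.lower() in EXCLUDED_TYPES:
        reasons.append("excluded publication type")
    if not record.uses_ml:
        reasons.append("no AI/ML methodology")
    if not record.uses_omics:
        reasons.append("no omics data")
    if not record.human_cohort:
        reasons.append("preclinical only")
    if not record.hematological:
        reasons.append("non-hematological malignancy")
    if not record.reports_metrics:
        reasons.append("no quantitative performance metrics")
    return reasons
```

Returning the list of failed criteria, rather than a single yes/no flag, keeps a per-record audit trail of why a study was excluded.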
2.3. Information Sources and Search Strategy
- PubMed/MEDLINE (1946 to December 2024);
- Embase (1947 to December 2024);
- IEEE Xplore Digital Library (1963 to December 2024);
- Web of Science Core Collection (1900 to December 2024).
Search Strategy Example (PubMed)
2.4. Study Selection Process
2.5. Data Collection Process
2.5.1. Study Characteristics Selected in Methodology
- Author, year, country, study design;
- Sample size, patient population demographics;
- Hematological malignancy type and subtype.
2.5.2. Methodological Characteristics
- AI/ML algorithm type and implementation details;
- Explainability through feature selection methods;
- Training/validation methodology and cross-validation strategy;
- Performance metrics reported and evaluation methods.
2.5.3. Outcome Measures
- Diagnostic accuracy (sensitivity, specificity, AUC, PPV, NPV);
- Prognostic performance (C-index, time-dependent AUC, hazard ratios);
- Molecular biomarker identification and validation;
- Clinical utility assessment and decision curve analysis.
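For reference, the diagnostic accuracy measures listed above all derive from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are illustrative and do not come from any included study.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard diagnostic accuracy measures from a 2x2 confusion matrix.

    Assumes every denominator is non-zero; no guard is included in this sketch.
    """
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up counts: 100 diseased and 100 non-diseased samples.
example = diagnostic_metrics(tp=85, fp=20, tn=80, fn=15)
```

Unlike sensitivity and specificity, PPV and NPV depend on disease prevalence in the evaluated cohort, so they do not transfer directly between study populations.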
2.5.4. Quality Indicators
- External validation performed (yes/no, type);
- Explainability methods used;
- Ethical considerations addressed;
- Bias mitigation strategies employed;
- Data availability and code sharing.
2.6. Quality Assessment
2.6.1. Risk of Bias Domains
- Patient Selection Bias: Representative spectrum, appropriate exclusions;
- Reference Standard Bias: Appropriate reference standard, blinded interpretation;
- Index Test Bias: Appropriate AI/ML methodology, pre-specified thresholds;
- Flow and Timing Bias: Appropriate interval, same reference standard;
- AI-specific Bias: Overfitting prevention, data leakage, validation strategy.
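Data leakage, named in the AI-specific bias domain above, commonly arises when preprocessing parameters are estimated on the full dataset before it is split. The sketch below shows one leakage-free pattern, assuming a simple z-score normalization of a single feature; the function names and values are illustrative, and the same principle applies to feature selection and imputation.

```python
from statistics import mean, pstdev


def fit_scaler(train_values):
    """Estimate centering and scaling parameters from training samples only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against zero variance
    return mu, sigma


def apply_scaler(values, mu, sigma):
    """Apply previously fitted parameters; never refit on held-out data."""
    return [(v - mu) / sigma for v in values]


# Made-up expression values for one feature.
train = [2.0, 4.0, 6.0, 8.0]
test = [10.0]

mu, sigma = fit_scaler(train)            # parameters come from the training split
train_z = apply_scaler(train, mu, sigma)
test_z = apply_scaler(test, mu, sigma)   # test split reuses them unchanged
```

Within cross-validation, the fit step would be repeated inside each fold so that no held-out sample influences the parameters used to transform it.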
2.6.2. Applicability Concerns
- Patient Selection Applicability: Match between study population and clinical setting;
- Index Test Applicability: Match between AI/ML implementation and clinical use;
- Reference Standard Applicability: Match between study reference and clinical practice;
Each domain was rated as low, high, or unclear risk of bias by two independent reviewers.
2.7. Synthesis Methods
- AI/ML methodology type (traditional ML vs. deep learning);
- Hematological malignancy subtype;
- Validation strategy employed (internal vs. external);
- Performance metrics and clinical utility measures.
3. Results
3.1. Evolution of AI/ML Approaches for Multi-Omics Molecular Analysis
3.1.1. Early Statistical Approaches in Molecular Discovery
3.1.2. Transition to ML for Molecular Pattern Recognition
3.1.3. Deep Learning for Complex Molecular Interaction Modeling
- They do not generalize well in molecular discovery; that is, they predict poorly for new, unseen cases. As a result, these approaches do not yield reproducible molecular findings or solutions that represent diverse populations.
- They have low explainability for molecular mechanisms. They cannot be easily interrogated to understand biological relationships between molecular features and disease outcomes, limiting their utility for mechanistic discovery and therapeutic target identification.
- They carry considerable redundancy and require substantial computing power to train on omics data, which results in a large carbon footprint and a negative environmental impact.
3.2. Molecular Landscape of Hematological Malignancies
3.2.1. Genomic Alterations and Molecular Subtypes
3.2.2. Transcriptomic Signatures and Pathway Dysregulation
3.2.3. Proteomic Profiles and Functional Networks
3.3. Study Selection
PRISMA Flow Diagram
3.4. Study Characteristics
3.4.1. Geographic and Temporal Distribution
3.4.2. Hematological Malignancy Types
3.4.3. Sample Size Distribution
3.5. AI/ML Methodologies and Implementation
3.6. Performance Outcomes
3.6.1. Diagnostic Performance
- Median AUC: 0.87 (IQR: 0.81–0.94);
- Median Sensitivity: 85.2% (IQR: 78.4–91.6%);
- Median Specificity: 83.7% (IQR: 76.9–89.3%);
- Median Accuracy: 84.5% (IQR: 79.1–90.2%).
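Summaries of the form "median (IQR)" can be reproduced from per-study values as sketched below. The AUC values are made up for illustration, and the quartile convention (`method="inclusive"`) is an assumption, since the convention used in the review is not stated here; other conventions give slightly different quartiles.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return the median and the (Q1, Q3) pair for a list of per-study values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


# Made-up per-study AUCs, not data from the included studies.
aucs = [0.78, 0.81, 0.85, 0.87, 0.90, 0.94, 0.96]
auc_median, (auc_q1, auc_q3) = median_iqr(aucs)
```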
3.6.2. Prognostic Performance
- Median C-index: 0.73 (IQR: 0.68–0.81);
- Median HR for high-risk group: 2.34 (IQR: 1.87–3.12);
- Median time-dependent AUC (5-year): 0.76 (IQR: 0.71–0.83).
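The C-index reported above measures how often a model ranks patients correctly by risk. Below is a minimal sketch of Harrell's concordance index for right-censored data. It ignores tied event times and is quadratic in the number of patients, so it is meant as an illustration rather than a replacement for a survival analysis library; all names and values are illustrative.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the patient with the
    earlier observed event has the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up follow-up data: the third patient is censored (event = 0).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.7, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the median of 0.73 reported above.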
3.7. Validation Strategies and Reproducibility
3.8. Biological Interpretation of Results
- Signal Transduction Pathways: Dysregulated apoptosis and cell cycle pathways identified across multiple malignancy types;
- Metabolic Reprogramming: ML-identified metabolic pathway alterations associated with drug resistance;
- Microenvironment Interactions: Single-cell multi-omics studies revealed cellular communication networks.
3.8.1. Commentary on Molecular Insights and Biomarker Discovery
Genomics-Based Molecular Discovery
Proteomics-Driven Molecular Characterization
Multi-Omics Integration for Comprehensive Molecular Profiling
3.9. Explainability and Interpretability
3.9.1. Commentary and Insights Drawn on Explainability and Interpretability of ML Approaches
Interpretable Models for Molecular Mechanism Elucidation
Deep Learning Interpretability in Molecular Applications
Balancing Model Complexity and Molecular Interpretability
3.10. Ethical Considerations and Bias Assessment
3.10.1. Demographic Representation of the Studies Considered
- Studies reporting ethnicity: 34 studies (38.2%);
- Predominantly European ancestry: 67 studies (75.3%);
- Multi-ethnic cohorts: 18 studies (20.2%);
- Age distribution reported: 76 studies (85.4%);
- Sex-stratified analysis: 45 studies (50.6%).
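The total number of included studies is not stated in this list, but the count and percentage pairs constrain it. The sketch below searches for every total that reproduces all five reported percentages after rounding to one decimal place; the function name and the search bound are illustrative choices.

```python
def consistent_totals(rows, max_total=1000):
    """Return every total n for which each (count, percent) row rounds correctly."""
    lowest = max(count for count, _ in rows)
    return [
        n
        for n in range(lowest, max_total + 1)
        if all(abs(round(100 * count / n, 1) - pct) < 1e-9 for count, pct in rows)
    ]


# (count, reported percentage) pairs copied from the list above.
rows = [(34, 38.2), (67, 75.3), (18, 20.2), (76, 85.4), (45, 50.6)]
totals = consistent_totals(rows)
```

For these rows the only consistent total is 89, which suggests that the percentages are all expressed against a single set of 89 studies.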
3.10.2. Commentary on Ethical Considerations in Molecular Medicine Applications
Genomic Privacy and Data Security
Bias Mitigation in Molecular Datasets
3.11. Clinical Translation and Utility
3.11.1. Commentary on Clinical Translation and Molecular Medicine Integration
Molecular Biomarker Validation and Clinical Utility
Precision Medicine Through Molecular Stratification
3.12. Future Directions in Molecular AI/ML Applications
3.12.1. Advanced Multi-Omics Integration Technologies
3.12.2. Standardization Frameworks for Discovery Validation
3.12.3. Synthetic Molecular Data and Digital Twins
4. Discussion
4.1. Summary of Evidence
4.2. Research Implications and Future Directions
4.3. Technological Advances and Implementation Science
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Glossary
Artificial Intelligence (AI) | Computer systems designed to perform tasks that usually require human intelligence, such as recognizing patterns, making decisions, or interpreting data. |
Machine Learning (ML) | A subset of AI that allows computers to learn from data and improve their performance over time without being explicitly programmed. |
Deep Learning | A complex type of ML using layers of algorithms (‘neural networks’) to detect intricate patterns in data. |
Neural Networks | A set of algorithms modeled after the human brain that are designed to recognize patterns and relationships in data. |
Hematological Malignancies | Cancers that begin in blood-forming tissue, such as leukemia, lymphoma, and multiple myeloma. |
Molecular Characterization | The process of identifying the unique molecular features (like genes, proteins, or metabolites) of a disease. |
Omics | A broad term for fields of biological study ending in ‘-omics’—such as genomics (genes), proteomics (proteins), and metabolomics (metabolites). |
Multi-Omics Integration | Combining data from multiple ‘omics’ levels (e.g., genes, proteins, metabolites) to achieve a complete view of disease biology. |
Genomics | The study of an organism’s complete set of DNA, including all its genes. |
Transcriptomics | The study of RNA molecules to understand which genes are actively being expressed. |
Proteomics | The large-scale study of proteins and how they function in the body. |
Metabolomics | The study of small molecules (metabolites) produced during metabolism, providing clues about disease states. |
Biomarker | A measurable indicator (like a protein or gene mutation) that helps detect or predict disease. |
Explainability/Interpretability | The degree to which the inner workings of a ML model can be understood and interpreted by humans. |
Validation (Internal/External) | Processes to test how well a model’s predictions hold up on new, unseen data (internal uses the same dataset, external uses new data). |
Overfitting | A problem where a model fits the training data too closely, capturing noise rather than the underlying pattern, and performs poorly on new data. |
False Discovery | An incorrect identification of a feature or result as significant when it is not, often due to multiple testing or noise in the data. |
AUC (Area Under the Curve) | A score from 0 to 1 indicating how well a model distinguishes between different disease states. Higher is better. |
Sensitivity and Specificity | Sensitivity measures the true positive rate; specificity measures the true negative rate. |
C-index | A metric used in prognosis models to evaluate how well predicted risks match actual outcomes over time. |
Cross-validation | A method to evaluate model performance by partitioning data into training and testing sets multiple times. |
SHAP (SHapley Additive exPlanations) | A tool used to explain how much each feature contributes to a model’s output. |
LIME | A method that explains individual predictions of complex models in a human-understandable way. |
Bias Mitigation | Efforts to prevent AI models from producing unfair results due to biases in the training data. |
Federated Learning | A privacy-preserving approach where data stays at its source and only model updates are shared. |
Digital Twin | A digital replica of a biological system used to simulate and predict disease progression or treatment response. |
Principal Components Analysis (PCA) | A method for reducing the dimensionality of data while preserving trends and patterns. |
t-distributed Stochastic Neighbor Embedding (t-SNE) | A technique for visualizing high-dimensional data in a way that makes patterns easier to see. |
Random Forests (RF) | An ensemble learning method using many decision trees to improve prediction accuracy and control overfitting. |
Support Vector Machines (SVMs) | A supervised ML algorithm that finds the best boundary (hyperplane) between data classes. |
Decision Trees | A model that splits data into branches to reach a decision or classification based on input features. |
Naive Bayes | A classification method based on Bayes’ theorem, assuming independence between predictors. |
Ensemble Methods | Techniques that combine multiple models (like trees and SVMs) to improve overall performance. |
Logistic Regression | A statistical model used to predict the probability of a binary outcome (e.g., disease vs. no disease). |
Generalization (in ML) | The ability of a model to perform well on new, unseen data—not just the data it was trained on. |
Ethical AI | Developing AI systems that are fair, transparent, and protect individual rights and privacy. |
References
- Sorokin, M.; Borisov, N.; Kuzmin, D.; Gudkov, A.; Zolotovskaia, M.; Garazha, A.; Buzdin, A. Algorithmic Annotation of Functional Roles for Components of 3044 Human Molecular Pathways. Front. Genet. 2021, 12, 617059.
- Bourke, M.; McInerney-Leo, A.; Steinberg, J.; Boughtwood, T.; Milch, V.; Ross, A.L.; Ambrosino, E.; Dalziel, K.; Franchini, F.; Huang, L.; et al. The Cost Effectiveness of Genomic Medicine in Cancer Control: A Systematic Literature Review. Appl. Health Econ. Health Policy 2025, 23, 359–393.
- Walter, W.; Pohlkamp, C.; Meggendorfer, M.; Nadarajah, N.; Kern, W.; Haferlach, C.; Haferlach, T. Artificial intelligence in hematological diagnostics: Game changer or gadget? Blood Rev. 2023, 58, 101019.
- Elshoeibi, A.M.; Badr, A.; Elsayed, B.; Metwally, O.; Elshoeibi, R.; Elhadary, M.R.; Elshoeibi, A.; Attya, M.A.; Khadadah, F.; Alshurafa, A.; et al. Integrating AI and ML in Myelodysplastic Syndrome Diagnosis: State-of-the-Art and Future Prospects. Cancers 2023, 16, 65.
- Tsagiopoulou, M.; Gut, I.G. Machine learning and multi-omics data in chronic lymphocytic leukemia: The future of precision medicine? Front. Genet. 2024, 14, 1304661.
- Obeagu, E.I.; Ezeanya, C.U.; Ogenyi, F.C.; Ifu, D.D. Big data analytics and machine learning in hematology: Transformative insights, applications and challenges. Medicine 2025, 104, e41766.
- Obstfeld, A.E. Hematology and machine learning. J. Appl. Lab. Med. 2023, 8, 129–144.
- Broadhurst, D.I.; Kell, D.B. Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics 2006, 2, 171–196.
- Albaradei, S.; Thafar, M.; Alsaedi, A.; Van Neste, C.; Gojobori, T.; Essack, M.; Gao, X. Machine learning and deep learning methods that use omics data for metastasis prediction. Comput. Struct. Biotechnol. J. 2021, 19, 5008–5018.
- Andrades, A.; Peinado, P.; Alvarez-Perez, J.C.; Sanjuan-Hidalgo, J.; García, D.J.; Arenas, A.M.; Matia-González, A.M.; Medina, P.P. SWI/SNF complexes in hematological malignancies: Biological implications and therapeutic opportunities. Mol. Cancer 2023, 22, 39.
- Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
- Sounderajah, V.; Ashrafian, H.; Golub, R.M.; Shetty, S.; De Fauw, J.; Hooft, L.; Moons, K.; Collins, G.; Moher, D.; Bossuyt, P.M.; et al. Developing a reporting guideline for artificial intelligence-centred diagnostic test accuracy studies: The STARD-AI protocol. BMJ Open 2021, 11, e047709.
- Franceschi, P.; Giordan, M.; Wehrens, R. Multiple comparisons in mass-spectrometry-based-omics technologies. TrAC Trends Anal. Chem. 2013, 50, 11–21.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Cortes, C.; Vapnik, V.; Saitta, L. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
- Dudoit, S.; Fridlyand, J.; Speed, T.P. Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc. 2002, 97, 77–87.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Rajkomar, A.; Dean, J.; Kohane, I. Machine learning in medicine. N. Engl. J. Med. 2019, 380, 1347–1358.
- Arber, D.A.; Orazi, A.; Hasserjian, R.P.; Borowitz, M.J.; Calvo, K.R.; Kvasnicka, H.-M.; Wang, S.A.; Bagg, A.; Barbui, T.; Branford, S.; et al. International Consensus Classification of Myeloid Neoplasms and Acute Leukemias: Integrating morphologic, clinical, and genomic data. Blood 2022, 140, 1200–1228.
- Lee, J.; Cho, S.; Hong, S.-E.; Kang, D.; Choi, H.; Lee, J.-M.; Yoon, J.-H.; Cho, B.-S.; Lee, S.; Kim, H.-J.; et al. Integrative analysis of gene expression data by RNA sequencing for differential diagnosis of acute leukemia: Potential application of machine learning. Front. Oncol. 2021, 11, 717616.
- Jiménez, C.; Garrote-De-Barros, A.; López-Portugués, C.; Hernández-Sánchez, M.; Díez, P. Characterization of Human B Cell Hematological Malignancies Using Protein-Based Approaches. Int. J. Mol. Sci. 2024, 25, 4644.
- Dunphy, K.; O’Mahoney, K.; Dowling, P.; O’Gorman, P.; Bazou, D. Clinical Proteomics of Biofluids in Haematological Malignancies. Int. J. Mol. Sci. 2021, 22, 8021.
- Wagner, S.; Vadakekolathu, J.; Tasian, S.K.; Altmann, H.; Bornhäuser, M.; Pockley, A.G.; Ball, G.R.; Rutella, S. A parsimonious 3-gene signature predicts clinical outcomes in an acute myeloid leukemia multicohort study. Blood Adv. 2019, 3, 1330–1346.
- Katsenou, A.; O’Farrell, R.; Dowling, P.; Heckman, C.A.; O’Gorman, P.; Bazou, D. Using proteomics data to identify personalized treatments in multiple myeloma: A machine learning approach. Int. J. Mol. Sci. 2023, 24, 15570.
- Deeb, S.J.; Tyanova, S.; Hummel, M.; Schmidt-Supprian, M.; Cox, J.; Mann, M. Machine Learning-based Classification of Diffuse Large B-cell Lymphoma Patients by Their Protein Expression Profiles. Mol. Cell. Proteom. 2015, 14, 2947–2960.
- Brazma, A.; Hingamp, P.; Quackenbush, J.; Sherlock, G.; Spellman, P.; Stoeckert, C.; Aach, J.; Ansorge, W.; Ball, C.A.; Causton, H.C.; et al. Minimum information about a microarray experiment (MIAME)—Toward standards for microarray data. Nat. Genet. 2001, 29, 365–371.
- Field, D.; Garrity, G.; Gray, T.; Morrison, N.; Selengut, J.; Sterk, P.; Tatusova, T.; Thomson, N.; Allen, M.J.; Angiuoli, S.V.; et al. The minimum information about a genome sequence (MIGS) specification. Nat. Biotechnol. 2008, 26, 541–547.
- Taylor, C.F.; Paton, N.; Lilley, K.S.; Binz, P.-A.; Julian, R.K., Jr.; Jones, A.; Zhu, W.; Apweiler, R.; Aebersold, R.; Deutsch, E.; et al. The minimum information about a proteomics experiment (MIAPE). Nat. Biotechnol. 2007, 25, 887–893.
- Burd, A.; Levine, R.L.; Ruppert, A.S.; Mims, A.S.; Borate, U.; Stein, E.M.; Patel, P.; Baer, M.R.; Stock, W.; Deininger, M.; et al. Precision medicine treatment in acute myeloid leukemia using prospective genomic profiling: Feasibility and preliminary efficacy of the Beat AML Master Trial. Nat. Med. 2020, 26, 1852–1858.
- Bolouri, H.; Farrar, J.E.; Triche, T.; Ries, R.E.; Lim, E.L.; Alonzo, T.A.; Ma, Y.; Moore, R.; Mungall, A.J.; Marra, M.A.; et al. The molecular landscape of pediatric acute myeloid leukemia reveals recurrent structural alterations and age-specific mutational interactions. Nat. Med. 2018, 24, 103–112.
- Eckardt, J.-N.; Bornhäuser, M.; Wendt, K.; Middeke, J.M. Application of machine learning in the management of acute myeloid leukemia: Current practice and future prospects. Blood Adv. 2020, 4, 6077–6085.
- Perillo, T.; de Giorgi, M.; Giorgio, C.; Frasca, C.; Cuocolo, R.; Pinto, A. The Role of Machine Learning in the Most Common Hematological Malignancies: A Narrative Review. Hemato 2024, 5, 380–387.
- Lipton, Z.C. The mythos of model interpretability. Commun. ACM 2018, 61, 36–43.
- Samek, W.; Wiegand, T.; Müller, K.R. Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models. ITU J. ICT Discov. 2017, 1, 39–48. Available online: https://www.itu.int/en/journal/001/Pages/05.aspx (accessed on 25 August 2025).
- Hossain, M.A.; Islam, A.K.M.M.; Islam, S.; Shatabda, S.; Ahmed, A. Symptom based explainable artificial intelligence model for leukemia detection. IEEE Access 2022, 10, 57283–57298.
- Gimeno, M.; José-Enériz, E.S.; Villar, S.; Agirre, X.; Prosper, F.; Rubio, A.; Carazo, F. Explainable artificial intelligence for precision medicine in acute myeloid leukemia. Front. Immunol. 2022, 13, 977358.
- Yang, J.; Liu, X.; Zhong, Q.-Z.; Yang, Y.; Wu, T.; Chen, S.-Y.; Chen, B.; Song, Y.-W.; Fang, H.; Wang, S.-L.; et al. Disparities in mortality risk after diagnosis of hematological malignancies in 185 countries: A global data analysis. Cancer Lett. 2024, 595, 216793.
- Almeida, J.M.; Castro, G.A.; Machado-Neto, J.A.; Almeida, T.A. An explainable model to support the decision about the therapy protocol for AML. In Brazilian Conference on Intelligent Systems; Springer: Cham, Switzerland, 2023; pp. 431–446.
- Chen, J.; Xiong, J.; Wang, Y.; Xin, Q.; Zhou, H. Implementation of an AI-based MRD evaluation and prediction model for multiple myeloma. Front. Comput. Intell. Syst. 2024, 6, 127–131.
- Jiang, X.; Hu, Z.; Wang, S.; Zhang, Y. Deep Learning for Medical Image-Based Cancer Diagnosis. Cancers 2023, 15, 3608.
- Srisuwananukorn, A.; Salama, M.E.; Pearson, A.T. Deep learning applications in visual data for benign and malignant hematologic conditions: A systematic review and visual glossary. Haematologica 2023, 108, 1993–2010.
- Manescu, P.; Narayanan, P.; Bendkowski, C.; Elmi, M.; Claveau, R.; Pawar, V.; Brown, B.J.; Shaw, M.; Rao, A.; Fernandez-Reyes, D. Detection of acute promyelocytic leukemia in peripheral blood and bone marrow with annotation-free deep learning. Sci. Rep. 2023, 13, 2562.
- Kotsyfakis, S.; Iliaki-Giannakoudaki, E.; Anagnostopoulos, A.; Papadokostaki, E.; Giannakoudakis, K.; Goumenakis, M.; Kotsyfakis, M. The application of machine learning to imaging in hematological oncology: A scoping review. Front. Oncol. 2022, 12, 1080988.
- Morabito, F.; Adornetto, C.; Monti, P.; Amaro, A.; Reggiani, F.; Colombo, M.; Rodriguez-Aldana, Y.; Tripepi, G.; D’Arrigo, G.; Vener, C.; et al. Genes selection using deep learning and explainable artificial intelligence for chronic lymphocytic leukemia predicting the need and time to therapy. Front. Oncol. 2023, 13, 1198992.
- Lancashire, L.J.; Powe, D.G.; Reis-Filho, J.S.; Rakha, E.; Lemetre, C.; Weigelt, B.; Abdel-Fatah, T.M.; Green, A.R.; Mukta, R.; Blamey, R.; et al. A validated gene expression profile for detecting clinical outcome in breast cancer using artificial neural networks. Breast Cancer Res. Treat. 2010, 120, 83–93.
- Abdel-Fatah, T.M.A.; Agarwal, D.; Liu, D.-X.; Russell, R.; Rueda, O.M.; Liu, K.; Xu, B.; Moseley, P.M.; Green, A.R.; Pockley, A.G.; et al. SPAG5 as a prognostic biomarker and chemotherapy sensitivity predictor in breast cancer: A retrospective, integrated genomic, transcriptomic, and protein analysis. Lancet Oncol. 2016, 17, 1004–1018.
- Harari, Y.N. Dataism is Our New God. New Perspect. Q. 2017, 34, 36–43.
- Shouval, R.; Fein, J.A.; Savani, B.; Mohty, M.; Nagler, A. Machine learning and artificial intelligence in hematology. Br. J. Haematol. 2021, 192, 239–250.
- Roberts, M.; Driggs, D.; Thorpe, M.; Gilbey, J.; Yeung, M.; Ursprung, S.; Aviles-Rivero, A.I.; Etmann, C.; McCague, C.; Beer, L.; et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Intell. 2021, 3, 199–217.
- Walter, W.; Haferlach, C.; Nadarajah, N.; Schmidts, I.; Kühn, C.; Kern, W.; Haferlach, T. How artificial intelligence might disrupt diagnostics in hematology in the near future. Oncogene 2021, 40, 4271–4280.
- Alhajahjeh, A.; Nazha, A. Unlocking the Potential of Artificial Intelligence in Acute Myeloid Leukemia and Myelodysplastic Syndromes. Curr. Hematol. Malig. Rep. 2024, 19, 9–17.
- Wang, S.-X.; Huang, Z.-F.; Li, J.; Wu, Y.; Du, J.; Li, T. Optimization of diagnosis and treatment of hematological diseases via artificial intelligence. Front. Med. 2024, 11, 1487234.
- Sobas, M.; Elicegui, J.M.; Ramiro, A.V.; González, T.; Hernandez-Sanchez, A.; Melchor, R.A.; Benner, A.; Sträng, E.; Gastone, C.; Heckman, C.A.; et al. Harmony Alliance provides a machine learning researching tool to predict the risk of relapse after first remission in AML patients treated without allogeneic haematopoietic stem cell transplantation. Blood 2021, 138, 4041.
- Gal, O.; Auslander, N.; Fan, Y.; Meerzaman, D. Predicting complete remission of acute myeloid leukemia: Machine learning applied to gene expression. Cancer Inform. 2019, 18, 1–5.
- Cheng, Z.J.; Li, H.; Liu, M.; Fu, X.; Liu, L.; Liang, Z.; Gan, H.; Sun, B. Artificial intelligence reveals the predictions of hematological indexes in children with acute leukemia. BMC Cancer 2024, 24, 993.
- Meng, L.; Wei, T.; Fan, R.; Su, H.; Liu, J.; Wang, L.; Huang, X.; Qi, Y.; Li, X. Development and validation of a machine learning model to predict venous thromboembolism among hospitalized cancer patients. Asia-Pac. J. Oncol. Nurs. 2022, 9, 100128.
- Zhang, H.; Qureshi, M.A.; Wahid, M.; Charifa, A.; Ehsan, A.; Ip, A.; De Dios, I.; Ma, W.; Sharma, I.; McCloskey, J.; et al. Differential diagnosis of hematologic and solid tumors using targeted transcriptome and artificial intelligence. Am. J. Pathol. 2023, 193, 51–59.
- Syed-Abdul, S.; Firdani, R.-P.; Chung, H.-J.; Uddin, M.; Hur, M.; Park, J.H.; Kim, H.W.; Gradišek, A.; Dovgan, E. Artificial intelligence-based models for screening of hematologic malignancies using cell population data. Sci. Rep. 2020, 10, 4583.
- Chai, S.Y.; Hayat, A.; Flaherty, G.T. Integrating artificial intelligence into hematology training and practice: Opportunities, threats and proposed solutions. Br. J. Haematol. 2022, 198, 807–811.
- Radakovich, N.; Nagy, M.; Nazha, A. Artificial intelligence in hematology: Current challenges and opportunities. Curr. Hematol. Malig. Rep. 2020, 15, 203–210.
- Velten, B.; Stegle, O. Principles and challenges of modeling temporal and spatial omics data. Nat. Methods 2023, 20, 1462–1474.
- Karathanasis, N.; Papasavva, P.L.; Oulas, A.; Spyrou, G.M. Combining clinical and molecular data for personalized treatment in acute myeloid leukemia: A machine learning approach. Comput. Methods Programs Biomed. 2024, 257, 108432.
- Macheka, S.; Ng, P.Y.; Ginsburg, O.; Hope, A.; Sullivan, R.; Aggarwal, A. Prospective evaluation of artificial intelligence (AI) applications for use in cancer pathways following diagnosis: A systematic review. BMJ Oncol. 2024, 3, e000255.
- Rösler, W.; Altenbuchinger, M.; Baeßler, B.; Beissbarth, T.; Beutel, G.; Bock, R.; von Bubnoff, N.; Eckardt, J.-N.; Foersch, S.; Loeffler, C.M.L.; et al. An overview and a roadmap for artificial intelligence in hematology and oncology. J. Cancer Res. Clin. Oncol. 2023, 149, 7997–8006.
- Rösler, W.; Roiss, M.; Widmer, C. Advancements in Machine Learning (ML): Transforming the Future of Blood Cancer Detection and Outcome Prediction. Healthbook TIMES Oncol. Hematol. 2024, 20, 20–25.
- Wörheide, M.A.; Krumsiek, J.; Kastenmüller, G.; Arnold, M. Multi-omics integration in biomedical research—A metabolomics-centric review. Anal. Chim. Acta 2021, 1141, 144–162.
- Feldner-Busztin, D.; Nisantzis, P.F.; Edmunds, S.J.; Boza, G.; Racimo, F.; Gopalakrishnan, S.; Limborg, M.T.; Lahti, L.; de Polavieja, G.G.; Wren, J. Dealing with dimensionality: The application of machine learning to multi-omics data. Bioinformatics 2023, 39, btad021.
- Zhang, Q.; Ding, K.; Lv, T.; Wang, X.; Yin, Q.; Zhang, Y.; Yu, J.; Wang, Y.; Li, X.; Xiang, Z.; et al. Scientific large language models: A survey on biological & chemical domains. ACM Comput. Surv. 2024, 57, 1–38.
- Nanaa, A.; Akkus, Z.; Lee, W.Y.; Pantanowitz, L.; Salama, M.E. Machine learning and augmented human intelligence use in histomorphology for haematolymphoid disorders. Pathology 2021, 53, 400–407.
- Hansen, J.; Jain, A.R.; Nenov, P.; Robinson, P.N.; Iyengar, R. From transcriptomics to digital twins of organ function. Front. Cell Dev. Biol. 2024, 12, 1240384.
- Ajonu, C.I.; Grundy, R.I.; Ball, G.R.; Zafeiris, D. Application of a high-throughput swarm-based deep neural network Algorithm reveals SPAG5 downregulation as a potential therapeutic target in adult AML. Funct. Integr. Genomics 2025, 25, 8.
- D’Amico, S.; Dall’Olio, D.; Sala, C.; Dall’Olio, L.; Sauta, E.; Zampini, M.; Asti, G.; Lanino, L.; Maggioni, G.; Campagna, A.; et al. Synthetic data generation by artificial intelligence to accelerate research and precision medicine in hematology. JCO Clin. Cancer Inform. 2023, 7, e2300021.
- Murtaza, H.; Ahmed, M.; Khan, N.F.; Murtaza, G.; Zafar, S.; Bano, A. Synthetic data generation: State of the art in health care domain. Comput. Sci. Rev. 2023, 48, 100546.
- Passamonti, F.; Corrao, G.; Castellani, G.; Mora, B.; Maggioni, G.; Gale, R.P.; Della Porta, M.G. The future of research in hematology: Integration of conventional studies with real-world data and artificial intelligence. Blood Rev. 2022, 54, 100914.
- Han, H.; Liu, X. The challenges of explainable AI in biomedical data science. BMC Bioinform. 2022, 22 (Suppl. 12), 443.
- Marcus, E.; Teuwen, J. Artificial intelligence and explanation: How, why, and when to explain black boxes. Eur. J. Radiol. 2024, 173, 111393.

Algorithm Type | Number of Studies | Percentage |
---|---|---|
Support Vector Machines | 28 | 31.5% |
Random Forests | 25 | 28.1% |
Deep Learning/Neural Networks | 24 | 27.0% |
Ensemble Methods | 18 | 20.2% |
Naive Bayes | 12 | 13.5% |
Decision Trees | 10 | 11.2% |
Logistic Regression | 8 | 9.0% |

Algorithm | n | Median AUC | 95% CI | Median Sensitivity | Median Specificity |
---|---|---|---|---|---|
Deep Learning | 24 | 0.91 | 0.89–0.95 | 88.7% | 87.2% |
Ensemble Methods | 18 | 0.89 | 0.85–0.93 | 86.4% | 85.1% |
SVM | 28 | 0.86 | 0.82–0.90 | 84.2% | 83.5% |
Random Forest | 25 | 0.85 | 0.81–0.89 | 82.8% | 81.9% |
Naive Bayes | 12 | 0.79 | 0.74–0.84 | 77.3% | 76.8% |
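
The three metrics reported in the table above (AUC, sensitivity, specificity) can be computed from true labels and predicted scores as sketched below. This is a minimal, pure-Python illustration: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for the example and are not taken from the reviewed studies.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    A sample is called positive when its score is >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 1 = malignant, 0 = control.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores)
print(f"AUC={auc:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Note that AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, so the two kinds of figures are not directly interchangeable when comparing studies.
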

Validation Method | Number of Studies | Percentage |
---|---|---|
Monte Carlo Cross-validation | 73 | 82.0% |
Hold-out validation | 58 | 65.2% |
Bootstrap validation | 15 | 16.9% |
Nested cross-validation | 12 | 13.5% |
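
Two of the resampling schemes named in the table above differ mainly in how the train and test indices are drawn. The sketch below illustrates Monte Carlo cross-validation (repeated random splits without replacement) and bootstrap validation (training on a resample drawn with replacement, testing on the out-of-bag samples). Function names, parameters, and the out-of-bag testing convention are assumptions chosen for illustration; individual studies may have used different variants.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def monte_carlo_splits(
    n_samples: int, n_repeats: int, test_fraction: float, seed: int = 0
) -> Iterator[Split]:
    """Yield (train, test) index lists from repeated random partitions.
    Each repeat reshuffles all samples, so test sets may overlap across repeats."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield sorted(idx[n_test:]), sorted(idx[:n_test])


def bootstrap_splits(n_samples: int, n_repeats: int, seed: int = 0) -> Iterator[Split]:
    """Yield (train, test) index lists where train is sampled with replacement
    and test holds the out-of-bag samples never drawn into train."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        in_bag = set(train)
        test = [i for i in range(n_samples) if i not in in_bag]
        yield train, test


for train, test in monte_carlo_splits(n_samples=20, n_repeats=3, test_fraction=0.25):
    print("Monte Carlo:", len(train), "train /", len(test), "test")
for train, test in bootstrap_splits(n_samples=20, n_repeats=3):
    print("Bootstrap:  ", len(set(train)), "unique train /", len(test), "out-of-bag")
```

In both schemes, any feature selection or hyperparameter tuning would need to be repeated inside each training split to avoid leaking information from the test samples.
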

External Validation Type | Number of Studies | Percentage |
---|---|---|
Independent dataset validation | 31 | 34.8% |
Multi-center validation | 12 | 13.5% |
Temporal validation | 8 | 9.0% |
Cross-population validation | 5 | 5.6% |

Interpretation Method | Number of Studies | Percentage |
---|---|---|
Gene set enrichment analysis | 28 | 31.5% |
Protein interaction network analysis | 19 | 21.3% |
Metabolic pathway mapping | 12 | 13.5% |
Literature-based validation | 56 | 62.9% |
Experimental validation | 8 | 9.0% |

Explainability Method | Number of Studies | Percentage |
---|---|---|
Feature importance ranking | 45 | 50.6% |
SHAP (SHapley Additive exPlanations) | 12 | 13.5% |
Decision trees/rules extraction | 15 | 16.9% |
Attention mechanisms | 8 | 9.0% |
LIME (Local Interpretable Model-agnostic Explanations) | 7 | 7.9% |
Gradient-based methods | 6 | 6.7% |
Pathway analysis integration | 34 | 38.2% |

Algorithm Type | Intrinsically Interpretable | Post hoc Explanation Used | No Explanation |
---|---|---|---|
Decision Trees | 10 (100%) | 0 (0%) | 0 (0%) |
Logistic Regression | 8 (100%) | 0 (0%) | 0 (0%) |
Random Forest | 3 (12%) | 18 (72%) | 4 (16%) |
SVM | 2 (7%) | 15 (54%) | 11 (39%) |
Deep Learning | 0 (0%) | 8 (33%) | 16 (67%) |
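
The percentages in the table above are computed within each algorithm type (each row sums to that algorithm's study count), not against the full set of included studies. The short check below reproduces them from the counts given in the table; the variable and function names are illustrative only.

```python
from typing import Dict, Tuple

# Counts copied from the table: (intrinsically interpretable, post hoc, no explanation).
counts: Dict[str, Tuple[int, int, int]] = {
    "Decision Trees": (10, 0, 0),
    "Logistic Regression": (8, 0, 0),
    "Random Forest": (3, 18, 4),
    "SVM": (2, 15, 11),
    "Deep Learning": (0, 8, 16),
}


def row_percentages(row: Tuple[int, ...]) -> Tuple[int, ...]:
    """Express each count as a whole-number percentage of its row total."""
    total = sum(row)
    return tuple(round(100 * c / total) for c in row)


for algorithm, row in counts.items():
    print(f"{algorithm}: n={sum(row)}, percentages={row_percentages(row)}")
```
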

Privacy Measure | Number of Studies | Percentage |
---|---|---|
Data anonymization reported | 67 | 75.3% |
Informed consent obtained | 78 | 87.6% |
Institutional review board approval | 81 | 91.0% |
Data sharing agreements | 23 | 25.8% |
GDPR compliance mentioned | 15 | 16.9% |

Bias Mitigation Strategy | Number of Studies | Percentage |
---|---|---|
Population diversity assessed | 23 | 25.8% |
Bias detection methods used | 15 | 16.9% |
Fairness metrics reported | 8 | 9.0% |
Stratified validation by demographics | 12 | 13.5% |
Batch effect correction | 34 | 38.2% |

Clinical Utility Measure | Number of Studies | Percentage |
---|---|---|
Clinical decision curve analysis | 12 | 13.5% |
Net benefit analysis | 8 | 9.0% |
Cost-effectiveness analysis | 3 | 3.4% |
Clinical impact study | 5 | 5.6% |
Physician preference study | 2 | 2.2% |
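
Decision curve analysis and net benefit analysis, listed in the table above, rest on the standard net benefit quantity: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. A minimal sketch follows, assuming binary labels and predicted probabilities; the function names and toy data are hypothetical and do not come from any reviewed study.

```python
from typing import Sequence


def net_benefit(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Net benefit = TP/n - (FP/n) * threshold / (1 - threshold),
    treating every sample with score >= threshold as test-positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy in which every patient is treated as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): 1 = disease present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

# A decision curve plots these values over a range of threshold probabilities.
for pt in (0.25, 0.5, 0.75):
    print(
        f"threshold={pt:.2f} model={net_benefit(labels, scores, pt):.3f} "
        f"treat-all={net_benefit_treat_all(labels, pt):.3f} treat-none=0.000"
    )
```

A model is usually considered clinically useful at a given threshold only if its net benefit exceeds both the treat-all and treat-none (zero) strategies.
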

Implementation Aspect | Number of Studies | Percentage |
---|---|---|
Computational requirements discussed | 34 | 38.2% |
Integration with clinical workflows | 18 | 20.2% |
User interface development | 8 | 9.0% |
Training requirements for clinicians | 6 | 6.7% |
Maintenance and updating protocols | 4 | 4.5% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alhumrani, S.Q.; Ball, G.R.; El-Sherif, A.A.; Ahmed, S.; Mousa, N.O.; Alghorayed, S.A.; Alatawi, N.A.; Ali, A.M.; Alqahtani, F.A.; Gabre, R.M. Machine Learning for Multi-Omics Characterization of Blood Cancers: A Systematic Review. Cells 2025, 14, 1385. https://doi.org/10.3390/cells14171385