Unlocking the Potential of AI and Big Data in Cancer Research: Advances and Applications

Mastroleo, Federico; Marvaso, Giulia

doi:10.3390/cancers18081234

Open Access Editorial


by Federico Mastroleo ^1,2,* and Giulia Marvaso ^1,2,*

^1 Division of Radiation Oncology, IEO European Institute of Oncology IRCCS, 20141 Milan, Italy

^2 Department of Oncology and Hemato-Oncology, University of Milan, 20122 Milan, Italy

^* Authors to whom correspondence should be addressed.

Cancers 2026, 18(8), 1234; https://doi.org/10.3390/cancers18081234

Submission received: 10 April 2026 / Accepted: 13 April 2026 / Published: 14 April 2026

(This article belongs to the Special Issue Unlocking the Potential of AI and Big Data in Cancer Research: Advances and Applications)


Artificial intelligence (AI) and big data analytics have fundamentally reshaped the landscape of modern oncology. Over the past decade, the exponential growth of digitized clinical data (genomic sequencing, radiomics, digital pathology, electronic health records [EHRs]) has created unprecedented opportunities to apply machine learning (ML) and deep learning (DL) algorithms to problems that were previously intractable with conventional statistical approaches [1,2]. In radiation oncology, AI-driven autosegmentation of organs at risk and target volumes has emerged as one of the most mature clinical applications, with the potential to substantially reduce both inter-observer variability and planning time [3]. Radiomics, the high-throughput extraction of quantitative imaging features, has gained traction as a non-invasive complement to tissue-based biomarkers, supporting outcome prediction models across multiple tumor sites [4]. In parallel, natural language processing (NLP) and large language models (LLMs) are beginning to unlock the vast unstructured textual information embedded in clinical notes, pathology reports, and the published literature, offering new avenues for automated data curation and clinical decision support [5,6].
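To make "quantitative imaging features" concrete, the sketch below computes a handful of first-order (histogram) features inside a region of interest. It assumes a 2D image stored as nested lists and a binary mask of the same shape; the function name and the fixed bin count are illustrative choices. It is a toy only: validated toolkits and the Image Biomarker Standardisation Initiative (IBSI) define these features with specific discretization and normalization conventions that this version does not reproduce, and shape and texture features are omitted entirely.

```python
import math
from collections import Counter


def first_order_features(image, mask, n_bins=8):
    """Compute a few first-order features over the pixels selected by a binary mask.

    image and mask are nested lists of equal shape (2D here for brevity)."""
    # Collect intensities of pixels inside the region of interest (ROI)
    values = [v for row_img, row_mask in zip(image, mask)
              for v, m in zip(row_img, row_mask) if m]
    if not values:
        raise ValueError("mask selects no pixels")
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    skewness = (sum((v - mean) ** 3 for v in values) / n) / std ** 3 if std > 0 else 0.0
    # Discretize intensities into n_bins equal-width bins before computing entropy
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # a constant ROI falls into a single bin
    bins = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    entropy = -sum((c / n) * math.log2(c / n) for c in bins.values())
    return {"mean": mean, "variance": variance, "skewness": skewness, "entropy": entropy}


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    mask = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]  # a small cross-shaped ROI
    print(first_order_features(image, mask))
```

For the cross-shaped ROI above, the selected intensities are 2, 4, 5, 6 and 8, giving a mean of 5 and a variance of 4.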

1. Knowledge Gaps and How This Special Issue Addresses Them

Despite the rapid pace of methodological innovation, several critical knowledge gaps persist. First, the translational gap between proof-of-concept AI studies and clinically validated, prospectively tested tools remains substantial. Most published models have been developed and evaluated on retrospective, single-institution cohorts, with limited external validation and insufficient attention to generalizability across patient populations, imaging protocols, and institutional workflows [7].

Second, the interpretability and explainability of AI models continue to pose challenges for clinical adoption. While performance metrics such as area under the curve (AUC) are commonly reported, the mechanisms by which models arrive at their predictions are often opaque, limiting clinician trust and regulatory acceptance [8]. The recently published TRIPOD+AI statement [9] and the TRIPOD-LLM reporting guideline [10] represent important steps toward standardized, transparent reporting of prediction model studies, but their adoption in oncological research remains inconsistent.
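The AUC has a simple interpretation that also shows what it leaves out: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation, with ties counted as one half). A minimal sketch, assuming binary labels coded 0/1, is shown below. Because the metric depends only on how cases are ranked, a high AUC says nothing about how the model arrived at its scores, which is why explainability has to be assessed separately.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative cases.

    Counts how often a positive case outscores a negative one (ties count 1/2).
    Quadratic in the number of cases; intended for illustration, not large cohorts."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

In the example, three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75.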

Third, integrating AI tools into existing clinical information systems, including EHRs and treatment planning systems, presents substantial informatics and interoperability challenges. Real-time deployment of AI models requires not only technical infrastructure (e.g., FHIR-based interfaces, cloud or edge computing) but also robust governance frameworks to ensure data quality, patient privacy, and algorithmic fairness [11].
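As one illustration of what a FHIR-based interface involves, the sketch below packages a model's predicted probability as a minimal FHIR R4 Observation resource, represented as a plain Python dictionary. The function name, the code system URL, the code "toxicity-risk", and the model label are all hypothetical placeholders. A real integration would use a terminology agreed with the receiving EHR, validate the resource against the FHIR specification, and send it through the server's REST API, none of which is shown.

```python
import json


def risk_score_observation(patient_id, score, model_version):
    """Package a predicted probability as a minimal FHIR R4 Observation (plain dict)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                # Hypothetical local code system and code; not a standard terminology
                "system": "http://example.org/fhir/CodeSystem/ai-outputs",
                "code": "toxicity-risk",
                "display": "AI-predicted toxicity risk",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {"value": round(score, 3), "unit": "probability"},
        # Record which (hypothetical) model version produced the value, to support auditing
        "device": {"display": f"hypothetical-risk-model {model_version}"},
    }


if __name__ == "__main__":
    obs = risk_score_observation("123", 0.8123, "v0.1")
    # Serialize to JSON; posting it to a FHIR server is deliberately omitted
    print(json.dumps(obs, indent=2))
```

Recording the producing model alongside the value is a small design choice that supports the governance and auditing requirements mentioned above.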

The contributions assembled in this Special Issue collectively address several of these gaps. The published articles span a wide thematic range, including deep learning for automated segmentation in radiotherapy planning, radiomics-based outcome prediction, bibliometric analyses of rapidly evolving oncological subfields, multi-omics data integration, and the application of AI to specific diagnostic and prognostic tasks. By bringing together original research and comprehensive reviews, this collection provides both methodological advances and critical evaluations of the state of the art, offering readers a cross-sectional view of how AI and big data are being translated into oncological practice.

2. Future Research Directions

Looking forward, several priority areas warrant focused investigation.

2.1. Prospective Validation and Clinical Implementation

The field must transition from retrospective model development to prospective, multi-institutional validation studies embedded within clinical trials or real-world deployment frameworks. Pragmatic trial designs, including stepped-wedge, cluster-randomized, and adaptive platform trials, are well-suited to evaluating AI-augmented workflows in oncology. Regulatory science for AI-based medical devices is evolving rapidly, and collaborative engagement between researchers, clinicians, and regulatory agencies will be essential [12,13].
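In a stepped-wedge design, every cluster (for example, a hypothetical set of departments adopting an AI contouring tool) starts in the control condition and crosses over to the intervention at a staggered step, never switching back. The sketch below generates the cluster-by-period exposure matrix for a basic design, assuming one baseline period and clusters divided as evenly as possible among steps. In an actual trial the order in which clusters cross over is randomized, and the analysis must account for secular time trends; both are omitted here.

```python
def stepped_wedge_schedule(n_clusters, n_steps):
    """Return a cluster-by-period 0/1 matrix (1 = intervention) for a basic stepped wedge.

    Period 0 is a baseline in which no cluster receives the intervention; clusters
    cross over at periods 1..n_steps and remain exposed thereafter."""
    if n_clusters < 1 or n_steps < 1:
        raise ValueError("need at least one cluster and one step")
    n_periods = n_steps + 1
    schedule = []
    for c in range(n_clusters):
        # Deterministic, even assignment of clusters to steps (randomized in practice)
        step = 1 + (c * n_steps) // n_clusters
        schedule.append([1 if t >= step else 0 for t in range(n_periods)])
    return schedule


if __name__ == "__main__":
    for row in stepped_wedge_schedule(4, 4):
        print(row)
```

With four clusters and four steps, the rows form the characteristic staircase: each cluster switches on one period later than the previous one, and all clusters are exposed in the final period.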

2.2. Foundation Models and Large Language Models

The emergence of multimodal foundation models, pretrained on large-scale imaging, genomic, and textual corpora, represents a paradigm shift in how AI systems can be adapted to oncological tasks [14]. Transfer learning, few-shot learning, and prompt engineering may enable the development of high-performing models even in data-scarce clinical scenarios, such as rare tumor types or pediatric malignancies. The integration of LLMs with retrieval-augmented generation (RAG) architectures is particularly promising for clinical decision support systems that draw on institutional knowledge bases and current evidence.
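The retrieval half of a RAG pipeline can be sketched as follows, assuming a small in-memory list of institutional text passages. For brevity, passages are ranked by bag-of-words cosine similarity; production systems typically use dense embeddings and a vector index instead. The final call to an LLM is deliberately left out, and the function names are illustrative rather than those of any particular framework.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query (ties keep original order)."""
    q = Counter(tokenize(query))
    ranked = sorted(range(len(passages)),
                    key=lambda i: (-cosine(q, Counter(tokenize(passages[i]))), i))
    return [passages[i] for i in ranked[:k]]


def build_prompt(query, passages, k=2):
    """Place the retrieved passages, numbered for citation, ahead of the question."""
    context = "\n".join(f"[{n}] {p}" for n, p in enumerate(retrieve(query, passages, k), start=1))
    return ("Answer using only the numbered sources below and cite them.\n"
            f"{context}\nQuestion: {query}")


if __name__ == "__main__":
    passages = [
        "Departmental protocol: rectum dose constraints for prostate SBRT.",
        "Cafeteria opening hours are 8 to 16.",
        "Published rectum dose constraints from prostate radiotherapy trials.",
    ]
    print(build_prompt("What are the rectum dose constraints for prostate SBRT?", passages))
```

Grounding the prompt in retrieved, numbered sources is what lets the generated answer be traced back to institutional documents and current evidence.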

2.3. Multi-Omics Integration and Digital Twins

The fusion of genomic, transcriptomic, proteomic, radiomic, and clinical data into unified predictive frameworks is an area of active development [15]. Longitudinal, multi-omics datasets that capture the evolution of disease biology and treatment response over time will be critical for building dynamic, patient-specific models, so-called digital twins, capable of informing adaptive treatment strategies. In silico clinical trials leveraging synthetic cohorts derived from such models may accelerate the evaluation of novel therapeutic approaches.
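One of the simplest integration strategies is early (feature-level) fusion: standardize each feature, then concatenate the per-patient blocks from different modalities into a single vector for a downstream model. The sketch below assumes each block is a list of rows aligned by patient; the function names are illustrative. It is only a baseline: intermediate or late fusion, learned joint representations, and the longitudinal modeling that digital twins would require are beyond its scope.

```python
import statistics


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and population SD 1; constant columns become 0."""
    cols = list(zip(*rows))
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in cols]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)] for row in rows]


def early_fusion(*blocks):
    """Concatenate per-patient feature blocks (e.g., genomic, radiomic, clinical)
    after standardizing each block, so features on very different scales become comparable."""
    n = len(blocks[0])
    if any(len(b) != n for b in blocks):
        raise ValueError("all blocks must describe the same patients in the same order")
    standardized = [zscore_columns(b) for b in blocks]
    return [sum((b[i] for b in standardized), []) for i in range(n)]


if __name__ == "__main__":
    genomic = [[1.0, 10.0], [3.0, 10.0]]  # hypothetical two-patient values
    clinical = [[50.0], [70.0]]           # e.g., age in years
    print(early_fusion(genomic, clinical))
```

In the example, the constant genomic feature carries no information and is mapped to 0, while age and the varying genomic feature end up on the same standardized scale.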

2.4. Federated Learning and Data Governance

Overcoming the barriers to multi-institutional data sharing without compromising patient privacy is a prerequisite for building generalizable AI models in oncology. Federated learning, differential privacy, and synthetic data generation are emerging as viable technical solutions [16], but their implementation at scale requires coordinated governance frameworks, standardized data models (e.g., OMOP-CDM, FHIR), and institutional commitment to interoperability.
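At the core of many federated learning systems is federated averaging (FedAvg): each site trains locally, starting from the current global parameters, and a coordinator averages the returned parameters, weighted by local sample size, so that raw patient records never leave the site. The sketch below assumes a toy linear regression model and in-memory "sites". Real deployments add secure aggregation, differential privacy noise, and strategies for non-identically distributed data, none of which is shown.

```python
def local_update(weights, data, lr=0.01, epochs=5):
    """Run a few epochs of full-batch gradient descent on mean squared error
    for a linear model y ~ w . x, starting from the current global weights."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def fed_avg(client_weights, client_sizes):
    """Average parameter vectors from all sites, weighted by local sample counts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one parameter vector per site")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
            for j in range(dim)]


def federated_round(global_weights, site_datasets, lr=0.01, epochs=5):
    """One communication round: local training at each site, then weighted averaging.
    Only parameter vectors are sent to the coordinator; records stay at the sites."""
    updates = [local_update(global_weights, data, lr, epochs) for data in site_datasets]
    return fed_avg(updates, [len(data) for data in site_datasets])


if __name__ == "__main__":
    # Two hypothetical sites whose toy data both follow y = 2x
    sites = [[([1.0], 2.0), ([2.0], 4.0)], [([3.0], 6.0)]]
    w = [0.0]
    for _ in range(50):
        w = federated_round(w, sites, lr=0.05)
    print(w)  # approaches [2.0]
```

Weighting by sample size means a large site influences the global model more than a small one, which matches pooled training when data are similar across sites but can mask poor performance at small or atypical sites.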

2.5. Equity, Bias, and Ethical AI

As AI tools become increasingly embedded in clinical decision-making, ensuring that they perform equitably across diverse patient populations is paramount. Algorithmic auditing, fairness-aware model training, and the inclusion of underrepresented populations in training datasets are critical safeguards against perpetuating or amplifying existing health disparities [17].
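A basic algorithmic audit compares error rates across patient subgroups. The sketch below computes per-group true-positive and false-positive rates for binary predictions and reports the largest gap in sensitivity, which corresponds to the "equal opportunity" fairness criterion. The group labels and data in the example are hypothetical; a real audit would also examine calibration, the choice of outcome label, and the statistical uncertainty of each estimate.

```python
import math
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group true-positive rate (sensitivity) and false-positive rate."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t:
            counts[g]["tp" if p else "fn"] += 1
        else:
            counts[g]["fp" if p else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        pos, neg = c["tp"] + c["fn"], c["fp"] + c["tn"]
        # NaN marks a rate that cannot be estimated because the group has no such cases
        rates[g] = {"tpr": c["tp"] / pos if pos else float("nan"),
                    "fpr": c["fp"] / neg if neg else float("nan")}
    return rates


def equal_opportunity_gap(rates):
    """Largest difference in true-positive rate between any two groups (0 = parity)."""
    tprs = [r["tpr"] for r in rates.values() if not math.isnan(r["tpr"])]
    if len(tprs) < 2:
        raise ValueError("need at least two groups with positive cases")
    return max(tprs) - min(tprs)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
    groups = ["A"] * 4 + ["B"] * 4
    rates = group_rates(y_true, y_pred, groups)
    print(rates, equal_opportunity_gap(rates))
```

In the example, the model detects every positive case in group A but only half of those in group B, a sensitivity gap of 0.5 that an aggregate accuracy figure would hide.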

3. Conclusions

This Special Issue illustrates both the breadth and the depth of current research at the intersection of AI, big data, and cancer. The contributions highlight meaningful progress in autosegmentation, outcome modeling, bibliometric characterization of emerging fields, and the integration of multi-dimensional data for clinical decision support. At the same time, they underscore the work that remains: prospective validation, clinical implementation, regulatory alignment, and equitable deployment.

We are grateful to all the authors who contributed their work to this Special Issue, to the reviewers who ensured the rigor of the published articles, and to the editorial team at Cancers for their support throughout the process. We hope that this collection will serve as both a reference for the current state of the field and a stimulus for the next generation of studies that bring AI-driven cancer research closer to clinical impact.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Rajpurkar, P.; Chen, E.; Banerjee, O.; Topol, E.J. AI in health and medicine. Nat. Med. 2022, 28, 31–38.
2. Topol, E.J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 2019, 25, 44–56.
3. Isaksson, L.J.; Mastroleo, F.; Vincini, M.G.; Marvaso, G.; Zaffaroni, M.; Gola, M.; Mazzola, G.C.; Bergamaschi, L.; Gaito, S.; Alongi, F.; et al. The emerging role of Artificial Intelligence in proton therapy: A review. Crit. Rev. Oncol./Hematol. 2024, 204, 104485.
4. Lambin, P.; Leijenaar, R.T.H.; Deist, T.M.; Peerlings, J.; De Jong, E.E.C.; Van Timmeren, J.; Sanduleanu, S.; Larue, R.T.H.M.; Even, A.J.G.; Jochems, A.; et al. Radiomics: The bridge between medical imaging and personalized medicine. Nat. Rev. Clin. Oncol. 2017, 14, 749–762.
5. Singhal, K.; Azizi, S.; Tu, T.; Mahdavi, S.S.; Wei, J.; Chung, H.W.; Scales, N.; Tanwani, A.; Cole-Lewis, H.; Pfohl, S.; et al. Large language models encode clinical knowledge. Nature 2023, 620, 172–180.
6. Thirunavukarasu, A.J.; Ting, D.S.J.; Elangovan, K.; Gutierrez, L.; Tan, T.F.; Ting, D.S.W. Large language models in medicine. Nat. Med. 2023, 29, 1930–1940.
7. Park, S.H.; Han, K. Methodologic Guide for Evaluating Clinical Performance and Effect of Artificial Intelligence Technology for Medical Diagnosis and Prediction. Radiology 2018, 286, 800–809.
8. Papanastasopoulos, Z.; Samala, R.K.; Chan, H.P.; Hadjiiski, L.; Paramagul, C.; Helvie, M.A.; Neal, C.H. Explainable AI for medical imaging: Deep-learning CNN ensemble for classification of estrogen receptor status from breast MRI. In Proceedings of Medical Imaging 2020: Computer-Aided Diagnosis; Hahn, H.K., Mazurowski, M.A., Eds.; SPIE: Houston, TX, USA, 2020; p. 52.
9. Collins, G.S.; Moons, K.G.M.; Dhiman, P.; Riley, R.D.; Beam, A.L.; Van Calster, B.; Ghassemi, M.; Liu, X.; Reitsma, J.B.; Van Smeden, M.; et al. TRIPOD+AI statement: Updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024, 385, e078378.
10. Gallifant, J.; Afshar, M.; Ameen, S.; Aphinyanaphongs, Y.; Chen, S.; Cacciamani, G.; Demner-Fushman, D.; Dligach, D.; Daneshjou, R.; Fernandes, C.; et al. The TRIPOD-LLM reporting guideline for studies using large language models. Nat. Med. 2025, 31, 60–69.
11. Char, D.S.; Shah, N.H.; Magnus, D. Implementing Machine Learning in Health Care—Addressing Ethical Challenges. N. Engl. J. Med. 2018, 378, 981–983.
12. Muehlematter, U.J.; Daniore, P.; Vokinger, K.N. Approval of artificial intelligence and machine learning-based medical devices in the USA and Europe (2015–20): A comparative analysis. Lancet Digit. Health 2021, 3, e195–e203.
13. Mastroleo, F.; Marvaso, G.; Jereczek-Fossa, B.A. Artificial intelligence in muscle-invasive bladder cancer: Opportunities, challenges, and clinical impact. Curr. Opin. Urol. 2025, 35, 543–548.
14. Moor, M.; Banerjee, O.; Abad, Z.S.H.; Krumholz, H.M.; Leskovec, J.; Topol, E.J.; Rajpurkar, P. Foundation models for generalist medical artificial intelligence. Nature 2023, 616, 259–265.
15. Acosta, J.N.; Falcone, G.J.; Rajpurkar, P.; Topol, E.J. Multimodal biomedical AI. Nat. Med. 2022, 28, 1773–1784.
16. Rieke, N.; Hancox, J.; Li, W.; Milletarì, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. npj Digit. Med. 2020, 3, 119.
17. Obermeyer, Z.; Powers, B.; Vogeli, C.; Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019, 366, 447–453.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

