Explainable AI-Based Semantic Retrieval from an Expert-Curated Oncology Knowledge Graph for Clinical Decision Support
Abstract
1. Introduction
Limitations of Current Research
2. Background
2.1. Prognosis and Predictive Modelling
2.2. Treatment Planning and Decision Support
2.3. Natural Language Processing (NLP) in Oncology
- Data Quality and Accessibility. AI models are only as good as the data on which they are trained. The issues of data scarcity, heterogeneity between institutions, and patient privacy remain significant hurdles.
- Interpretability and Trust. Many advanced deep learning models function as black boxes, making it difficult for clinicians to understand their reasoning. Research into explainable AI (XAI) is critical to building trust and facilitating clinical adoption.
- Validation and Regulation. Models must be rigorously and prospectively validated in real-world clinical settings before they can be deployed. A clear regulatory framework for the approval and monitoring of these AI-based medical devices is essential.
- Integration into Clinical Workflows. For AI tools to be effective, they must be seamlessly integrated into existing clinical workflows without disrupting the established practices of healthcare professionals.
2.4. Ontologies and Knowledge Graphs in Oncology
3. Data and Platform
3.1. OncoProAI Platform
3.2. Dataset Description
3.3. Ethics and Data Access
4. Methods
4.1. Node-to-Text Serialisation
4.2. Embedding Models and Vector Search
4.3. Graph Retrieval-Augmented Scoring
4.4. Explainability and Result Grounding
4.5. Evaluation Framework and Human-in-the-Loop Curation
4.6. Ground Truth Creation and Model-Agnostic Evaluation
- Pooling of results: For each query, the top-k results from all six embedding models are pooled together to form a comprehensive set of candidate nodes. This pooling strategy is designed to mitigate any model-specific biases and to ensure that the final ground truth is as comprehensive as possible.
- Relevance labelling: The curated rankings provided by the human experts are used to bootstrap an initial set of relevance labels. These are later refined through a more explicit labelling process, where experts assign binary or graded relevance scores to each node in the pooled set. The resulting qrels are versioned and maintained independently of the models being evaluated, which allows for a fair and unbiased comparison of different retrieval models.
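The pooling and labelling steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `pool_candidates` and `build_qrels`, the toy node ids, and the choice to score unjudged pooled nodes as 0 are all assumptions made for the example.

```python
def pool_candidates(runs, k):
    """Union the top-k node ids from every model's ranked list.

    runs: dict mapping model name -> ranked list of node ids (best first).
    Returns a dict node_id -> sorted list of models that retrieved the node
    within their top-k, so provenance of each candidate is kept.
    """
    pool = {}
    for model, ranking in runs.items():
        for node_id in ranking[:k]:
            pool.setdefault(node_id, set()).add(model)
    return {node_id: sorted(models) for node_id, models in pool.items()}


def build_qrels(pool, expert_labels, default=0):
    """Attach an expert relevance grade to every pooled node.

    Assumption: pooled nodes without an expert label get `default` (0).
    """
    return {node_id: expert_labels.get(node_id, default) for node_id in pool}


# Hypothetical runs from two models for a single query.
runs = {
    "model_a": ["n1", "n2", "n3"],
    "model_b": ["n2", "n4", "n1"],
}
pool = pool_candidates(runs, k=2)
qrels = build_qrels(pool, {"n1": 2, "n2": 1})
```

Because the qrels are keyed only by node id and not by model, the same file can be reused to score any retrieval model added later.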
4.7. Annotation User Interface and Curation Workflow
- (i) Mark duplicate – Removes the item from ordered_nodes and sets isDuplicate = true in nodes.
- (ii) Mark irrelevant – Keeps the item visible while flagging isIrrelevant = true and recording the corresponding relevance in the ground truth.
- (iii) Add node – Enables typeahead search over the knowledge graph; added items carry original_index = −1 and isManuallyAdded = true for provenance tracking.
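The three curation actions can be sketched as plain state updates. The field names `ordered_nodes`, `nodes`, `isDuplicate`, `isIrrelevant`, `original_index`, and `isManuallyAdded` come from the description above; the `ground_truth` mapping, the function names, and the default score of 0 for irrelevant items are assumptions for illustration only.

```python
def mark_duplicate(state, node_id):
    """Drop the item from the ordered list and flag it in the node table."""
    state["ordered_nodes"] = [n for n in state["ordered_nodes"] if n != node_id]
    state["nodes"][node_id]["isDuplicate"] = True


def mark_irrelevant(state, node_id, irrelevant_score=0):
    """Keep the item in the list but flag it and record its relevance.

    Assumption: the score written to the ground truth is 0.
    """
    state["nodes"][node_id]["isIrrelevant"] = True
    state["ground_truth"][node_id] = irrelevant_score


def add_node(state, node_id):
    """Append a manually added node with provenance markers."""
    state["nodes"][node_id] = {"original_index": -1, "isManuallyAdded": True}
    state["ordered_nodes"].append(node_id)


# Hypothetical annotation state for one query.
state = {
    "ordered_nodes": ["n1", "n2", "n3"],
    "nodes": {
        "n1": {"original_index": 0},
        "n2": {"original_index": 1},
        "n3": {"original_index": 2},
    },
    "ground_truth": {},
}
mark_duplicate(state, "n2")
mark_irrelevant(state, "n3")
add_node(state, "n9")
```

Keeping the flags on the node records, rather than deleting entries, preserves an audit trail of what each annotator did.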
- Result Pooling, where the top-k results from all six embedding models are combined to form comprehensive candidate sets, mitigating model-specific biases.
- Relevance Labelling, where expert-curated rankings bootstrap initial relevance labels, subsequently refined through explicit scoring where experts assign binary or graded relevance scores to each pooled node.
5. Evaluation
5.1. Query Set and Provenance
5.2. Ground Truth and Annotation Framework
5.3. Metrics and Statistical Testing
5.4. Implementation and Hardware Specifications
6. Results
6.1. Overall Retrieval Performance
- High-performing multilingual embeddings can reliably surface the most clinically valuable nodes at the top of the rankings under graded relevance;
- The model choice affects downstream utility in our oncology KG setting.
Visual Summary
6.2. Statistical Significance
6.3. Interpretation
6.4. Performance Analysis and Current Limitations
7. Conclusions
7.1. Study Limitations
7.2. Future Work
- Hierarchical clustering of embeddings with coarse-to-fine search, leveraging the natural taxonomic structure of medical knowledge to create semantically coherent clusters (e.g., grouping by organ system, disease type, or treatment modality).
- Learned sparse representations using techniques like SPLADE or neural sparse retrieval to enable more efficient similarity computation while preserving domain-specific medical semantics.
- Graph-aware indexing that exploits the knowledge graph’s inherent structure to guide ANN index construction, potentially using techniques like navigable small world graphs that respect medical concept hierarchies.
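As a rough illustration of the first direction, coarse-to-fine search can be sketched in a few lines: the query is first compared against cluster centroids, and only the nodes in the best-matching clusters are scored. Everything here is hypothetical (the cluster labels, the two-dimensional vectors, and the `n_probe` parameter); a production system would use a proper ANN library rather than this brute-force version.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def coarse_to_fine_search(query, clusters, n_probe=1, top_k=3):
    """clusters: dict cluster label -> dict node_id -> embedding vector."""
    # Coarse step: rank clusters by similarity of their centroid to the query.
    ranked = sorted(
        clusters,
        key=lambda label: cosine(query, centroid(list(clusters[label].values()))),
        reverse=True,
    )
    # Fine step: score only the nodes inside the n_probe best clusters.
    scored = [
        (cosine(query, vector), node_id)
        for label in ranked[:n_probe]
        for node_id, vector in clusters[label].items()
    ]
    scored.sort(reverse=True)
    return [node_id for _, node_id in scored[:top_k]]


# Toy clusters, grouped by a hypothetical disease-type taxonomy.
clusters = {
    "haematology": {"h1": [1.0, 0.0], "h2": [0.9, 0.1]},
    "solid_tumour": {"s1": [0.0, 1.0], "s2": [0.1, 0.9]},
}
result = coarse_to_fine_search([1.0, 0.0], clusters, n_probe=1, top_k=3)
```

Raising `n_probe` trades speed for recall, since relevant nodes that sit in a neighbouring cluster are only found when that cluster is also probed.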
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. User Interface and Platform Screenshots
Appendix A.1. Annotation Interface
Appendix A.2. OncoProAI Platform Components
Component | Count |
---|---|
Total nodes | ∼200,000 |
Haematological non-malignant disease groups | 16 |
Malignant haematological disease groups | 41 |
Solid tumour disease groups | 49 |
Clinical pathways | 250,000+ |
Supporting scientific studies | 40,000+ |
Active pharmaceutical ingredients | 400+ |
Educational documents | 700+ |
Data Sources | |
German guidelines (Onkopedia, AWMF, AGO) | Primary |
European guidelines (ESMO) | Secondary |
American guidelines (NCCN) | Secondary |
Example Node Types | |
Disease entities | Diagnosis codes, staging |
Treatment protocols | Drug combinations, dosages |
Biomarkers | Genetic mutations, protein expressions |
Contraindications | Drug interactions, comorbidities |
Snapshot Information | |
Knowledge graph version | 2024.Q4 |
Last update cycle | 14-day rolling |
Model Name | Dim | Multilingual | Tokenizer/Notes | Size (GB) |
---|---|---|---|---|
BAAI/bge-m3 (M3-Embedding) | 1024 | Yes | XLM-RoBERTa-based; dense/sparse/multi-vector; long-context | 2.3 |
Alibaba-NLP/gte-base-en-v1.5 | 768 | No | Transformer++ (BERT + RoPE + GLU); long-context | 1.99 |
jina-embeddings-v4 (Jina 4) | 2048 | Yes | Unified multi-modal/multilingual model; dense (2048 dim, truncatable to 128) | 5.06 |
all-mpnet-base-v2 | 768 | No | WordPiece; contrastive fine-tuned dense embeddings | 1.99 |
nomic-embed-text-v2-moe | 768 | Yes | MoE (8 experts, top-2), Matryoshka reduction | 1.99 |
Qwen3-Embedding-4B | 2560 | Yes | BPE; multilingual + MRL (32–2560D adjustability) | 6.31 |
Model | #Q | P@1 | P@3 | P@5 | P@10 |
---|---|---|---|---|---|
bgem3 | 100 | 0.857 [0.811–0.911] | 0.762 [0.704–0.817] | 0.571 [0.516–0.626] | 0.320 [0.258–0.385] |
gte | 100 | 0.857 [0.805–0.922] | 0.686 [0.638–0.740] | 0.457 [0.397–0.522] | 0.280 [0.223–0.338] |
jina4 | 100 | 0.857 [0.795–0.924] | 0.667 [0.612–0.716] | 0.480 [0.414–0.537] | 0.263 [0.201–0.323] |
mpnetbase2 | 100 | 0.171 [0.116–0.217] | 0.057 [0.000–0.112] | 0.035 [0.000–0.091] | 0.035 [0.000–0.086] |
nomicv2 | 100 | 0.979 [0.929–1.000] | 0.978 [0.925–1.000] | 0.663 [0.608–0.716] | 0.360 [0.303–0.405] |
qwen34b | 100 | 0.971 [0.925–1.000] | 0.838 [0.787–0.890] | 0.651 [0.598–0.702] | 0.366 [0.305–0.420] |
Model | #Q | Recall@1 | Recall@3 | Recall@5 | Recall@10 |
---|---|---|---|---|---|
bgem3 | 100 | 0.169 [0.122–0.228] | 0.415 [0.355–0.478] | 0.525 [0.470–0.581] | 0.592 [0.536–0.648] |
gte | 100 | 0.169 [0.113–0.229] | 0.361 [0.301–0.412] | 0.399 [0.335–0.449] | 0.570 [0.522–0.627] |
jina4 | 100 | 0.195 [0.141–0.250] | 0.435 [0.377–0.492] | 0.494 [0.433–0.547] | 0.519 [0.468–0.573] |
mpnetbase2 | 100 | 0.023 [0.000–0.075] | 0.023 [0.000–0.071] | 0.024 [0.000–0.075] | 0.037 [0.002–0.087] |
nomicv2 | 100 | 0.287 [0.231–0.338] | 0.836 [0.781–0.883] | 0.849 [0.786–0.910] | 0.882 [0.830–0.946] |
qwen34b | 100 | 0.210 [0.146–0.264] | 0.486 [0.430–0.540] | 0.625 [0.569–0.678] | 0.699 [0.644–0.770] |
Model | #Q | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 | MRR | MAP |
---|---|---|---|---|---|---|---|
bgem3 | 100 | 0.929 [0.857–0.988] | 0.889 [0.842–0.938] | 0.927 [0.864–0.984] | 0.975 [0.929–1.000] | 0.942 [0.888–1.000] | 0.912 [0.859–0.973] |
gte | 100 | 0.929 [0.874–0.983] | 0.956 [0.910–0.998] | 0.959 [0.914–1.000] | 0.979 [0.929–1.000] | 0.971 [0.913–1.000] | 0.972 [0.922–1.000] |
jina4 | 100 | 0.978 [0.925–1.000] | 0.979 [0.928–1.000] | 0.978 [0.927–1.000] | 0.979 [0.922–1.000] | 0.972 [0.925–1.000] | 0.980 [0.933–0.999] |
mpnetbase2 | 100 | 0.526 [0.464–0.574] | 0.526 [0.469–0.588] | 0.504 [0.442–0.563] | 0.503 [0.441–0.574] | 0.188 [0.116–0.239] | 0.201 [0.157–0.250] |
nomicv2 | 100 | 0.979 [0.927–1.000] | 0.979 [0.931–1.000] | 0.979 [0.929–1.000] | 0.979 [0.921–1.000] | 0.980 [0.921–1.000] | 0.978 [0.930–1.000] |
qwen34b | 100 | 0.970 [0.930–1.000] | 0.970 [0.912–1.000] | 0.970 [0.923–1.000] | 0.969 [0.918–1.000] | 0.978 [0.939–1.000] | 0.979 [0.922–1.000] |
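
For readers who want to reproduce figures of this kind, the sketch below shows one common way to compute P@k, Recall@k, and NDCG@k for a single query, plus a percentile bootstrap over per-query scores to obtain an interval around the mean. It is an illustration under stated assumptions: linear gain in NDCG, a grade above 0 counting as relevant, and the bootstrap settings are choices made here, not details confirmed by this section.

```python
import math
import random


def precision_at_k(ranking, qrels, k):
    """Fraction of the top-k results with a relevance grade above 0."""
    return sum(1 for n in ranking[:k] if qrels.get(n, 0) > 0) / k


def recall_at_k(ranking, qrels, k):
    """Fraction of all relevant nodes that appear in the top-k."""
    relevant = {n for n, grade in qrels.items() if grade > 0}
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranking[:k])) / len(relevant)


def ndcg_at_k(ranking, qrels, k):
    """NDCG with linear gain and log2 discount (an assumed formulation)."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    gains = [qrels.get(n, 0) for n in ranking[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def bootstrap_ci(per_query_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-query scores."""
    rng = random.Random(seed)
    n = len(per_query_scores)
    means = sorted(
        sum(rng.choices(per_query_scores, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy query: graded qrels and one model's ranking.
qrels = {"a": 2, "b": 1, "c": 0}
ranking = ["b", "a", "d"]
p2 = precision_at_k(ranking, qrels, 2)
r1 = recall_at_k(ranking, qrels, 1)
ndcg2 = ndcg_at_k(ranking, qrels, 2)
ci = bootstrap_ci([1.0, 0.5, 0.0, 1.0])
```

Resampling whole queries, rather than individual results, keeps the dependence between results of the same query intact.
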
Model A | Model B | p | q | Significant at FDR 0.05? |
---|---|---|---|---|
bgem3 | mpnetbase2 | 0.000000 | 0.000000 | Yes |
gte | mpnetbase2 | 0.000000 | 0.000000 | Yes |
jina4 | mpnetbase2 | 0.000000 | 0.000000 | Yes |
mpnetbase2 | nomicv2 | 0.000000 | 0.000000 | Yes |
mpnetbase2 | qwen34b | 0.000000 | 0.000000 | Yes |
gte | qwen34b | 0.010432 | 0.026079 | Yes |
jina4 | qwen34b | 0.013442 | 0.028804 | Yes |
nomicv2 | qwen34b | 0.016160 | 0.030300 | Yes |
bgem3 | qwen34b | 0.030735 | 0.051225 | No |
bgem3 | gte | 0.226617 | 0.339926 | No |
bgem3 | nomicv2 | 0.552851 | 0.753887 | No |
gte | jina4 | 0.714249 | 0.798198 | No |
bgem3 | jina4 | 0.720351 | 0.798198 | No |
gte | nomicv2 | 0.744985 | 0.798198 | No |
jina4 | nomicv2 | 0.986493 | 0.986493 | No |
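
The q column above is consistent, up to rounding of the reported p-values, with the Benjamini-Hochberg step-up adjustment applied to the 15 pairwise p-values. The sketch below shows that adjustment; the function name is illustrative, and the p-values are copied from the table in the order shown.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# p-values from the pairwise comparison table (rounded to six decimals).
p = [0.0, 0.0, 0.0, 0.0, 0.0, 0.010432, 0.013442, 0.016160, 0.030735,
     0.226617, 0.552851, 0.714249, 0.720351, 0.744985, 0.986493]
q = benjamini_hochberg(p)
significant = [q_i <= 0.05 for q_i in q]
```

For example, the bgem3 vs. qwen34b comparison has p = 0.030735 at rank 9 of 15, giving q = 0.030735 × 15 / 9 = 0.051225, which is just above the 0.05 threshold.
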
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mushtaq, S.; Trovati, M.; Bessis, N. Explainable AI-Based Semantic Retrieval from an Expert-Curated Oncology Knowledge Graph for Clinical Decision Support. Future Internet 2025, 17, 471. https://doi.org/10.3390/fi17100471