An External Validation Study on Two Pre-Trained Large Language Models for Multimodal Prognostication in Laryngeal and Hypopharyngeal Cancer: Integrating Clinical, Treatment, and Radiomic Data to Predict Survival Outcomes with Interpretable Reasoning
Abstract
1. Introduction
2. Materials and Methods
2.1. Patient Selection
2.2. Image Acquisition and Preprocessing
2.3. Radiomic Feature Extraction
2.4. LLM Selection
2.5. Prompt Design for LLM Input
2.6. Statistical Analysis
3. Results
3.1. Patients’ Characteristics
3.2. Sensitivity, Specificity, and AUC
3.3. McNemar Test
3.4. Kaplan–Meier Survival Analysis
3.5. Qualitative Explanatory Reasoning
- (1) Combined with pre-RT intratumoral radiomic features to predict the outcome of death(−), recurrence(−), and metastasis(−):
  - “1. Tumor Staging and Burden: The absence of nodal disease (N0) significantly improves the prognosis. The reported initial tumor volume of 3.76 (units likely cubic centimeters) is relatively small for a T3 lesion.
  - 2. Clinical Risk Factors: At 50 years old, the patient is relatively younger typically with better overall performance status. Multiple phase-based dosing (e.g., higher-dose to the gross tumor volume, followed by lower-dose to subclinical regions) helps target both the primary tumor site and potential microscopic disease, improving cure rates.
  - 3. Pre-RT Intratumoral Radiomics Indicators: The reported Sphericity value (0.63405) suggests a relatively regular tumor shape, as extremely low sphericity often indicates more irregular or invasive growth. A more compact tumor can sometimes respond more uniformly to radiation. Elongation (0.839707) and moderate Flatness (0.424511) imply the tumor’s geometry is not extremely distorted; this may correlate with a more localized extension rather than multifocal spread. The Uniformity (0.253221) here is relatively moderate, suggesting the tumor’s intensity distribution might not be excessively chaotic.”
- (2) Combined with mid-RT peritumoral radiomic features to predict the outcome of death(+), recurrence(+), and metastasis(+):
  - “1. Tumor Staging and Burden: Advanced Stage T4a, N2 generally exhibit higher recurrence and metastasis rates. After three weeks of radiotherapy, the volume only reduced to 77.38, indicating limited response to treatment. Large tumors are often associated with hypoxic regions, leading to reduced radiosensitization and increased resistance to radiotherapy and chemotherapy drug penetration.
  - 2. Clinical Risk Factors: The 67 years of age with pre-existing comorbidities can impact treatment planning and tolerance. For instance, chronic disease-induced immunosuppression, treatment interruptions, or dose adjustments can lead to poor local control.
  - 3. Mid-RT Peritumoral Radiomics Indicators: Imaging reveals high “Surface Area” and large “Mesh Volume,” with relatively low “Sphericity” (0.195495), indicating irregular tumor shapes with significant surface undulations. These features often imply biologically complex tumor behavior with multiple invasion pathways. Texture features such as “Energy,” “Entropy,” and “Uniformity” demonstrate a chaotic density distribution, reflecting heterogeneity in tumor composition. This heterogeneity is often associated with higher malignancy and resistant cell populations.”
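Both explanations lean on the shape feature Sphericity (0.63405 in the first case, 0.195495 in the second). As a minimal sketch of what that number measures, the snippet below uses the common definition of sphericity from mesh volume and surface area, which equals 1 for a perfect sphere and decreases as the surface becomes more irregular. That the study's extraction pipeline used exactly this definition is an assumption; the function name and the test shapes are illustrative only and are not taken from the study.

```python
import math


def sphericity(volume: float, surface_area: float) -> float:
    """Sphericity = (36 * pi * V^2)^(1/3) / A.

    Equals 1.0 for a perfect sphere; lower values indicate a surface
    that is large relative to the enclosed volume (irregular shape).
    """
    if volume <= 0 or surface_area <= 0:
        raise ValueError("volume and surface_area must be positive")
    return (36.0 * math.pi * volume ** 2) ** (1.0 / 3.0) / surface_area


# Illustrative shapes (hypothetical, not patient data).
r = 2.0
sphere_value = sphericity(4.0 / 3.0 * math.pi * r ** 3, 4.0 * math.pi * r ** 2)
cube_value = sphericity(1.0, 6.0)  # unit cube: V = 1, A = 6

print(f"sphere: {sphere_value:.3f}")
print(f"cube:   {cube_value:.3f}")
```

Under this definition a cube scores about 0.81, so the value of 0.195495 quoted in the second explanation corresponds to a surface far more irregular than any simple convex solid.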
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Characteristics | n (%) unless otherwise stated |
|---|---|
| Accrual time | 2006–2013 |
| Median age, years | 56 (34–80) |
| Male sex | 88 (95.7) |
| Tumor stage | |
| T1–T2 | 23 (25.0) |
| T3–T4 | 69 (75.0) |
| Nodal stage | |
| N0–N1 | 48 (52.2) |
| N2–N3 | 44 (47.8) |
| Overall stage | |
| I | 1 (1.1) |
| II | 5 (5.5) |
| III | 21 (22.8) |
| IVA | 52 (56.5) |
| IVB | 13 (14.1) |
| Primary tumor site | |
| Hypopharynx | 69 (75.0) |
| Larynx | 23 (25.0) |
| Median tumor volume (pre-RT), cm³ | 19.5 (1.8–138.0) |
| Median tumor volume (mid-RT), cm³ | 16.9 (0.5–113.3) |
| Cigarette smoking | 88 (95.7) |
| Betel quid chewing | 66 (71.7) |
| Alcohol drinking | 50 (54.3) |
| Presence of medical comorbidities | 60 (65.2) |
| Chemotherapy | 67 (72.8) |
| Median treatment duration, days | 54 (47–68) |
| Median EQD2, Gy | 72 (66–72) |
| Median CLC (pre-RT), per mm³ | 1979 (645–5120) |
| Median CLC (nadir during RT), per mm³ | 456 (56–1456) |

| Overall survival (OS) | Baseline | Pre-RT Intra-T | Pre-RT Peri-T | Mid-RT Intra-T | Mid-RT Peri-T |
|---|---|---|---|---|---|
| GPT-4o-2024-08-06 | | | | | |
| Sensitivity | 1.0000 | 0.7327 | 0.7122 | 0.7939 | 0.7327 |
| Specificity | 0.2930 | 0.7279 | 0.7047 | 0.6349 | 0.6581 |
| AUC | 0.6465 | 0.7303 | 0.7084 | 0.7144 | 0.6954 |
| Gemma-2-27b-it | | | | | |
| Sensitivity | 1.0000 | 0.8143 | 0.8143 | 0.8551 | 0.8551 |
| Specificity | 0.2233 | 0.6349 | 0.6581 | 0.4953 | 0.5186 |
| AUC | 0.6116 | 0.7246 | 0.7362 | 0.6752 | 0.6869 |

| Recurrence | Baseline | Pre-RT Intra-T | Pre-RT Peri-T | Mid-RT Intra-T | Mid-RT Peri-T |
|---|---|---|---|---|---|
| GPT-4o-2024-08-06 | | | | | |
| Sensitivity | 1.0000 | 0.7667 | 0.7481 | 0.8222 | 0.8593 |
| Specificity | 0.3579 | 0.6000 | 0.5737 | 0.4947 | 0.5474 |
| AUC | 0.6789 | 0.6833 | 0.6609 | 0.6585 | 0.7033 |
| Gemma-2-27b-it | | | | | |
| Sensitivity | 1.0000 | 0.8037 | 0.7852 | 0.8593 | 0.8963 |
| Specificity | 0.2263 | 0.5737 | 0.5737 | 0.4684 | 0.5211 |
| AUC | 0.6132 | 0.6887 | 0.6794 | 0.6638 | 0.7087 |

| Metastasis | Baseline | Pre-RT Intra-T | Pre-RT Peri-T | Mid-RT Intra-T | Mid-RT Peri-T |
|---|---|---|---|---|---|
| GPT-4o-2024-08-06 | | | | | |
| Sensitivity | 0.8800 | 0.7201 | 0.7000 | 0.8000 | 0.7392 |
| Specificity | 0.5286 | 0.7188 | 0.6952 | 0.5762 | 0.6012 |
| AUC | 0.7043 | 0.7195 | 0.6976 | 0.6881 | 0.6700 |
| Gemma-2-27b-it | | | | | |
| Sensitivity | 0.9800 | 0.7600 | 0.7400 | 0.8200 | 0.7600 |
| Specificity | 0.2714 | 0.6952 | 0.6952 | 0.5524 | 0.5762 |
| AUC | 0.6257 | 0.7276 | 0.7176 | 0.6862 | 0.6681 |
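To make the three reported metrics concrete, the following is an illustrative sketch (not the authors' analysis code) of how sensitivity and specificity follow from binary predictions. It also checks one pattern visible in the tables above: the reported AUC values agree, to within rounding, with (sensitivity + specificity) / 2, which is what the area under an ROC curve reduces to when a classifier supplies a single operating point rather than a continuous score. Whether the study computed AUC this way is an inference from the tabulated numbers; the function names and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for labels coded 1 = event, 0 = no event."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


def single_point_auc(sensitivity: float, specificity: float) -> float:
    """Area under the ROC curve through (0,0), (1 - spec, sens), (1,1)."""
    return (sensitivity + specificity) / 2.0


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
toy_auc = single_point_auc(sens, spec)

# Sensitivity and specificity copied from the GPT-4o overall-survival rows.
baseline_auc = single_point_auc(1.0000, 0.2930)      # table reports 0.6465
pre_rt_intra_auc = single_point_auc(0.7327, 0.7279)  # table reports 0.7303

print(f"toy: sens={sens:.4f} spec={spec:.4f} auc={toy_auc:.4f}")
print(f"baseline={baseline_auc:.4f} pre-RT intra-T={pre_rt_intra_auc:.4f}")
```

Read this way, the baseline columns show the trade-off plainly: a sensitivity of 1.0000 paired with a specificity below 0.36 for OS and recurrence yields an AUC only moderately above 0.5, because most event-free patients are also flagged as events.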
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yap, W.-K.; Cheng, S.-C.; Lin, C.-H.; Hsiao, I.-T.; Tsai, T.-Y.; Yap, W.-L.; Chen, W.P.-Y.; Lin, C.-Y.; Huang, S.-M. An External Validation Study on Two Pre-Trained Large Language Models for Multimodal Prognostication in Laryngeal and Hypopharyngeal Cancer: Integrating Clinical, Treatment, and Radiomic Data to Predict Survival Outcomes with Interpretable Reasoning. Bioengineering 2025, 12, 1345. https://doi.org/10.3390/bioengineering12121345

