Expert Evaluation of ChatGPT-4 Responses to Upper Tract Urothelial Carcinoma Questions: A Prospective Comparative Study with Guideline-Based and Patient-Focused Queries
Abstract
1. Introduction
2. Materials and Methods
- EAU Guideline-Based Questions (n = 60): Specific clinical questions were systematically extracted from the Upper Tract Urothelial Carcinoma section of the 2025 EAU guidelines. These questions covered explicit recommendations and in-text evidence statements on epidemiology, etiology, pathology, diagnosis, staging, risk stratification, treatment, metastatic disease management, and follow-up.
- Frequently Asked Questions (FAQs) (n = 17): General patient-oriented questions were compiled from major international urology association websites and reputable medical information portals. These questions reflected non-specialist inquiries commonly encountered in clinical consultations or online patient forums (Appendix A, Table A1).
- Binary Answer Scoring:
- Detailed Accuracy Scoring:
3. Results
4. Discussion
- Safe Use for Patient Education: The model reliably provides general disease information to patients, especially for FAQs, but should always include disclaimers about the need for professional medical consultation.
- Caution in Clinical Decision Support: Guideline-based and complex management questions should always be verified by experts, particularly in areas involving follow-up protocols and risk stratification.
- Targeted Model Optimization: Incorporating updated guideline datasets and supervised fine-tuning may improve performance in weaker areas, particularly dynamic clinical protocols and recent research findings.
- Ongoing Monitoring: Regular reassessment is essential as AI models and clinical guidelines evolve, with systematic evaluation protocols to track performance over time.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
EAU | European Association of Urology |
UTUC | Upper Tract Urothelial Carcinoma |
FAQs | Frequently Asked Questions |
LLM | Large Language Model |
STROBE | Strengthening the Reporting of Observational Studies in Epidemiology |
USMLE | United States Medical Licensing Examination |
BERT | Bidirectional Encoder Representations from Transformers |
Appendix A
Issue | Question to ChatGPT |
---|---|
Epidemiology, Etiology and Pathology | Is there a relationship between aristolochic acid and upper urinary tract carcinoma? |
Epidemiology, Etiology and Pathology | Is upper urinary tract carcinoma associated with any syndrome? |
Epidemiology, Etiology and Pathology | What criteria are used to evaluate Lynch syndrome in patients with upper urinary tract carcinoma? |
Epidemiology, Etiology and Pathology | What are the Amsterdam 2 criteria? |
Epidemiology, Etiology and Pathology | What tests should be performed in patients with suspected hereditary upper urinary tract urothelial carcinoma? |
Epidemiology, Etiology and Pathology | What is the prevalence of UTUC in high-risk NMIBC patients treated with intravesical bacillus Calmette–Guérin (BCG)? |
Epidemiology, Etiology and Pathology | What is the prevalence of UTUC in patients with MIBC treated with radical cystectomy? |
Epidemiology, Etiology and Pathology | What is the risk of bladder recurrence in UTUC patients after UTUC treatment? |
Epidemiology, Etiology and Pathology | What percentage of patients have muscle invasion at the time of diagnosis? |
Classification and Staging Systems | Which patients with upper urinary tract urothelial carcinoma should be offered mismatch repair (MMR) proteins or microsatellite instability testing? |
Classification and Staging Systems | Which staging system is used for upper urinary tract urothelial cell carcinoma? |
Classification and Staging Systems | Which grading system is used for upper urinary tract urothelial cell carcinoma? |
Diagnosis | Which surgical or imaging method should be used in the diagnosis and staging of upper urinary tract carcinoma? |
Diagnosis | What can voided cytology indicate when cystoscopy is normal and there is no CIS in the bladder and prostate urethra? |
Diagnosis | What is the sensitivity of selective urine cytology in urothelial high-grade tumors, including carcinoma in situ? |
Diagnosis | Should cystoscopy be performed in upper urinary tract tumors? |
Diagnosis | Why should cystoscopy be performed in upper urinary tract tumors? |
Diagnosis | Which patients with suspected upper urinary tract tumors should we perform cytology? |
Diagnosis | Which imaging technique should be used for the diagnosis and staging of upper urinary tract tumors? |
Diagnosis | Which imaging technique should be used for chest imaging in high-risk upper urinary tract tumors? |
Diagnosis | Which imaging modality should be used to exclude metastases in patients with high-risk upper urinary tract tumors? |
Diagnosis | Should urethral ureteroscopy be performed for diagnosis and/or risk stratification in every patient with suspected upper urinary tract urothelial carcinoma? |
Diagnosis | In which patients with upper urinary tract tumors should we perform FGFR 2/3 testing? |
Diagnosis | Is the risk of thoracic metastasis low or high in low-risk UTUC? |
Risk Stratification | For nonmetastatic UTI, what factors are included in the risk stratification based on risk of progression to >pT2/non-organ-confined disease? Which factors are considered high risk? |
Risk Stratification | What are the factors affecting the risk of bladder recurrence after surgery for Upper Urinary Tract Urothelial Cell Carcinoma? |
Risk Stratification | Are there any validated molecular biomarkers for clinical use in Upper Urinary Tract Urothelial Cell Carcinoma? |
Risk Stratification | Approximately how many UTUC patients have multifocal tumors? |
Risk Stratification | Does hydroureteronephrosis affect the prognosis in patients treated with RNU? |
Risk Stratification | What are the strong and weak criteria when we look at the high-risk group in UTUC? |
Disease Management | What is the primary treatment option for low-risk upper urinary tract tumors? |
Disease Management | What are the treatment options for low-risk tumors of the distal ureter? |
Disease Management | When should a second look ureteroscopy be performed after the first endoscopic treatment in upper urinary tract tumors? |
Disease Management | What is the standard treatment for high-risk UTUC? |
Disease Management | Are the oncological outcomes of open, laparoscopic and robotic approaches different? |
Disease Management | Should the bladder cuff be removed during radical nephroureterectomy? Why? |
Disease Management | Should template-based LND be performed in muscle-invasive upper urothelial carcinoma? If so, why? |
Disease Management | Is chemotherapy required after radical nephroureterectomy for upper urinary tract tumors? If so, which patients require it? |
Disease Management | Which chemotherapeutic agents are used after radical nephroureterectomy for upper urinary tract tumors? Do they increase survival? |
Disease Management | What does single post-operative intravesical instillation of chemotherapy do? In which patients should it be applied? |
Disease Management | What are the surgical options for patients with high-risk upper urinary tract urothelial tumors in the distal ureter? |
Disease Management | Can kidney-sparing treatment be performed in patients with high-risk upper urinary tract urothelial carcinoma? What are the risks this treatment? |
Disease Management | Do pembrolizumab and nivolumab have a role in the treatment of upper urinary tract urothelial carcinoma? If so, in which patients should they be used? |
Disease Management | Should we prefer laparoscopic surgery in patients with T3 and above? |
Disease Management | Can upper urinary tract urothelial carcinoma be treated with a percutaneous approach? If yes, which patients? |
Disease Management | What are the Platinum-eligible criteria for systemic treatment of upper urinary tract tumors? |
Disease Management | Is enfortumab vedotin used as a first-line treatment in metastatic upper urinary tract tumors? |
Disease Management | Which agents can be used in the first-line treatment of metastatic upper urinary urothelial carcinoma? Are there any benefit over each other? |
Disease Management | Is cisplatin-based combination chemotherapy effective in metastatic upper urinary urothelial carcinoma? If so, what benefits does it provide to the patient? |
Disease Management | Should cisplatin alone or cisplatin plus nivolumab be used in the treatment of metastatic upper urinary tract urothelial carcinoma? |
Disease Management | What treatments can be given to patients who cannot use cisplatin in the treatment of metastatic upper urinary tract urothelial carcinoma? |
Disease Management | Does radical nephroureterectomy provide survival benefits for patients with metastatic upper urinary tract urothelial carcinoma? |
Disease Management | What treatment options can be used to reduce symptoms in patients with metastatic upper urinary tract urothelial carcinoma? |
Disease Management | In patients without disease progression after 4 to 6 cycles of gemcitabine plus cisplatin or carboplatin, what treatment options can be offered to improve survival? |
Disease Management | In which patients with metastatic upper urinary tract urothelial carcinoma can erdafitinib be used? |
Disease Management | Does enfortumab vedotin increase overall survival in patients with metastatic upper urinary tract urothelial carcinoma? If so, in which group of patients? |
Follow-Up | At what intervals and with what tests should patients with low-risk upper urinary tract urothelial carcinoma be followed up after radical nephroureterectomy? |
Follow-Up | At what intervals and with what tests should patients with high-risk upper urinary tract urothelial carcinoma be followed up after radical nephroureterectomy? |
Follow-Up | At what intervals and with what tests should patients with low-risk upper urinary tract urothelial carcinoma be followed up after kidney-sparing management? |
Follow-Up | At what intervals and with what tests should patients with high-risk upper urinary tract urothelial carcinoma be followed up after kidney-sparing management? |
Frequently Asked Questions | What is upper urinary tract cancer and which organs does it affect? |
Frequently Asked Questions | Who gets upper urinary tract cancer? |
Frequently Asked Questions | Is upper urinary tract cancer contagious? |
Frequently Asked Questions | What are the risk factors for upper urinary tract carcinoma? |
Frequently Asked Questions | Is there a genetic predisposition to upper urinary tract cancer? |
Frequently Asked Questions | What are the symptoms of upper urinary tract carcinoma? |
Frequently Asked Questions | When should bleeding in urine (hematuria) be taken seriously? |
Frequently Asked Questions | How are upper urinary tract tumors diagnosed? |
Frequently Asked Questions | When are CT, MRI, cystoscopy, and ureterorenoscopy preferred for the diagnosis of upper urinary tract tumors? |
Frequently Asked Questions | What is the role of urine cytology in the diagnosis of upper urinary tract tumors? |
Frequently Asked Questions | What are the treatment options for upper urinary tract tumors? |
Frequently Asked Questions | Who can undergo kidney-sparing (endoscopic) treatment for upper urinary tract tumors? |
Frequently Asked Questions | What is radical nephroureterectomy? |
Frequently Asked Questions | Is laparoscopic or robotic surgery possible for radical nephroureterectomy? |
Frequently Asked Questions | Is lymph node dissection necessary in radical nephroureterectomy? |
Frequently Asked Questions | In what cases is chemotherapy or immunotherapy applied for ureteral tumors? |
Frequently Asked Questions | What do the stage and grade of the tumor mean? |
Category | Correct Answers, n (%) | Incorrect Answers, n (%) | Accuracy Score, Mean ± SD | Accuracy Score, Median (IQR) | 95% CI for Correct Answers |
---|---|---|---|---|---|
EAU Guidelines (n = 60) | 54 (90.0) | 6 (10.0) | 1.28 ± 0.74 * | 1.0 (1.0–1.0) | 79.9–95.3% |
Frequently Asked Questions (n = 17) | 17 (100.0) | 0 (0.0) | 1.00 ± 0.00 * | 1.0 (1.0–1.0) | 81.6–100% |
Total (n = 77) | 71 (92.2) | 6 (7.8) | 1.22 ± 0.66 | 1.0 (1.0–1.0) | 84.0–96.4% |
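
The 95% CI bounds in the table above are consistent with a Wilson score interval computed on the proportion of correct answers. The sketch below illustrates that calculation from the reported counts. The choice of the Wilson method and of z = 1.96 are assumptions made for illustration, since the statistical methods are not reproduced in this excerpt, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion, returned as fractions.

    Assumption: a two-sided 95% interval (z = 1.96); the source does not
    state which interval method was used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Correct-answer counts taken from the results table: (correct, total).
counts = {
    "EAU Guidelines": (54, 60),
    "Frequently Asked Questions": (17, 17),
    "Total": (71, 77),
}

for name, (correct, total) in counts.items():
    lower, upper = wilson_interval(correct, total)
    print(
        f"{name}: {correct}/{total} = {100 * correct / total:.1f}% "
        f"(95% CI {100 * lower:.1f} to {100 * upper:.1f}%)"
    )
```

Unlike the simple normal-approximation (Wald) interval, the Wilson interval stays informative when every answer is correct: for the FAQ group (17/17) it still gives a lower bound below 100%, which matches the non-degenerate interval reported in the table.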
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Beyatlı, M.; Güngör, H.S.; İnkaya, A.; Sobay, R.; Tahra, A.; Küçük, E.V. Expert Evaluation of ChatGPT-4 Responses to Upper Tract Urothelial Carcinoma Questions: A Prospective Comparative Study with Guideline-Based and Patient-Focused Queries. J. Clin. Med. 2025, 14, 6353. https://doi.org/10.3390/jcm14186353