Dr. LLM Will See You Now: The Ability of ChatGPT to Provide Geographically Tailored Colorectal Cancer Screening and Surveillance Recommendations
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design
2.2. LLM Selection and Objectives
2.3. Prompt Engineering and Testing
2.4. Query Strategy
2.5. Performance Evaluation
- Correct: Fully aligned with country-specific clinical guidelines.
- Partially Correct: Aligned in part but omitted key details or contained minor inaccuracies.
- Incorrect: Misaligned with guidelines or provided misleading recommendations.
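The three-level rubric above lends itself to a simple tally per country. The following is a minimal sketch rather than the authors' analysis code: the `Grade` enum and the `tally_grades` helper are hypothetical names introduced here for illustration, and the example grades are invented, not study data.

```python
from collections import Counter
from enum import Enum


class Grade(Enum):
    """Three-level rubric from Section 2.5 (identifiers are illustrative)."""
    CORRECT = "Correct"
    PARTIALLY_CORRECT = "Partially Correct"
    INCORRECT = "Incorrect"


def tally_grades(grades):
    """Count how many responses fall into each rubric category.

    Every category appears in the result, even when its count is zero.
    """
    counts = Counter(grades)
    return {grade: counts.get(grade, 0) for grade in Grade}


# Hypothetical grades for five responses (not study data).
example = [
    Grade.CORRECT,
    Grade.PARTIALLY_CORRECT,
    Grade.CORRECT,
    Grade.INCORRECT,
    Grade.CORRECT,
]
summary = tally_grades(example)
for grade, count in summary.items():
    print(f"{grade.value}: {count}/{len(example)}")
```

Returning every category, including those with a count of zero, keeps downstream tables aligned when a country has no responses in a given category.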
3. Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Case | Canada | Italy | Romania | UK |
---|---|---|---|---|
Frequency Completely Aligned with Recommendations | 27/54 | 34/54 | 22/54 | 30/54 |
Frequency Partially Aligned with Recommendations | 23/54 | 15/54 | 32/54 | 17/54 |
1 | No | Yes | Partial | Yes |
2 | Yes | Yes | Yes | Yes |
3 | Yes | Yes | Yes | Yes |
4 | Yes | Yes | Yes | Yes |
5 | Partial | Partial | Yes | Yes |
6 | Partial | Yes | Yes | Partial |
7 | Partial | Yes | Yes | Yes |
8 | Partial | Partial | Partial | No |
9 | Partial | Partial | Partial | No |
10 | Partial | Partial | Partial | Yes |
11 | Partial | Yes | Yes | Partial |
12 | Partial | Yes | Yes | Partial |
13 | Partial | Partial | Partial | Partial |
14 | Partial | Partial | Partial | Partial |
15 | Partial | Partial | Partial | Partial |
16 | Yes | Partial | Yes | Yes |
17 | Partial | Yes | Yes | Partial |
18 | No | Partial | Yes | Partial |
19 | Yes | Partial | Partial | Yes |
20 | Yes | Partial | Yes | Yes |
21 | Yes | Partial | Yes | Yes |
22 | Yes | Yes | Yes | Yes |
23 | Yes | Yes | Yes | Partial |
24 | Partial | Yes | Yes | Partial |
25 | Partial | Yes | Yes | Yes |
26 | Partial | Yes | Yes | Yes |
27 | Partial | Yes | Yes | Yes |
28 | Partial | Yes | Yes | Yes |
29 | Yes | Yes | Yes | Partial |
30 | Yes | Yes | Yes | Partial |
31 | Yes | Yes | Partial | Partial |
32 | Partial | Yes | Partial | Partial |
33 | Partial | Yes | Partial | Partial |
34 | No | Yes | Partial | No |
35 | No | Yes | Partial | Partial |
36 | Partial | No | Partial | Yes |
37 | Yes | No | Partial | Yes |
38 | Yes | Yes | Partial | Yes |
39 | Yes | No | Partial | Yes |
40 | Yes | No | Partial | Yes |
41 | Yes | Yes | Partial | Yes |
42 | Yes | Yes | Partial | Yes |
43 | Yes | Yes | Partial | Yes |
44 | Yes | Yes | Partial | Yes |
45 | Yes | Yes | Partial | No |
46 | Yes | Yes | Partial | No |
47 | Yes | Yes | Partial | Yes |
48 | Yes | Yes | Partial | Partial |
49 | Yes | Yes | Partial | No |
50 | Yes | Yes | Partial | Yes |
51 | Yes | Partial | Partial | Yes |
52 | Partial | Partial | Partial | Yes |
53 | Partial | Partial | Partial | Yes |
54 | Partial | No | Partial | No |
Country | Correct (%) | Partially Correct (%) | Incorrect (%) | Commentary |
---|---|---|---|---|
Canada | 50% | 42.6% | 7.4% | Guidelines well-defined; minor inconsistencies observed |
Italy | 63% | 27.8% | 9.3% | More flexible guidelines; chatbot struggles with ambiguity |
Romania | 40.7% | 59.3% | 0% | Lack of formal guidelines; expert judgment required |
UK | 55.6% | 31.5% | 13.0% | Strong national guidelines, but chatbot misinterpreted surveillance intervals |
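The percentages in the table above follow from the per-country counts over 54 cases. The sketch below reproduces that arithmetic under one assumption: the number of incorrect responses is whatever remains after the completely and partially aligned counts are subtracted from the total. The `percentages` helper and the `counts` dictionary are illustrative names, not part of the study.

```python
TOTAL_CASES = 54

# (completely aligned, partially aligned) counts per country, copied from
# the frequency rows of the per-case table.
counts = {
    "Canada": (27, 23),
    "Italy": (34, 15),
    "Romania": (22, 32),
    "UK": (30, 17),
}


def percentages(correct, partial, total=TOTAL_CASES):
    """Return (correct, partial, incorrect) as percentages of total.

    Assumes incorrect = total - correct - partial. Each value is rounded
    to one decimal place on its own.
    """
    incorrect = total - correct - partial
    return tuple(round(100 * n / total, 1) for n in (correct, partial, incorrect))


for country, (correct, partial) in counts.items():
    pct_correct, pct_partial, pct_incorrect = percentages(correct, partial)
    print(f"{country}: {pct_correct}% | {pct_partial}% | {pct_incorrect}%")
```

Because each value is rounded independently, a row may not sum to exactly 100%.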
Case | English (Canada) | Italian | Romanian | English (UK) |
---|---|---|---|---|
Frequency Completely Aligned with Recommendations | 27/54 | 35/54 | 22/54 | 30/54 |
Frequency Partially Aligned with Recommendations | 23/54 | 14/54 | 32/54 | 17/54 |
1 | No | Yes | Partial | Yes |
2 | Yes | Yes | Yes | Yes |
3 | Yes | Yes | Yes | Yes |
4 | Yes | Yes | Yes | Yes |
5 | Partial | Partial | Yes | Yes |
6 | Partial | Yes | Yes | Partial |
7 | Partial | Yes | Yes | Yes |
8 | Partial | Partial | Partial | No |
9 | Partial | Partial | Partial | No |
10 | Partial | Partial | Partial | Yes |
11 | Partial | Yes | Yes | Partial |
12 | Partial | Yes | Yes | Partial |
13 | Partial | Partial | Partial | Partial |
14 | Partial | Partial | Partial | Partial |
15 | Partial | Partial | Partial | Partial |
16 | Yes | Partial | Yes | Yes |
17 | Partial | Yes | Yes | Partial |
18 | No | Partial | Yes | Partial |
19 | Yes | Partial | Partial | Yes |
20 | Yes | Partial | Yes | Yes |
21 | Yes | Partial | Yes | Yes |
22 | Yes | Yes | Yes | Yes |
23 | Yes | Yes | Yes | Partial |
24 | Partial | Yes | Yes | Partial |
25 | Partial | Yes | Yes | Yes |
26 | Partial | Yes | Yes | Yes |
27 | Partial | Yes | Yes | Yes |
28 | Partial | Yes | Yes | Yes |
29 | Yes | Yes | Yes | Partial |
30 | Yes | Yes | Yes | Partial |
31 | Yes | Yes | Partial | Partial |
32 | Partial | Yes | Partial | Partial |
33 | Partial | Yes | Partial | Partial |
34 | No | Yes | Partial | No |
35 | No | Yes | Partial | Partial |
36 | Partial | No | Partial | Yes |
37 | Yes | No | Partial | Yes |
38 | Yes | Yes | Partial | Yes |
39 | Yes | No | Partial | Yes |
40 | Yes | No | Partial | Yes |
41 | Yes | Yes | Partial | Yes |
42 | Yes | Yes | Partial | Yes |
43 | Yes | Yes | Partial | Yes |
44 | Yes | Yes | Partial | Yes |
45 | Yes | Yes | Partial | No |
46 | Yes | Yes | Partial | No |
47 | Yes | Yes | Partial | Yes |
48 | Yes | Yes | Partial | Partial |
49 | Yes | Yes | Partial | No |
50 | Yes | Yes | Partial | Yes |
51 | Yes | Yes | Partial | Yes |
52 | Partial | Partial | Partial | Yes |
53 | Partial | Partial | Partial | Yes |
54 | Partial | No | Partial | No |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zeng, A.; Steinke, J.; Bocse, H.-F.; De Pastena, M. Dr. LLM Will See You Now: The Ability of ChatGPT to Provide Geographically Tailored Colorectal Cancer Screening and Surveillance Recommendations. J. Clin. Med. 2025, 14, 5101. https://doi.org/10.3390/jcm14145101