Comparative Analysis of ChatGPT and Gemini in Addressing Questions from Chronic Kidney Disease Patients
Abstract
1. Introduction
2. Methods
2.1. Study Design and Participants
2.2. Data Collection
- Medical condition and treatment (n = 73);
- Nutrition and diet (n = 35);
- Symptom management (n = 15).
2.3. Large Language Model Evaluation
2.4. LLM Scoring According to Nephrologists
2.5. Statistical Analysis
3. Results
3.1. QAMAI Scores
3.2. Effect Size Analysis
3.3. Inter-Rater Reliability
3.4. Consensus Analysis
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References



| Dimension | ChatGPT (Mean ± SD) | Gemini (Mean ± SD) | p-Value |
|---|---|---|---|
| Accuracy | 4.44 ± 0.53 | 4.55 ± 0.53 | 0.095 |
| Clarity | 4.78 ± 0.32 | 4.50 ± 0.56 | <0.001 |
| Relevance | 4.66 ± 0.43 | 4.58 ± 0.51 | 0.124 |
| Completeness | 4.28 ± 0.57 | 4.23 ± 0.70 | 0.634 |
| Usefulness | 4.51 ± 0.49 | 4.36 ± 0.63 | 0.009 |
| Sources & References | 1.00 | 1.00 | NA |
| QAMAI Total Score | 23.68 ± 1.78 (range 18–26) | 23.21 ± 2.42 (range 14.5–26) | 0.106 |
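
The total row can be cross-checked against the dimension rows. This assumes, as the numbers themselves suggest, that the QAMAI total is the sum of the six dimension scores (each assumed to be on a 1–5 scale). If so, the mean total should equal the sum of the six dimension means, up to rounding. The short Python sketch below performs that check. The helper names `qamai_total` and `consistent` and the 0.04 tolerance are illustrative choices, not part of the study.

```python
# Minimal sketch, assuming the QAMAI total is the sum of six dimension scores,
# so the mean total should equal the sum of the per-dimension means (up to rounding).
from math import isclose

DIMENSIONS = ("accuracy", "clarity", "relevance", "completeness",
              "usefulness", "sources_references")

# Per-dimension means copied from the table above.
chatgpt = {"accuracy": 4.44, "clarity": 4.78, "relevance": 4.66,
           "completeness": 4.28, "usefulness": 4.51, "sources_references": 1.00}
gemini = {"accuracy": 4.55, "clarity": 4.50, "relevance": 4.58,
          "completeness": 4.23, "usefulness": 4.36, "sources_references": 1.00}


def qamai_total(means: dict[str, float]) -> float:
    """Sum the six dimension means (hypothetical helper, not from the paper)."""
    return sum(means[d] for d in DIMENSIONS)


def consistent(means: dict[str, float], reported_total: float, tol: float = 0.04) -> bool:
    # Seven rounded values (six means plus the total) can each be off by up to 0.005,
    # so 0.035 bounds the rounding error; 0.04 adds a little slack.
    return isclose(qamai_total(means), reported_total, abs_tol=tol)


if __name__ == "__main__":
    for name, means, reported in (("ChatGPT", chatgpt, 23.68), ("Gemini", gemini, 23.21)):
        print(f"{name}: sum of means = {qamai_total(means):.2f}, "
              f"reported = {reported:.2f}, consistent = {consistent(means, reported)}")
```

Both sums (23.67 for ChatGPT and 23.22 for Gemini) fall within 0.01 of the reported totals. The table is therefore internally consistent under this assumption.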

| Dimension | Group 1: ChatGPT | Group 1: Gemini | Group 1: p-Value | Group 2: ChatGPT | Group 2: Gemini | Group 2: p-Value | Group 3: ChatGPT | Group 3: Gemini | Group 3: p-Value |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 4.36 ± 0.58 | 4.35 ± 0.57 | 0.781 | 4.51 ± 0.43 | 4.80 ± 0.27 | 0.001 | 4.73 ± 0.32 | 4.93 ± 0.26 | 0.058 |
| Clarity | 4.72 ± 0.35 | 4.32 ± 0.59 | <0.001 | 4.88 ± 0.25 | 4.68 ± 0.41 | 0.009 | 4.93 ± 0.18 | 4.97 ± 0.13 | 0.564 |
| Relevance | 4.54 ± 0.47 | 4.36 ± 0.53 | 0.019 | 4.88 ± 0.24 | 4.88 ± 0.25 | 1.000 | 4.73 ± 0.37 | 4.97 ± 0.13 | 0.038 |
| Completeness | 4.24 ± 0.62 | 3.93 ± 0.69 | 0.001 | 4.29 ± 0.49 | 4.58 ± 0.42 | 0.004 | 4.43 ± 0.42 | 4.93 ± 0.18 | 0.002 |
| Sources & References | 1.00 | 1.00 | NA | 1.00 | 1.00 | NA | 1.00 | 1.00 | NA |
| Usefulness | 4.45 ± 0.52 | 4.11 ± 0.62 | 0.001 | 4.69 ± 0.47 | 4.68 ± 0.51 | 0.847 | 4.40 ± 0.51 | 4.87 ± 0.23 | 0.004 |
| QAMAI Total Score | 23.31 ± 2.00 | 22.07 ± 2.35 | <0.001 | 24.26 ± 1.18 | 24.63 ± 1.26 | 0.145 | 24.23 ± 1.25 | 25.67 ± 0.68 | 0.001 |

| Dimension | Group 1: ChatGPT | Group 1: Gemini | Group 1: Cohen’s d | Group 2: ChatGPT | Group 2: Gemini | Group 2: Cohen’s d | Group 3: ChatGPT | Group 3: Gemini | Group 3: Cohen’s d |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 4.36 ± 0.58 | 4.35 ± 0.57 | 0.014 | 4.51 ± 0.43 | 4.80 ± 0.27 | 0.768 | 4.73 ± 0.32 | 4.93 ± 0.26 | 0.702 |
| Clarity | 4.72 ± 0.35 | 4.32 ± 0.59 | 0.806 | 4.88 ± 0.25 | 4.68 ± 0.41 | 0.572 | 4.93 ± 0.18 | 4.97 ± 0.13 | 0.270 |
| Relevance | 4.54 ± 0.47 | 4.36 ± 0.53 | 0.380 | 4.88 ± 0.24 | 4.88 ± 0.25 | 0.000 | 4.73 ± 0.37 | 4.97 ± 0.13 | 0.938 |
| Completeness | 4.24 ± 0.62 | 3.93 ± 0.69 | 0.462 | 4.29 ± 0.49 | 4.58 ± 0.42 | 0.651 | 4.43 ± 0.42 | 4.93 ± 0.18 | 1.329 |
| Usefulness | 4.45 ± 0.52 | 4.11 ± 0.62 | 0.607 | 4.69 ± 0.47 | 4.68 ± 0.51 | 0.018 | 4.40 ± 0.51 | 4.87 ± 0.23 | 1.142 |
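
Cohen’s d expresses the ChatGPT versus Gemini difference in standard-deviation units and is reported here as an absolute value. The excerpt does not state which variant of d the authors used. The sketch below therefore assumes the common pooled-SD form for two equally sized sets of ratings, d = |m₁ − m₂| / √((s₁² + s₂²)/2), computed from the rounded summary statistics in the table. Treat the result as an approximation. This form reproduces some cells closely: Group 1 clarity gives about 0.82 against 0.806 tabulated. It does not reproduce others: Group 3 completeness gives about 1.55 against 1.329. The gap may reflect unrounded source data or a different d variant. The function name `cohens_d` and the `cells` selection are illustrative.

```python
# Minimal sketch of Cohen's d from summary statistics, using the pooled-SD form
# d = |m1 - m2| / sqrt((s1^2 + s2^2) / 2). This variant is an assumption; the
# paper's exact formula is not stated in this excerpt.
from math import sqrt


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Absolute standardized mean difference with an equal-n pooled SD."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    if pooled_sd == 0:
        raise ValueError("pooled SD is zero; d is undefined")
    return abs(mean_a - mean_b) / pooled_sd


# (ChatGPT mean, SD, Gemini mean, SD, tabulated d) for a few cells of the table above.
cells = {
    "Group 1 / Clarity": (4.72, 0.35, 4.32, 0.59, 0.806),
    "Group 2 / Relevance": (4.88, 0.24, 4.88, 0.25, 0.000),
    "Group 3 / Completeness": (4.43, 0.42, 4.93, 0.18, 1.329),
}

if __name__ == "__main__":
    for label, (ma, sa, mb, sb, reported) in cells.items():
        print(f"{label}: recomputed d = {cohens_d(ma, sa, mb, sb):.3f} "
              f"(tabulated {reported:.3f})")
```

Recomputing from rounded means and SDs cannot recover the exact published values. The comparison is most useful as a sanity check on the direction and rough magnitude of each effect.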

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Bati Sutcu, Y.; Ozcan, S.G.; Dincer, M.T.; Atli, Z.; Trabulus, S.; Seyahi, N. Comparative Analysis of ChatGPT and Gemini in Addressing Questions from Chronic Kidney Disease Patients. Kidney Dial. 2026, 6, 9. https://doi.org/10.3390/kidneydial6010009

