This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

Benchmarking ChatGPT and Other Large Language Models for Personalized Stage-Specific Dietary Recommendations in Chronic Kidney Disease

by Makpal Kairat 1,†, Gulnoza Adilmetova 1,†, Ilvira Ibraimova 2, Abduzhappar Gaipov 2, Huseyin Atakan Varol 3 and Mei-Yen Chan 1,*

1 Department of Biomedical Sciences, School of Medicine, Nazarbayev University, Astana 010000, Kazakhstan
2 Department of Medicine, School of Medicine, Nazarbayev University, Astana 010000, Kazakhstan
3 Institute of Smart Systems and Artificial Intelligence, Nazarbayev University, Astana 010000, Kazakhstan
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
J. Clin. Med. 2025, 14(22), 8033; https://doi.org/10.3390/jcm14228033 (registering DOI)
Submission received: 4 September 2025 / Revised: 28 October 2025 / Accepted: 6 November 2025 / Published: 12 November 2025
(This article belongs to the Section Clinical Nutrition & Dietetics)

Abstract

Background: Chronic kidney disease (CKD) requires strict dietary management tailored to disease stage and individual needs. Recent advances in artificial intelligence (AI) have introduced chatbot-based tools capable of generating dietary recommendations. However, their accuracy, personalization, and practical applicability in clinical nutrition remain largely unvalidated, particularly in non-Western settings. Methods: Simulated patient profiles representing each CKD stage were developed and used to prompt GPT-4 (OpenAI), Gemini (Google), and Copilot (Microsoft) with the same meal-planning request. The AI-generated diets were evaluated by three physicians using a 5-point Likert scale across three criteria: personalization, consistency with guidelines, and practicality and availability. Descriptive statistics, Kruskal–Wallis tests, and Dunn’s post hoc tests were used to compare model performance. Nutritional analysis of four meal plans (Initial, GPT-4, Gemini, and Copilot) was conducted using both GPT-4 estimates and manual calculations validated against clinical dietary sources. Results: Scores for personalization and consistency were significantly higher for Gemini and GPT-4 than for Copilot (p = 0.0001 and p = 0.0002, respectively), with no significant differences between Gemini and GPT-4. Practicality showed marginal significance, with GPT-4 slightly outperforming Gemini (p = 0.0476). Nutritional component analysis revealed discrepancies between GPT-4’s internal estimations and manual values, with occasional deviations from clinical guidelines, most notably for sodium and potassium, and moderate overestimation for phosphorus. Conclusions: While AI chatbots show promise in delivering dietary guidance for CKD patients, with Gemini demonstrating the strongest performance, further development, clinical validation, and testing with real patient data are needed before AI-driven tools can be fully integrated into patient-centered CKD nutritional care.
Keywords: renal health; dialysis; Artificial Intelligence (AI); ChatGPT; LLM; AI-assisted; dietary guidance
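
The Kruskal–Wallis comparison named in the Methods can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names (`average_ranks`, `kruskal_wallis_3`) and the example ratings are hypothetical and chosen only for illustration. The sketch assumes exactly three groups (one per model), so the chi-square approximation has 2 degrees of freedom and its upper-tail probability reduces to exp(-H/2). Dunn's post hoc test is not shown.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """Tie-corrected Kruskal-Wallis H and approximate p-value for three groups.

    With three groups the chi-square approximation has df = 2, for which the
    survival function is exactly exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch only handles exactly three groups")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum)^2 / group size over the groups.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, math.exp(-h / 2)


if __name__ == "__main__":
    # Hypothetical 5-point Likert ratings, NOT data from the study.
    ratings = {
        "GPT-4": [4, 5, 4, 4, 5],
        "Gemini": [5, 4, 5, 5, 4],
        "Copilot": [2, 3, 3, 2, 3],
    }
    h, p = kruskal_wallis_3(list(ratings.values()))
    print(f"H = {h:.3f}, p = {p:.4f}")
```

A rank-based test suits ordinal Likert ratings because it does not assume normally distributed scores. In practice, an established statistics library would normally be preferred over a hand-written routine like this one.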
