Benchmarking ChatGPT and Other Large Language Models for Personalized Stage-Specific Dietary Recommendations in Chronic Kidney Disease
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design
- AI-Based Diet Recommendation Generation: obtaining dietary advice from GPT-4, Gemini, and Copilot for patients at different stages of CKD.
- Professional Evaluation of AI Recommendations: clinicians assess the AI-generated recommendations against predetermined review standards.
- Nutritional Comparison and Statistical Analysis: comparing the nutritional content of the AI-recommended diets and the performance of the AI models through statistical analysis.
2.2. Data Collection
2.2.1. AI-Generated Dietary Recommendations
The following prompt was applied to all models:

“Provide a culturally appropriate, stage-specific dietary plan (breakfast, lunch, and dinner) for this patient with chronic kidney disease (CKD). Consider dietary restrictions (e.g., sodium, potassium, phosphorus, and protein intake) and incorporate foods commonly consumed in Central Asia.”

This approach ensured that all models worked on the same task, allowing for a direct comparison of their adherence to clinical guidelines, practicality, and alignment with CKD dietary restrictions. Each model was queried once for each CKD case, generating a full-day meal plan that included breakfast, lunch, and dinner options. Breakfast, lunch, and dinner were evaluated separately for each case by three independent evaluators, and the resulting scores were averaged for analysis.
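The querying and averaging protocol described above can be sketched in a few lines of Python. This is only an illustration: the score values, dictionary layout, and function name are hypothetical and are not taken from the study data.

```python
from itertools import product
from statistics import mean

MODELS = ["GPT-4", "Gemini", "Copilot"]
CKD_STAGES = [1, 2, 3, 4, 5]

# One query per model per CKD case, as described in the text.
queries = list(product(MODELS, CKD_STAGES))

# Hypothetical scores on the 1-5 rubric: (model, stage, meal) -> one score per evaluator.
evaluator_scores = {
    ("GPT-4", 1, "breakfast"): [4, 5, 4],
    ("GPT-4", 1, "lunch"): [4, 4, 3],
    ("GPT-4", 1, "dinner"): [5, 4, 4],
}


def average_scores(scores_by_item):
    """Average the evaluators' scores for every (model, stage, meal) entry."""
    return {item: mean(scores) for item, scores in scores_by_item.items()}


averaged = average_scores(evaluator_scores)
```

With three models and five cases, `queries` holds 15 model-case pairs, and each meal in each pair receives one averaged score per rubric criterion.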
2.2.2. Expert Evaluation of AI Recommendations
2.3. Data Analysis
2.3.1. Descriptive Statistics
2.3.2. Comparison of AI Models
2.3.3. Inter-Rater Agreement
- 0.00–0.20 = slight agreement
- 0.21–0.40 = fair agreement
- 0.41–0.60 = moderate agreement
- 0.61–0.80 = substantial agreement
- 0.81–1.00 = almost perfect agreement.
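The interpretation bands listed above can be expressed as a small lookup. The sketch below is illustrative: the function name is hypothetical, values falling between two printed bands (for example 0.205) are assigned to the lower band as an assumption, and the label for negative coefficients follows the wording used in the inter-rater table of Appendix B.

```python
def interpret_agreement(coefficient: float) -> str:
    """Map an agreement coefficient to the verbal bands listed in the text."""
    if coefficient > 1.0:
        raise ValueError("agreement coefficients cannot exceed 1")
    if coefficient < 0.0:
        # Label taken from the Appendix B table; not part of the five printed bands.
        return "poor/no agreement"
    bands = [
        (0.20, "slight agreement"),
        (0.40, "fair agreement"),
        (0.60, "moderate agreement"),
        (0.80, "substantial agreement"),
        (1.00, "almost perfect agreement"),
    ]
    for upper_bound, label in bands:
        if coefficient <= upper_bound:
            return label
    raise ValueError("unreachable for coefficients in [0, 1]")
```

For example, `interpret_agreement(0.22)` falls in the 0.21–0.40 band and returns "fair agreement".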
2.3.4. Nutritional Component Analysis
1. Nutrient Quantification
   - GPT-4’s internal estimations, generated for each provided diet plan. It was chosen for this role based on its demonstrated consistency in structured outputs and its established use in prior studies as an evaluation model [40].
2. Comparison with Clinical Guidelines
3. Visualization and Statistical Analysis
2.3.5. Statistical Significance
2.4. Ethical Approval
3. Results
3.1. Descriptive Statistics
3.2. Statistical Analysis
3.3. Inter-Rater Reliability
3.4. Nutritional Component Analysis
3.5. Qualitative Analysis of AI-Generated Meal Plans
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| AI | Artificial Intelligence |
| BMI | Body Mass Index |
| CKD | Chronic Kidney Disease |
| EJCN | European Journal of Clinical Nutrition |
| GPT | Generative Pre-trained Transformer |
| GPT-4 | Generative Pre-trained Transformer, version 4 |
| KDIGO | Kidney Disease: Improving Global Outcomes |
| KDOQI | Kidney Disease Outcomes Quality Initiative |
| LLM | Large Language Model |
| NAFLD | Non-Alcoholic Fatty Liver Disease |
| NCP | Nutrition Care Process |
| SD | Standard Deviation |
| USDA | United States Department of Agriculture |
| Stata MP/18 | Stata Multiprocessor Edition, Version 18 |
Appendix A. Standardized Mock Patient Profiles CKD Stages 1 to 5
- Patient with Stage 1 Chronic Kidney Disease (CKD) and Type 2 Diabetes Mellitus (DM)
  Name: Aizhan
  Gender: Female
  Age: 55
  Nationality: Kazakhstani
  Location: Almaty, Kazakhstan
  Family Information:
  Marital Status: Married
  Family Members: Husband (Nursultan, 58), Son (Arman, 30)
  Occupation: Accountant
  Cultural Background: Kazakh, fluent in Kazakh and Russian. Follows a traditional Kazakh diet with heavy emphasis on meat (especially lamb and beef), dairy products (like kefir and cottage cheese), and bread. Vegetables are included but less frequently, especially in winter months when fresh produce is harder to obtain.
  Medical Information:
  Diagnosis: Type 2 Diabetes. Chronic kidney disease, stage 1. Hypertension.
  Date of Diagnosis: January 2024
  Hypertension (diagnosed 8 years ago)
  Type 2 Diabetes Mellitus (diagnosed 6 years ago)
  Medical History: Diagnosed approximately 6 years ago after a routine blood test revealed elevated blood sugar levels. Aizhan’s kidney function is monitored through regular blood and urine tests to check for any changes in eGFR or albumin levels, which have remained stable so far. Aizhan’s father had cardiovascular disease and passed away at age 65 from a heart attack, while her mother has been living with type 2 diabetes for over 20 years and has mild kidney dysfunction.
  Current Medications:
  - Enalapril 10 mg once daily
  - Metformin 500 mg twice daily (with meals)
  - Aspirin (Low-dose) (for cardiovascular protection) 75 mg once daily
  - Vitamin D (Cholecalciferol) 1000 IU once daily
  - Atorvastatin 10 mg once daily
  Diet History:
  Breakfast: Green tea, kefir and whole-grain bread with a small amount of butter or cheese. Occasionally, has fried eggs with salad (cucumbers and tomatoes) when she feels hungry.
  Lunch: Plov (Kazakh rice dish with lamb, onions, and carrots). A side of pickled vegetables (cabbage or cucumbers). Drinks green tea or water with lunch.
  Dinner: Roast chicken with baked vegetables (like carrots, pumpkins, and bell peppers).
  Often has fresh bread on the side but tries to limit it. A small portion of fruit, typically an apple or orange.
  Environmental, Behavioral, and Social Factors: Resides in Almaty, Kazakhstan, a city with easy access to medical care, though it can sometimes be difficult to navigate the public healthcare system. Walks a few times a week, but due to a sedentary job, doesn’t engage in regular physical exercise. Non-smoker, drinks alcohol socially on weekends (typically vodka or wine). Strong family ties; her son Arman and husband Nursultan are supportive, but they have limited knowledge of managing chronic diseases.
  Assessment:
  Anthropometry, Body Composition, and Functional:
  Weight: 185 lbs (84 kg)
  Height: 5′5″ (165 cm)
  BMI: 31.5 (Obese)
  Biochemical and Hematological Markers:
  Serum Creatinine (SCr): 0.77 mg/dL
  GFR (Glomerular Filtration Rate): 91 mL/min/1.73 m² (CKD-EPI)
  Hemoglobin A1c: 7.5%
  Sodium: 140 mEq/L
  Potassium: 4.4 mEq/L
  Albumin/Creatinine Ratio: 300 mg/g
  Blood Pressure: 135/85 mmHg
  Total Cholesterol: 220 mg/dL
  LDL Cholesterol: 130 mg/dL
  HDL Cholesterol: 50 mg/dL
  Triglycerides: 160 mg/dL
  Non-HDL Cholesterol: 170 mg/dL
  Urine output: 1.2 to 2 L per day
  Additional Information: Aizhan’s CKD is currently classified as stage 1 (89 mL/min/1.73 m²), with mild proteinuria and stable kidney function. However, her history of diabetes and hypertension puts her at an increased risk for progression.

- Patient with Stage 2 Chronic Kidney Disease
  Name: Aida
  Gender: Female
  Age: 53
  Nationality: Kazakhstani
  Location: Almaty, Kazakhstan
  Family Information:
  Marital Status: Married
  Family Members: Husband, no children
  Occupation: Cashier
  Cultural Background: Aida is a native Kazakh and has lived her entire life in Kazakhstan. She follows Islamic practices and has a deep respect for her cultural traditions, which include strong ties to community and family.
  Her diet is influenced by Kazakh cuisine, known for hearty dishes that feature meat, bread, and dairy products.
  Medical Information:
  Diagnosis: Chronic kidney disease, stage 2. Concomitant: Hypertension secondary to other renal disorders.
  Date of Diagnosis: 2 years ago
  Medical History: Aida’s CKD was discovered during a routine health check-up, which showed an eGFR (estimated Glomerular Filtration Rate) of 60 mL/min/1.73 m². Her initial symptoms were subtle, including mild fatigue and slight swelling in her ankles, which she attributed to her busy teaching schedule and long working hours.
  Current Medications: Amlodipine 5 mg daily, atorvastatin 10 mg daily, vitamin D 1000 IU daily. Lisinopril 20 mg daily.
  Diet History:
  Breakfast: A cup of unsweetened green tea for antioxidants and low-fat Greek yogurt.
  Lunch: Tuna sandwich on whole-wheat bread with a side of vegetable sticks.
  Dinner: Pasta with marinara sauce and a side of steamed vegetables.
  Environmental, Behavioral, and Social Factors: Cultural eating patterns, such as traditional Kazakh dishes rich in meats and dairy, can impact Aida’s CKD management. Adapting her diet to include more plant-based foods, whole grains, and lean proteins while limiting high-potassium and high-phosphorus foods is key.
  Assessment:
  Anthropometry, Body Composition, and Functional:
  Weight (current): 65 kg
  Height: 1.67 m
  BMI: 23.3 kg/m²
  Biochemical and Hematological Markers: eGFR: 67 mL/min/1.73 m² (Stage 2 CKD), serum creatinine: 1.0 mg/dL, slightly elevated serum creatinine levels and increased proteinuria. Cholesterol: Total 210 mg/dL, LDL 120 mg/dL (elevated).
  Blood Pressure: 130/85 mmHg
  Additional Information: Aida’s CKD Stage 2 is stable, with her main focus being on preventing progression to more advanced stages. She maintains a regular follow-up with her nephrologist and primary care physician to monitor her kidney function and adjust her treatment as needed.
  She has been advised to limit her protein intake, avoid high-sodium and processed foods, and manage her fluid intake to prevent further kidney strain.

- Patient with Stage 3 Chronic Kidney Disease
  Name: Dariya
  Gender: Female
  Age: 42
  Nationality: Kazakhstani
  Location: Kostanay, Kostanay region, Kazakhstan
  Family Information:
  Marital Status: Married
  Family Members: Husband, mother of two children (ages 25 and 19)
  Occupation: High School Teacher
  Cultural Background: Dariya is Christian, specifically from the Russian Orthodox tradition; her religious beliefs and practices can play a significant role in her lifestyle and health choices.
  Medical Information:
  Diagnosis: Chronic kidney disease, stage 3a. Concomitant: Hypertension secondary to other renal disorders. Complication: Anemia of chronic disease. Obesity.
  Date of Diagnosis: 2 years ago
  Medical History: Diagnosed approximately 2 years ago during a routine checkup. Her kidney function is monitored through regular blood and urine tests. Dariya’s father had cardiovascular disease, while her mother dealt with type 2 diabetes.
  Current Medications: Lisinopril 10–20 mg once daily. Folic Acid 400 mcg daily. Atorvastatin 20 mg once daily. Iron Supplements, Calcium Carbonate or Calcium Acetate, Vitamin D.
  Diet History:
  Breakfast: Scrambled Eggs with Spinach and a Side of Whole Wheat Toast.
  Lunch: Lettuce Wraps with grilled chicken or turkey slices, shredded carrots, and quinoa salad with cucumbers.
  Dinner: Vegetable Stir-Fry with tofu, snow peas, carrots, and a small serving of brown rice. Drinks like ginger tea or chamomile tea can be soothing and aid digestion.
  Environmental, Behavioral, and Social Factors: Dariya’s husband, Mikhail, works as a civil engineer in Kostanay and plays an essential role in supporting her health journey. He helps with meal preparations and encourages her to stick to her CKD management plan. Her children, Dmitry (25) and Elena (19), also assist with daily tasks and make sure she maintains a healthy lifestyle.
  They share in traditional meals and participate in outdoor activities that benefit her physical well-being.
  Assessment:
  Anthropometry, Body Composition, and Functional:
  Weight (current): 83 kg
  Height: 1.60 m
  BMI: 32.4 kg/m²
  Biochemical and Hematological Markers: eGFR: 48 mL/min/1.73 m² (Stage 3a CKD), serum creatinine: 1.4 mg/dL, BUN: 20–30 mg/dL, proteinuria, mild anemia, phosphate (slightly elevated), secondary hyperparathyroidism.
  Blood Pressure: 125/80 mmHg
  Additional Information: Dariya was diagnosed with CKD stage 3a 2 years ago after a routine check-up revealed reduced kidney function. Her estimated glomerular filtration rate (eGFR) was between 45–59 mL/min/1.73 m², indicating moderate kidney impairment. The diagnosis was a turning point, as it was the first time she recognized that she needed to make significant changes to her lifestyle to manage her health. Dariya prioritizes physical activity to maintain a healthy weight and manage her blood pressure. She enjoys morning walks around parks, as well as light exercises at home.

- Patient with Stage 4 Chronic Kidney Disease
  Name: Nurlan
  Gender: Male
  Age: 47
  Nationality: Kazakhstani
  Location: Shymkent, Kazakhstan
  Family Information:
  Marital Status: Married
  Family Members: Wife, a teenage daughter.
  Occupation: Disabled person of the 2nd group
  Cultural Background: Family gatherings, celebrations, and religious observances are central to Nurlan’s life. He values traditional Kazakh dishes and often participates in events that involve sharing meals with relatives and friends. However, this love for traditional food presents a challenge for managing his CKD.
  Medical Information:
  Diagnosis: Bilateral hydronephrosis. Complicated pyelonephritis associated with chronic renal stones disease. Ureteral stent. Chronic kidney disease, stage 4. Ormond’s disease. Concomitant: Type 2 Diabetes. Hypertension secondary to other renal disorders.
  Complication: Anemia of other chronic diseases.
  Date of Diagnosis: Diagnosed last year.
  Medical History: Nurlan has a history of UTI, occasional episodes, likely due to reduced kidney function and diabetes, with antibiotic treatment as needed. He was hospitalized in November last year with postrenal anuria, obstructive syndrome and acute pyelonephritis; a stent was installed in the right kidney for bilateral hydronephrosis.
  Current Medications: Fosinopril (20 mg daily), Metformin (500 mg twice daily), Simvastatin (10 mg nightly), Iron, Calcium and vitamin D supplements.
  Diet History:
  Breakfast: Cooked oats with a bit of unsalted chicken broth and fresh cucumber and tomato salad. Green tea with no added sugar.
  Lunch: Kazakh-Style Grilled Chicken Skewers served with a small side of steamed white rice.
  Afternoon Snack: Low-Sodium Hummus (homemade or store-bought) with sliced bell peppers and carrot sticks for dipping.
  Dinner: Steamed Broccoli or Cauliflower and fresh herb salad.
  Environmental, Behavioral, and Social Factors: Nurlan’s family is an essential part of his life, providing emotional support and helping him manage his health.
  Assessment:
  Anthropometry, Body Composition, and Functional:
  Weight (current): 90 kg
  Height: 1.81 m
  BMI: 27.5 kg/m²
  Biochemical and Hematological Markers: creatinine 250.00 mmol/L (increased), elevated eGFR: 27 mL/min/1.73 m², hemoglobin level 97 g/L (anemia), hyperlipidemia, hypoglycemia (reduced to normal range after treatment).
  Blood Pressure: 135/85 mmHg
  Additional Information: A health assessment that included a comprehensive blood panel and urine test showed significantly elevated levels of creatinine and a decrease in his glomerular filtration rate (GFR) to around 30%, indicating that his kidney function had declined to stage 4. The urine test revealed that Nurlan had proteinuria (protein in the urine), a common sign of kidney damage.
  The nephrologist confirmed the diagnosis of CKD stage 4 after evaluating his medical history, blood tests, and imaging studies. The findings indicated that Nurlan’s kidney function was at 25–30% of normal, with damage likely exacerbated by his preexisting hypertension and type 2 diabetes.

- Patient with Stage 5 Chronic Kidney Disease: End-stage kidney disease (ESKD) caused by glomerular disease. Concomitant: Hypertension secondary to other renal disorders. Complication: Anemia in other chronic diseases.
  Name: Ayan
  Gender: Male
  Age: 35
  Nationality: Kazakhstani
  Location: Astana, Kazakhstan
  Family Information:
  Marital Status: Single
  Family Members: -
  Occupation: Disabled person of the 1st group
  Cultural Background: Ayan finds comfort in prayer and the teachings of Islam. Reciting the Qur’an and participating in community prayers help him maintain a sense of peace and resilience in the face of his illness. Does not consume pork in his diet.
  Medical Information:
  Diagnosis: Chronic kidney disease, stage 5. Terminal chronic renal failure in the outcome of glomerular kidney disease. Concomitant: Hypertension secondary to other renal disorders. Complication: Anemia in other chronic diseases.
  Date of Diagnosis: 3 years ago
  Medical History: Ayan has a history of chronic glomerulonephritis.
  Current Medications: Hemodialysis sessions 3 times a week for 4 h. Fosinopril 40 mg daily. Amlodipine 5 mg daily. Vitamin D-3 5000 IU. Epoetin beta 2000 IU/0.3 mL p/k p/d + Iron (III) hydroxide sucrose complex 5 mL I/V slowly. Iron, Calcium supplements.
  Diet History:
  Breakfast: Oatmeal made with water, green tea (low in caffeine). Egg whites (2–3) scrambled with a few fresh herbs (parsley, dill) for flavor.
  Lunch: Grilled chicken breast (small portion, about 3–4 oz), seasoned with lemon juice, black pepper, and herbs. Steamed or boiled white rice with a side of steamed zucchini.
  A small serving of low-sodium vegetable soup made with carrots, cabbage, and a bit of dill for flavor.
  Afternoon Snack: Low-sodium cottage cheese (1/2 cup) with a handful of blueberries or sliced strawberries. Herbal tea (chamomile or peppermint).
  Dinner: Baked or grilled fish, seasoned with herbs and a dash of olive oil. Mashed potatoes made with skinless potatoes and a little bit of unsalted butter.
  Environmental, Behavioral, and Social Factors: The social stigma of a serious chronic illness like CKD may make Ayan feel isolated or reluctant to share his difficulties, though he tries to stay positive and maintain a balanced lifestyle.
  Assessment:
  Anthropometry, Body Composition, and Functional:
  Weight (current): 75 kg
  Height: 1.71 m
  BMI: 25.7 kg/m²
  Biochemical and Hematological Markers: creatinine 445.00 mmol/L (increased), elevated eGFR: 14 mL/min/1.73 m², hemoglobin level 82 g/L, PTH level is 90.2 pg/mL. Phosphorus: 1.13–1.78.
  Blood Pressure: 130/80 mmHg
  Additional Information: The disease first presented in March 2022; he did not receive treatment and did not ask for help. In August 2023, he began to notice a loss of appetite, and nausea and vomiting began to bother him. On 21 August 2023, he called an ambulance due to the deterioration of his condition: blood pressure 200/100 mmHg, not reduced by drugs. In this regard, he was taken to the hospital. Taking into account edematous syndrome, shortness of breath, hyperhydration, uremic intoxication, critical indicators of azotemia, and anemia, the patient was urgently hospitalized for an emergency hemodialysis session. Since then, he has been receiving hemodialysis courses with repeated hospitalization for inpatient treatment.
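
The profiles above report BMI and an eGFR-based CKD stage for each patient. The following Python sketch shows how such values are typically derived; it assumes the standard BMI formula and the widely used KDIGO eGFR cut-offs, and the function names are hypothetical. Note that stages 1 and 2 additionally require evidence of kidney damage (such as albuminuria), which this sketch does not check.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def ckd_stage_from_egfr(egfr: float) -> str:
    """Map eGFR (mL/min/1.73 m^2) to a CKD stage using KDIGO GFR cut-offs."""
    if egfr >= 90:
        return "1"
    if egfr >= 60:
        return "2"
    if egfr >= 45:
        return "3a"
    if egfr >= 30:
        return "3b"
    if egfr >= 15:
        return "4"
    return "5"


# Values taken from the Stage 2 and Stage 3 profiles above.
aida_bmi = round(bmi(65, 1.67), 1)
dariya_bmi = round(bmi(83, 1.60), 1)
dariya_stage = ckd_stage_from_egfr(48)
```

Applied to the eGFR values listed in the profiles (91, 67, 48, 27, and 14 mL/min/1.73 m²), the mapping returns stages 1, 2, 3a, 4, and 5, matching the stage assigned to each mock patient.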
Appendix B. Extended Data Tables and Figures for Methodological Transparency
| Score | Personalization | Consistency | Practicality and Availability |
|---|---|---|---|
| 1 | Not applicable for evaluation | Not applicable for evaluation | Not applicable for evaluation |
| 2 | Poor personalization, addressing a few individual factors like general dietary preferences or habits but without comprehensive tailoring. | Poor consistency, with some recommendations adhering to evidence-based guidelines, but still containing potentially problematic advice or conflicting information. | Poor practicality, with suggestions that may be achievable in some regions or situations, but still contain less accessible or hard-to-find ingredients or foods. |
| 3 | Moderate personalization, addressing most individual factors and incorporating them into the recommendations, with improvements possible. | Moderate consistency, with a reasonable balance between evidence-based advice and individual tailoring, though improvements can be made to align more closely with established recommendations. | Moderate practicality, with an effort to consider regional availability and ease of implementation, but still with room for improvement to cater to individual context. |
| 4 | Good personalization, taking into account a wide range of individual factors such as age, medical history, cultural background, and preferences, with only minor improvements needed. | High consistency, with the majority of recommendations adhering to evidence-based guidelines while demonstrating adaptability to individual needs, with only minor refinements required. | Good practicality, with recommendations based on easily obtainable ingredients or foods in the individual’s region, considering cultural habits and familiar meals, with minor refinements needed. |
| 5 | Excellent personalization, thoroughly addressing all relevant individual factors, resulting in highly tailored recommendations that cater to specific needs. | Excellent consistency, with all provided recommendations being fully in line with evidence-based guidelines, ensuring safety, efficacy, and tailoring for the specific individual’s needs. | Excellent practicality, with recommendations seamlessly fitting into the individual’s life, ensuring adaptability to their cultural context and basing the suggestions on readily available ingredients or foods. |
| Study | Evaluation Method | Description |
|---|---|---|
| Ponzo et al. [23], Nutrients | Likert: Appropriateness, Completeness, Consistency | Evaluated ChatGPT responses against KDIGO and KDOQI guidelines. Responses categorized as “appropriate,” “inappropriate,” “not supported,” “not fully matched,” or “general advice.” |
| Ponzo et al. [46], JCM | Likert: Accuracy (6-point), Completeness (3-point), Appropriateness, Comprehensibility (3-point) | Used different Likert scales for evaluation, focusing on CKD dietary recommendations. |
| Kim et al. [33], Frontiers in Nutrition | Likert (0–10): Effectiveness, Balancedness, Comprehensiveness, Flexibility, Applicability, Overall Impression | Evaluated AI-generated vs. control diet plans with professionals in obesity medicine. Additional metrics included personalized diet plan effectiveness, safety, applicability, and likelihood of use. Free-text feedback was also collected. |
| Naja et al. [59], EJCN | Likert (1–4): Concordance with guidelines, Clarity, Coherence, Practicality | Dietitians evaluated AI chatbot responses for dietary management, nutrition care process (NCP), and menu planning, assessing accuracy and adherence to guidelines. Cohen’s kappa used for inter-rater reliability. |
| Pugliese et al. [50], Clinical Gastroenterology & Hepatology | Likert: Accuracy, Completeness, Comprehensibility | 10 experts in NAFLD and 1 patient advocate rated AI responses using Likert scales, analyzed with descriptive statistics and concordance measures. |
| Johnson et al. [52], Research Square | Likert (6-point): Accuracy; Likert (3-point): Completeness, Comprehensibility | Physicians from multiple specialties evaluated ChatGPT responses to medical questions based on clarity, completeness, and adherence to guidelines. |
| CKD Stages | Copilot | Gemini | ChatGPT-4 |
|---|---|---|---|
| 1 | The diet suggested by Copilot is overall well-structured and aligns with patients’ needs during Stage 1 of CKD. The generated meal plan provides a moderate level of protein, with an emphasis on lean protein sources. In terms of sodium, the plan advises avoiding excess salt and replacing it with herbs and spices, which makes the sodium content of the meal plan appropriate for CKD Stage 1 patients. The only concern is pickled vegetables, as they might contain higher amounts of sodium. Neither the potassium nor the phosphorus level of the suggested meal plan presents a risk, as the plan advises avoiding high-potassium fruits such as banana and avocado. Overall, the plan is appropriate for Stage 1 CKD if portions are controlled and labs are regularly monitored. | The diet suggested by Gemini is well-structured and closely aligns with patient needs, as Stage 1 does not present severe restrictions. The protein amount in the suggested meal plan is moderate and spread across the meals; suggested protein sources include lean cuts of chicken, fish and lamb, egg whites, dairy products, and plant proteins from lentils and nuts. Sodium content seems to present a low risk, as the meal plan consistently suggests replacing salt with alternatives or low-sodium foods. Potassium and phosphorus were not restricted, but the diet should be monitored, as some of the suggested food items might have higher levels of potassium (tomatoes, cucumbers, pumpkin, carrots) and phosphorus (dairy products, nuts). | The diet suggested by ChatGPT-4 is well balanced and closely aligns with Stage 1 CKD. The suggested plan provides controlled levels of protein at around 0.8 g per kilogram per day, focusing on the leanest sources and appropriate portions. For sodium, the risk seems to be minimal: it stays under 2300 milligrams daily, and cooking techniques like boiling meat before stewing to cut down on salt were suggested. For potassium, instead of typical high-K starches, the meal plan uses lower-potassium bases like buckwheat, pumpkin, and berries. Phosphorus is controlled by incorporating grains like buckwheat and quinoa. |
| 2 | The diet suggested by Copilot is overall well-structured. The generated meal plan draws protein from moderate sources like fish and lean meat, but it includes Greek yogurt, which might be high in both protein and phosphorus. Sodium risk is moderate because the plan relies on potentially high-sodium items like canned tuna (even low-sodium varieties) and commercial whole-grain pasta/bread. Potassium is at a moderate risk level due to the inclusion of foods like tomatoes and whole-grain pasta. While it introduces the idea of low-potassium fruits, Greek yogurt and whole grains might lead to increased levels of potassium and phosphorus. | The diet suggested by Gemini is well-structured and provides appropriate restriction of nutrients. Lean meat and egg whites were suggested as protein sources, while sodium levels were controlled by restricting processed and canned foods. Phosphorus was actively controlled by recommending low-phosphorus milk alternatives and limiting high-phosphorus dairy. However, the meal plan includes a small, controlled portion of baked potato, which is considered a high-potassium food. | The diet suggested by ChatGPT-4 is well balanced and aligns with Stage 2 CKD. Protein restriction is clear and appropriate, limiting intake to 0.8 g per kilogram per day. Sodium levels are controlled, with a target under 2000 milligrams daily and an emphasis on low-sodium yogurt sauces. For both potassium and phosphorus, the plan implements active limitations, strategically using safer, low-potassium and low-phosphorus grains like quinoa and buckwheat instead of traditional whole-wheat products. |
| 4 | The diet suggested by Copilot is overall well-structured, with a focus on fresh foods and limited salt and protein intake. Sodium intake is appropriately low due to avoidance of processed and salt-based foods, which supports kidney protection. A possible concern is the potassium and phosphorus level, as high-potassium foods are present in this meal plan (tomatoes, nuts, broccoli, carrots); the plan states that their amounts are reduced, but this is not fully clear. Overall, the plan is appropriate for Stage 4 CKD if portions are controlled and labs are regularly monitored. | The diet suggested by Gemini is well-structured and closely aligns with patient needs. Protein restriction is emphasized, with a focus on high-quality sources such as egg whites and small portions of chicken and fish. Sodium restriction is also strong, with avoidance of processed foods. Phosphorus and potassium are appropriately addressed with limits on dairy, nuts, whole grains, and high-potassium fruits and vegetables. Items like spinach, berries, and tofu should be used cautiously. The plan also considers coexisting diabetes and anemia, ensuring carbohydrate control and iron support. Overall, the plan is appropriate for Stage 4 CKD, but close monitoring of labs is needed. | The diet suggested by ChatGPT-4 is well balanced and closely aligns with Stage 4 CKD with coexisting type 2 diabetes. Protein restriction is clear and appropriate, with high-quality sources (egg, chicken, fish) in limited portions. Sodium control is strong, with a target under 1500 mg/d. Potassium and phosphorus are managed effectively by avoiding high-potassium and high-phosphorus foods. Fruits are limited to low-potassium options. The plan also considers coexisting diabetes and anemia by adding low-glycemic-index grains and iron-rich foods. Overall, this diet is appropriate for Stage 4 CKD with diabetes, with clear portion sizes and consideration of cultural preferences and alternatives. |
| 5 | This plan shows nutrient balance, cultural adaptation and key priorities. Protein intake is appropriately increased, with lean sources like chicken, fish and egg whites. Sodium restriction is noted, but ideally it should be <2000 mg/day, with strict avoidance of processed foods, canned foods and added salt. Potassium and phosphorus management needs closer adjustment. Some foods (potatoes, broccoli, nuts, cottage cheese, yogurt) are high in potassium or phosphorus and need portion control. Spinach should be avoided due to high potassium/phosphorus. Overall, the diet is appropriate for CKD Stage 5 patients on dialysis but would be safer with stricter sodium limits and careful adjustment of potassium and phosphorus sources. | The plan is well structured and correctly increases high-quality protein for dialysis patients. Sodium and fluid control must be strict, with <2000 mg sodium/day. Phosphorus control remains a concern, as items like cottage cheese, nuts, and yogurt are included but may raise phosphorus. Potassium management is addressed with low-potassium fruits and vegetables, but strict portion control is needed to stay safe. Overall, the suggested plan is appropriate for a dialysis patient but needs tighter control of potassium and phosphorus. | This plan is well prepared for a dialysis patient: protein intake is increased, and potassium and phosphorus control are addressed with substitutions (avoiding tomato, using turnips instead of potatoes). Overall, this diet is appropriate, but close monitoring of phosphorus (hummus, dairy, flatbread) and potassium (fruit portions) remains essential. |
| CKD Stage | Recommendations | Sources |
|---|---|---|
| 1–2 | - Protein Intake: 0.8–1.0 g/kg/day. - Sodium Intake: Limit to less than 2300 mg/day. - Potassium and Phosphorus: Generally unrestricted unless serum levels are elevated. - Calories: Ensure adequate caloric intake to support energy needs. - Dietary Advice: Promote a balanced diet rich in fruits, vegetables, and whole grains. | - KDOQI Guidelines (2020) [41] - National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) [42] |
| 3–5 (Non-Dialysis) | - Protein Intake: 0.6–0.8 g/kg/day (some guidelines recommend as low as 0.55–0.60 g/kg/day). - Sodium Intake: Limit to less than 2300 mg/day. - Potassium: Adjust intake based on individual lab results; restrict if hyperkalemia is present. - Phosphorus: Limit to 800–1000 mg/day; avoid phosphorus additives. - Calories: 30–35 kcal/kg/day to maintain energy balance. - Fluid Intake: Adjust based on medical status. | - KDOQI Guidelines (2020) [41] - ESPEN Guidelines [43] - European Renal Best Practice (ERBP) [60] - UK Kidney Association [61] |
| 4–5 (Pre-Dialysis) | - Protein Intake: 0.6–0.8 g/kg/day (some guidelines recommend up to 1.0 g/kg/day). - Sodium Intake: Limit as per blood pressure and fluid status. - Potassium and Phosphorus: Restrict intake; monitor serum levels closely. - Calories: 30–35 kcal/kg/day. - Fluid Restriction: Implement if edema develops. - Micronutrients: Ensure adequate intake of vitamins and minerals; supplement as needed. | - Clinical Guideline of the Republic of Kyrgyzstan - KDOQI Guidelines (2020) [41] - NICE Guideline [NG203] (2021) [44] |
| Dialysis | - Protein Intake: - Hemodialysis: 1.1–1.4 g/kg/day (some guidelines recommend up to 1.5 g/kg/day). - Peritoneal Dialysis: 1.0–1.2 g/kg/day. - Calories: 30–40 kcal/kg/day depending on age and physical activity. - Sodium, Potassium, Phosphorus: Monitor and adjust intake as necessary. - Fluid Intake: Adjust based on urine output and fluid gains during dialysis. - Micronutrients: Supplement water-soluble vitamins. | - ESPEN Guidelines [43] - Clinical Guideline of the Republic of Kyrgyzstan - UK Kidney Association (UKKA) [61] - National Kidney Foundation (NKF) [2] |
| General Recommendations | - Limit Salt Intake - Control Protein Intake - Heart-Healthy Diet - Limit Phosphorus and Potassium - Avoid Certain Foods | - Cleveland Clinic [62] - Physicians Committee for Responsible Medicine [63] - National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) [42] - Clinical Guideline of the Ministry of Health of Kazakhstan |
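
As a rough illustration of how the per-kilogram ranges in the table translate into daily gram targets, the sketch below multiplies the tabulated protein ranges by body weight. The dictionary keys, the function name `protein_target_g`, and the 70 kg example weight are assumptions introduced for illustration only; the ranges themselves are copied from the table.

```python
# Protein ranges in g per kg body weight per day, copied from the table above.
# The stage labels used as keys are hypothetical names chosen for this sketch.
PROTEIN_G_PER_KG = {
    "stage_1_2": (0.8, 1.0),
    "stage_3_5_non_dialysis": (0.6, 0.8),
    "hemodialysis": (1.1, 1.4),
    "peritoneal_dialysis": (1.0, 1.2),
}


def protein_target_g(stage: str, weight_kg: float) -> tuple[float, float]:
    """Return the (low, high) daily protein target in grams for one patient."""
    low, high = PROTEIN_G_PER_KG[stage]
    return round(low * weight_kg, 1), round(high * weight_kg, 1)


if __name__ == "__main__":
    # 70 kg is an arbitrary example weight, not a value from the study.
    print(protein_target_g("stage_3_5_non_dialysis", 70.0))
```

A lookup table keeps the guideline values in one place, so a change in a recommended range only has to be made once.
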


| Comparison | Personalization: Cliff’s Delta (δ) | Personalization: 95% CI | Consistency: Cliff’s Delta (δ) | Consistency: 95% CI | Practicality: Cliff’s Delta (δ) | Practicality: 95% CI |
|---|---|---|---|---|---|---|
| GPT–Gemini | 0.20 | [0.04; 0.35] | 0.18 | [−0.01; 0.34] | 0.20 | [0.02; 0.37] |
| GPT–Copilot | −0.27 | [−0.45; −0.06] | −0.24 | [−0.43; −0.03] | 0.00 | [−0.20; 0.20] |
| Gemini–Copilot | −0.47 | [−0.62; −0.28] | −0.42 | [−0.59; −0.22] | −0.20 | [−0.37; −0.02] |

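
For readers unfamiliar with the effect size, the following minimal sketch computes Cliff's delta as the difference between the share of cross-group pairs in which one rating is higher and the share in which it is lower. The rating vectors are not raw study data: they are reconstructed for the Consistency criterion under the assumption that each of the 45 ratings per model was either 3 or 4 and that the reported means (3.67 for ChatGPT-4, 3.84 for Gemini) therefore correspond to 30 and 38 ratings of 4. The sign convention (positive when the second-named model scores higher) is likewise inferred from the table.

```python
def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(1 for xi in x for yi in y if xi > yi)
    less = sum(1 for xi in x for yi in y if xi < yi)
    return (greater - less) / (len(x) * len(y))


# Reconstructed Consistency ratings (assumption: only scores 3 and 4 occur,
# with counts chosen to match the reported means for n = 45 per model).
gpt4 = [4] * 30 + [3] * 15
gemini = [4] * 38 + [3] * 7

if __name__ == "__main__":
    # Positive value: Gemini ratings tend to exceed ChatGPT-4 ratings.
    print(round(cliffs_delta(gemini, gpt4), 2))
```

Under these assumptions the result lands close to the Consistency value reported for the GPT–Gemini comparison.
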
| Evaluation Criterion | Krippendorff’s Alpha | Interpretation |
|---|---|---|
| Personalization | 0.22 | Fair Agreement |
| Consistency | 0.11 | Slight Agreement |
| Practicality | −0.2 | Poor/No Agreement |
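
The interpretation column follows the agreement bands listed in Section 2.3.3. A small sketch of that mapping is shown below; the function name `interpret_agreement` is hypothetical, and treating each band's upper bound as inclusive is an assumption, since the bands are only given to two decimals.

```python
# Upper bounds of the agreement bands from Section 2.3.3 (treated as inclusive).
BANDS = [
    (0.20, "Slight Agreement"),
    (0.40, "Fair Agreement"),
    (0.60, "Moderate Agreement"),
    (0.80, "Substantial Agreement"),
    (1.00, "Almost Perfect Agreement"),
]


def interpret_agreement(alpha: float) -> str:
    """Map a Krippendorff's alpha value to its verbal agreement label."""
    if alpha < 0:
        return "Poor/No Agreement"
    for upper, label in BANDS:
        if alpha <= upper:
            return label
    raise ValueError("alpha cannot exceed 1.0")


if __name__ == "__main__":
    for value in (0.22, 0.11, -0.2):
        print(value, interpret_agreement(value))
```
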
| Nutrient | Mean Absolute % Error | 95% CI (Lower–Upper) |
|---|---|---|
| Protein | 28.22% | −9.59–66.03% |
| Sodium | 52.01% | 15.14–88.89% |
| Potassium | 53.67% | 24.31–83.02% |
| Phosphorus | 31.86% | 14.16–49.56% |
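
To make the error metric concrete, the sketch below computes a mean absolute percentage error (MAPE) and a confidence interval for protein, using the manual and ChatGPT-4 estimates for the four diets reported in the nutrient comparison table of this article. The interval method is an assumption: a Student's t interval across the four diets (3 degrees of freedom) is used here because it lands close to the reported protein row, but the source lines above do not state how the interval was obtained.

```python
import math
from statistics import mean, stdev

# Protein estimates (g) for the Initial, ChatGPT-4, Gemini, and Copilot diets.
MANUAL = [95.4, 54.0, 102.1, 95.9]
CHATGPT = [89.0, 83.0, 58.0, 87.0]

# Two-sided 95% critical value of Student's t with 3 degrees of freedom.
T_CRIT_DF3 = 3.182446


def mape_with_ci(reference, estimate, t_crit):
    """Return (MAPE, lower, upper) in percent, using a t-based interval."""
    pct_errors = [abs(e - r) / r * 100 for r, e in zip(reference, estimate)]
    centre = mean(pct_errors)
    half_width = t_crit * stdev(pct_errors) / math.sqrt(len(pct_errors))
    return centre, centre - half_width, centre + half_width


if __name__ == "__main__":
    mape, lower, upper = mape_with_ci(MANUAL, CHATGPT, T_CRIT_DF3)
    print(f"MAPE = {mape:.2f}% (95% CI {lower:.2f} to {upper:.2f})")
```

With only four diets the interval is wide and its lower bound can fall below zero, even though an absolute error cannot be negative.
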



| CKD Stage | AI Model | Suggested Food | Issue |
|---|---|---|---|
| 1 | Copilot | Brown rice | Rarely consumed in the local diet and not part of routine food culture |
| 1 | Gemini | Brown rice | Rarely consumed in the local diet and not part of routine food culture |
| 1 | ChatGPT-4 | Brown rice | Rarely consumed in the local diet and not part of routine food culture |
| 2 | Copilot | Grilled fish like salmon or trout, marinara sauce, almond milk | Rarely consumed in the local diet and not part of routine food culture; limited availability and higher cost |
| 2 | Gemini | Low-phosphorus milk alternative, low-phosphorus cream cheese, brown rice, unsalted rice cakes | Rarely consumed in the local diet and not part of routine food culture; limited availability and higher cost |
| 2 | ChatGPT-4 | Almond milk, yogurt-based garlic sauce | Rarely consumed in the local diet and not part of routine food culture; limited availability and higher cost |
| 3 | Copilot | Quinoa, tofu, brown rice | Rarely consumed in the local diet and not part of routine food culture; limited availability and higher cost |
| 3 | Gemini | Low-phosphorus milk alternative, quinoa, tofu, brown rice, unsalted rice cakes | Rarely consumed in the local diet and not part of routine food culture; limited availability and higher cost |
| 3 | ChatGPT-4 | Quinoa | Rarely consumed in the local diet and not part of routine food culture |
| 4 | Copilot | Hummus | Rarely consumed in the local diet and not part of routine food culture |
| 4 | Gemini | Tofu, rice cakes (unsalted) | Rarely consumed in the local diet and not part of routine food culture |
| 4 | ChatGPT-4 | Barley tea, low-sodium hummus, cod or tilapia | Rarely consumed in the local diet and not part of routine food culture; limited availability |
| 5 | Copilot | Peppermint tea | Rarely consumed in the local diet and not part of routine food culture |
| 5 | Gemini | Low-phosphorus milk alternative, rice cakes (unsalted) | Rarely consumed in the local diet and not part of routine food culture; limited availability and higher cost |
| 5 | ChatGPT-4 | Hummus, cod or tilapia | Rarely consumed in the local diet and not part of routine food culture; higher cost |

References
1. Francis, A.; Harhay, M.N.; Ong, A.C.M.; Tummalapalli, S.L.; Ortiz, A.; Fogo, A.B.; Fliser, D.; Roy-Chaudhury, P.; Fontana, M.; Nangaku, M.; et al. Chronic kidney disease and the global public health agenda: An international consensus. Nat. Rev. Nephrol. 2024, 20, 473–485.
2. Chronic Kidney Disease (CKD)—Symptoms, Causes, Treatment|National Kidney Foundation. Available online: https://www.kidney.org/kidney-topics/chronic-kidney-disease-ckd (accessed on 30 June 2025).
3. Webster, A.C.; Nagler, E.V.; Morton, R.L.; Masson, P. Chronic Kidney Disease. Lancet 2017, 389, 1238–1252.
4. Ammirati, A.L. Chronic kidney disease. Rev. Assoc. Med. Bras. 2020, 66, 3–9.
5. Wang, H.; Naghavi, M.; Allen, C.; Barber, R.M.; Bhutta, Z.A.; Carter, A.; Casey, D.C.; Charlson, F.J.; Chen, A.Z.; Coates, M.M.; et al. Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: A systematic analysis for the Global Burden of Disease Study 2015. Lancet 2016, 388, 1459–1544.
6. Bikbov, B.; Purcell, C.A.; Levey, A.S.; Smith, M.; Abdoli, A.; Abebe, M.; Adebayo, O.M.; Afarideh, M.; Agarwal, S.K.; Agudelo-Botero, M.; et al. Global, regional, and national burden of chronic kidney disease, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017. Lancet 2020, 395, 709–733.
7. Kramer, H. Diet and Chronic Kidney Disease. Adv. Nutr. 2019, 10, S367–S379.
8. Ko, G.J.; Kalantar-Zadeh, K. How important is dietary management in chronic kidney disease progression? A role for low protein diets. Korean J. Intern. Med. 2021, 36, 795–806.
9. Chen, W.; Abramowitz, M.K. Advances in management of chronic metabolic acidosis in chronic kidney disease. Curr. Opin. Nephrol. Hypertens. 2019, 28, 409–416.
10. Pesta, D.H.; Samuel, V.T. A high-protein diet for reducing body fat: Mechanisms and possible caveats. Nutr. Metab. 2014, 11, 53.
11. Palmer, S.C.; Maggo, J.K.; Campbell, K.L.; Craig, J.C.; Johnson, D.W.; Sutanto, B.; Ruospo, M.; Tong, A.; Strippoli, G.F. Dietary interventions for adults with chronic kidney disease. Cochrane Database Syst. Rev. 2017, 2017, CD011998.
12. Anderson, C.A.M.; Nguyen, H.A. Nutrition education in the care of patients with chronic kidney disease and end-stage renal disease. Semin. Dial. 2018, 31, 115–121.
13. Maleki Varnosfaderani, S.; Forouzanfar, M. The Role of AI in Hospitals and Clinics: Transforming Healthcare in the 21st Century. Bioengineering 2024, 11, 337.
14. Borenstein, J.; Wagner, A.R.; Howard, A. Overtrust of Pediatric Health-Care Robots: A Preliminary Survey of Parent Perspectives. IEEE Robot. Autom. Mag. 2018, 25, 46–54.
15. Theodore Armand, T.P.; Nfor, K.A.; Kim, J.I.; Kim, H.C. Applications of Artificial Intelligence, Machine Learning, and Deep Learning in Nutrition: A Systematic Review. Nutrients 2024, 16, 1073.
16. Garcia, M.B. ChatGPT as a Virtual Dietitian: Exploring Its Potential as a Tool for Improving Nutrition Knowledge. Appl. Syst. Innov. 2023, 6, 96.
17. Papastratis, I.; Stergioulas, A.; Konstantinidis, D.; Daras, P.; Dimitropoulos, K. Can ChatGPT provide appropriate meal plans for NCD patients? Nutrition 2024, 121, 112291.
18. Wang, L.C.; Zhang, H.; Ginsberg, N.; Nandorine Ban, A.; Kooman, J.P.; Kotanko, P. Application of ChatGPT to Support Nutritional Recommendations for Dialysis Patients—A Qualitative and Quantitative Evaluation. J. Ren. Nutr. 2024, 34, 477–481.
19. Yaseen, I.; Rather, R. A Theoretical Exploration of Artificial Intelligence’s Impact on Feto-Maternal Health from Conception to Delivery. Int. J. Womens Health 2024, 16, 903–915.
20. Côté, M.; Lamarche, B. Artificial intelligence in nutrition research: Perspectives on current and future applications. Appl. Physiol. Nutr. Metab. 2022, 47, 1–8.
21. Bergling, K.; Wang, L.C.; Shivakumar, O.; Ban, A.N.; Moore, L.W.; Ginsberg, N.; Kooman, J.; Duncan, N.; Kotanko, P.; Zhang, H.; et al. From bytes to bites: Application of large language models to enhance nutritional recommendations. Clin. Kidney J. 2025, 18, sfaf082.
22. Papastratis, I.; Konstantinidis, D.; Daras, P.; Dimitropoulos, K. AI nutrition recommendation using a deep generative model and ChatGPT. Sci. Rep. 2024, 14, 14620.
23. Ponzo, V.; Goitre, I.; Favaro, E.; Merlo, F.D.; Mancino, M.V.; Riso, S.; Bo, S. Is ChatGPT an Effective Tool for Providing Dietary Advice? Nutrients 2024, 16, 469.
24. Lo, F.P.W.; Qiu, J.; Wang, Z.; Chen, J.; Xiao, B.; Yuan, W.; Giannarou, S.; Frost, G.; Lo, B. Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis. IEEE J. Biomed. Health Inform. 2024, 28, 7577–7587.
25. Qarajeh, A.; Tangpanithandee, S.; Thongprayoon, C.; Suppadungsuk, S.; Krisanapan, P.; Aiumtrakul, N.; Valencia, O.A.G.; Miao, J.; Qureshi, F.; Cheungpasitporn, W. AI-Powered Renal Diet Support: Performance of ChatGPT, Bard AI, and Bing Chat. Clin. Pract. 2023, 13, 1160–1172.
26. Varayil, J.E.; Bielinski, S.J.; Mundi, M.S.; Bonnes, S.L.; Salonen, B.R.; Hurt, R.T. Artificial intelligence in clinical nutrition: Bridging data analytics and nutritional care. Curr. Nutr. Rep. 2025, 14, 91.
27. Limketkai, B.N.; Mauldin, K.; Manitius, N.; Jalilian, L.; Salonen, B.R. The Age of Artificial Intelligence: Use of Digital Technology in Clinical Nutrition. Curr. Surg. Rep. 2021, 9, 20.
28. Gençer Bingöl, F.; Ağagündüz, D.; Bingol, M.C. Accuracy of Current Large Language Models and the Retrieval-Augmented Generation Model in Determining Dietary Principles in Chronic Kidney Disease. J. Ren. Nutr. 2025, 35, 401–409.
29. Adilmetova, G.; Nassyrov, R.; Meyerbekova, A.; Karabay, A.; Varol, H.A.; Chan, M.Y. Evaluating ChatGPT’s Multilingual Performance in Clinical Nutrition Advice Using Synthetic Medical Text: Insights from Central Asia. J. Nutr. 2025, 155, 729–735.
30. Auyeskhan, U.; Azhbagambetov, A.; Sadykov, T.; Dairabayeva, D.; Talamona, D.; Chan, M.Y. Reducing meat consumption in Central Asia through 3D printing of plant-based protein—Enhanced alternatives—A mini review. Front. Nutr. 2024, 10, 1308836.
31. Carrero, J.J.; González-Ortiz, A.; Avesani, C.M.; Bakker, S.J.L.; Bellizzi, V.; Chauveau, P.; Clase, C.M.; Cupisti, A.; Espinosa-Cuevas, A.; Molina, P.; et al. Plant-based diets to manage the risks and complications of chronic kidney disease. Nat. Rev. Nephrol. 2020, 16, 525–542.
32. Su, G.; Qin, X.; Yang, C.; Sabatino, A.; Kelly, J.T.; Avesani, C.M.; Carrero, J.J. Fiber intake and health in people with chronic kidney disease. Clin. Kidney J. 2022, 15, 213–225.
33. Kim, D.W.; Park, J.S.; Sharma, K.; Velazquez, A.; Li, L.; Ostrominski, J.W.; Tran, T.; Peréz, R.H.S.; Shin, J.-H. Qualitative evaluation of artificial intelligence-generated weight management diet plans. Front. Nutr. 2024, 11, 1374834.
34. Meissel, K.; Yao, E.S. Using Cliff’s Delta as a Non-Parametric Effect Size Measure: An Accessible Web App and R Tutorial. Pract. Assess. Res. Eval. 2024, 29, 2.
35. Landis, J.R.; Koch, G.G. The Measurement of Observer Agreement for Categorical Data. Biometrics 1977, 33, 159.
36. Sharma, P.; McCullough, K.; Scotland, G.; McNamee, P.; Prescott, G.; MacLeod, A.; Fluck, N.; Smith, W.C.; Black, C. Does stage-3 chronic kidney disease matter?: A systematic literature review. Br. J. Gen. Pract. 2010, 60, e266–e276.
37. USDA FoodData Central. Available online: https://fdc.nal.usda.gov/ (accessed on 14 May 2025).
38. Nutrient Data—Food Analyzer|DaVita Kidney Care. Available online: https://www.davita.com/diet-nutrition/food-analyzer (accessed on 14 May 2025).
39. Nutrition and Kidney Disease, Stages 1-5 (Not on Dialysis)|National Kidney Foundation. Available online: https://www.kidney.org/kidney-topics/nutrition-and-kidney-disease-stages-1-5-not-dialysis (accessed on 14 May 2025).
40. Szymanski, A.; Ziems, N.; Eicher-Miller, H.A.; Li, T.J.J.; Jiang, M.; Metoyer, R.A. Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks. In Proceedings of the 30th International Conference on Intelligent User Interfaces (IUI ’25), Cagliari, Italy, 24–27 March 2025; Volume 15, pp. 952–966.
41. Ikizler, T.A.; Burrowes, J.D.; Byham-Gray, L.D.; Campbell, K.L.; Carrero, J.J.; Chan, W.; Fouque, D.; Friedman, A.N.; Ghaddar, S.; Goldstein-Fuchs, D.J.; et al. KDOQI Clinical Practice Guideline for Nutrition in CKD: 2020 Update. Am. J. Kidney Dis. 2020, 76, S1–S107.
42. National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). Available online: https://www.niddk.nih.gov/ (accessed on 18 September 2025).
43. Fiaccadori, E.; Sabatino, A.; Barazzoni, R.; Carrero, J.J.; Cupisti, A.; De Waele, E.; Jonckheer, J.; Singer, P.; Cuerda, C. ESPEN guideline on clinical nutrition in hospitalized patients with acute or chronic kidney disease. Clin. Nutr. 2021, 40, 1644–1668.
44. Overview|Chronic Kidney Disease: Assessment and Management|Guidance|NICE. Available online: https://www.nice.org.uk/guidance/ng203 (accessed on 10 June 2025).
45. Sarnowski, A.; Gama, R.M.; Dawson, A.; Mason, H.; Banerjee, D. Hyperkalemia in Chronic Kidney Disease: Links, Risks and Management. Int. J. Nephrol. Renov. Dis. 2022, 15, 215.
46. Ponzo, V.; Rosato, R.; Scigliano, M.C.; Onida, M.; Cossai, S.; De Vecchi, M.; Devecchi, A.; Goitre, I.; Favaro, E.; Merlo, F.D.; et al. Comparison of the Accuracy, Completeness, Reproducibility, and Consistency of Different AI Chatbots in Providing Nutritional Advice: An Exploratory Study. J. Clin. Med. 2024, 13, 7810.
47. Wang, L.; Chen, X.; Deng, X.; Wen, H.; You, M.; Liu, W.; Li, Q.; Li, J. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. npj Digit. Med. 2024, 7, 41.
48. Azimi, I.; Qi, M.; Wang, L.; Rahmani, A.M.; Li, Y. Evaluation of LLMs accuracy and consistency in the registered dietitian exam through prompt engineering and knowledge retrieval. npj Digit. Med. 2024, 7, 41.
49. Razavi, A.; Soltangheis, M.; Arabzadeh, N.; Salamat, S.; Zihayat, M.; Bagheri, E. Benchmarking Prompt Sensitivity in Large Language Models. In European Conference on Information Retrieval; Springer Nature: Cham, Switzerland, 2025; pp. 303–313.
50. Pugliese, N.; Wai-Sun Wong, V.; Schattenberg, J.M.; Romero-Gomez, M.; Sebastiani, G.; Aghemo, A.; NAFLD Expert Chatbot Working Group. Accuracy, Reliability, and Comprehensibility of ChatGPT-Generated Medical Responses for Patients With Nonalcoholic Fatty Liver Disease. Clin. Gastroenterol. Hepatol. 2024, 22, 886–889.e5.
51. Niszczota, P.; Rybicka, I. The credibility of dietary advice formulated by ChatGPT: Robo-diets for people with food allergies. Nutrition 2023, 112, 112076.
52. Johnson, D.; Goodman, R.; Patrinely, J.; Stone, C.; Zimmerman, E.; Donald, R.; Chang, S.; Berkowitz, S.; Finn, A.; Jahangir, E. Assessing the Accuracy and Reliability of AI-Generated Medical Responses: An Evaluation of the Chat-GPT Model. Res. Sq. 2023, preprint.
53. Bragazzi, N.L.; Monica, S.; Bergenti, F.; Scazzina, F.; Rosi, A. Comparative Analysis of AI Systems and Human Nutrition Knowledge: Evaluating ChatGPT and Other AI Systems Against Dietetics Students and the General Population. J. Med. Internet Res. 2024, preprint.
54. Hieronimus, B.; Hammann, S.; Podszun, M.C. Can the AI tools ChatGPT and Bard generate energy, macro- and micro-nutrient sufficient meal plans for different dietary patterns? Nutr. Res. 2024, 128, 105–114.
55. Cersosimo, A.; Zito, E.; Pierucci, N.; Matteucci, A.; La Fazia, V.M. A Talk with ChatGPT: The Role of Artificial Intelligence in Shaping the Future of Cardiology and Electrophysiology. J. Pers. Med. 2025, 15, 205.
56. Sblendorio, E.; Dentamaro, V.; Lo Cascio, A.; Germini, F.; Piredda, M.; Cicolini, G. Integrating human expertise & automated methods for a dynamic and multi-parametric evaluation of large language models’ feasibility in clinical decision-making. Int. J. Med. Inform. 2024, 188, 105501.
57. Parozzi, M.; Bozzetti, M.; Lo Cascio, A.; Napolitano, D.; Pendoni, R.; Marcomini, I.; Sblendorio, E.; Cangelosi, G.; Mancin, S.; Bonacaro, A. Semantic Evaluation of Nursing Assessment Scales Translations by ChatGPT 4.0: A Lexicometric Analysis. Nurs. Rep. 2025, 15, 211.
58. Ali, H.; Sumon, R.I.; Khalid, A.R.; Fathima, K.; Kim, H.C. A Semantic Evaluation Framework for Medical Report Generation Using Large Language Models. Comput. Mater. Contin. 2025, 84, 5445–5462.
59. Naja, F.; Taktouk, M.; Matbouli, D.; Khaleel, S.; Maher, A.; Uzun, B.; Alameddine, M.; Nasreddine, L. Artificial intelligence chatbots for the nutrition management of diabetes and the metabolic syndrome. Eur. J. Clin. Nutr. 2024, 78, 887–896.
60. ERBP—European Renal Best Practice|ERA. Available online: https://www.era-online.org/publications/erbp-european-renal-best-practice/ (accessed on 18 September 2025).
61. UK Kidney Association|The Leading Professional Body for the UK Kidney Community. Available online: https://www.ukkidney.org/ (accessed on 18 September 2025).
62. Chronic Kidney Disease (CKD): Symptoms & Treatment. Available online: https://my.clevelandclinic.org/health/diseases/15096-chronic-kidney-disease (accessed on 18 September 2025).
63. Physicians Committee for Responsible Medicine. Available online: https://www.pcrm.org/ (accessed on 18 September 2025).


| AI Model | Criterion | Median (IQR) | Mean ± SD | Min | Max | n |
|---|---|---|---|---|---|---|
| ChatGPT-4 | Consistency | 4 (1) | 3.67 ± 0.48 | 3.00 | 4.00 | 45 |
| ChatGPT-4 | Practicality | 4 (1) | 3.67 ± 0.48 | 3.00 | 4.00 | 45 |
| ChatGPT-4 | Personalization | 4 (0) | 3.71 ± 0.46 | 3.00 | 4.00 | 45 |
| Gemini | Consistency | 4 (0) | 3.84 ± 0.37 | 3.00 | 4.00 | 45 |
| Gemini | Practicality | 4 (0) | 3.87 ± 0.34 | 3.00 | 4.00 | 45 |
| Gemini | Personalization | 4 (0) | 3.91 ± 0.29 | 3.00 | 4.00 | 45 |
| Copilot | Consistency | 3 (1) | 3.42 ± 0.50 | 3.00 | 4.00 | 45 |
| Copilot | Practicality | 4 (1) | 3.67 ± 0.48 | 3.00 | 4.00 | 45 |
| Copilot | Personalization | 3 (0) | 3.44 ± 0.50 | 3.00 | 4.00 | 45 |

| Criterion | AI Model | Rank Sum | χ² (df = 2) | p-Value |
|---|---|---|---|---|
| Consistency | ChatGPT-4 | 3127.5 | 17.52 | 0.0002 * |
| Consistency | Gemini | 3667.5 | | |
| Consistency | Copilot | 2385 | | |
| Practicality | ChatGPT-4 | 2857.5 | 6.091 | 0.0476 * |
| Practicality | Gemini | 3465 | | |
| Practicality | Copilot | 2857.5 | | |
| Personalization | ChatGPT-4 | 3127.5 | 22.848 | 0.0001 * |
| Personalization | Gemini | 3735 | | |
| Personalization | Copilot | 2317.5 | | |

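
The rank sums and test statistic can be illustrated with a minimal, standard-library sketch of the Kruskal–Wallis test with mid-ranks and a tie correction. The rating vectors below are not raw study data: they are reconstructed for the Consistency criterion under the assumption that every rating was 3 or 4 (as the Min and Max columns of the descriptive table suggest) and that the counts match the reported means for n = 45 per model. The function name `kruskal_wallis` is hypothetical.

```python
from collections import Counter


def kruskal_wallis(groups):
    """Return (rank sums per group, tie-corrected H statistic)."""
    pooled = [score for scores in groups.values() for score in scores]
    n_total = len(pooled)
    tie_counts = Counter(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of, next_rank = {}, 1
    for value in sorted(tie_counts):
        t = tie_counts[value]
        rank_of[value] = next_rank + (t - 1) / 2
        next_rank += t

    rank_sums = {name: sum(rank_of[s] for s in scores)
                 for name, scores in groups.items()}
    h = (12 / (n_total * (n_total + 1))
         * sum(rank_sums[name] ** 2 / len(scores)
               for name, scores in groups.items())
         - 3 * (n_total + 1))
    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n_total ** 3 - n_total)
    return rank_sums, h / correction


# Reconstructed Consistency ratings (assumption, see lead-in).
CONSISTENCY = {
    "ChatGPT-4": [4] * 30 + [3] * 15,
    "Gemini": [4] * 38 + [3] * 7,
    "Copilot": [4] * 19 + [3] * 26,
}

if __name__ == "__main__":
    sums, h_stat = kruskal_wallis(CONSISTENCY)
    print(sums, round(h_stat, 2))
```

Because only two distinct scores occur, ties are heavy and the correction factor matters: without it the statistic would be noticeably smaller than the tabulated value.
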
| Comparison | Personalization: z-Test | Personalization: p-Value | Consistency: z-Test | Consistency: p-Value | Practicality: z-Test | Practicality: p-Value |
|---|---|---|---|---|---|---|
| GPT-4–Gemini | 2.0416 | 0.0618 | 1.7551 | 0.1189 | 2.1373 | 0.0489 * |
| GPT-4–Copilot | −2.7222 | 0.0097 * | −2.4133 | 0.0237 * | 0 | 1 |
| Gemini–Copilot | −4.7638 | 0.0001 * | −4.1684 | 0.0001 * | −2.1373 | 0.0489 * |

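
The pairwise statistics can be sketched as a Dunn-type post hoc comparison on the Kruskal–Wallis mean ranks. Several elements below are assumptions rather than the authors' stated procedure: the tie-group sizes (48 ratings of 3 and 87 ratings of 4 for Consistency) are reconstructed from the reported means, the sign convention (second-named model minus first) is inferred from the table, and the p-value adjustment (one-sided tail probability multiplied by the three comparisons) is chosen only because it lands close to the tabulated Consistency values.

```python
import math

N_TOTAL = 135       # 3 models x 45 ratings
N_PER_GROUP = 45
# Consistency rank sums as reported in the Kruskal-Wallis table.
RANK_SUM = {"ChatGPT-4": 3127.5, "Gemini": 3667.5, "Copilot": 2385.0}
# Reconstructed tie-group sizes (assumption): 48 ratings of 3, 87 ratings of 4.
TIE_SIZES = [48, 87]


def dunn_z(first, second):
    """z statistic for the difference in mean ranks (second minus first)."""
    mean_rank_diff = (RANK_SUM[second] - RANK_SUM[first]) / N_PER_GROUP
    tie_term = sum(t ** 3 - t for t in TIE_SIZES) / (12 * (N_TOTAL - 1))
    variance = (N_TOTAL * (N_TOTAL + 1) / 12 - tie_term) * (2 / N_PER_GROUP)
    return mean_rank_diff / math.sqrt(variance)


def adjusted_p(z, n_comparisons=3):
    """One-sided normal tail probability with a Bonferroni multiplier (assumed)."""
    tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return min(1.0, n_comparisons * tail)


if __name__ == "__main__":
    for pair in (("ChatGPT-4", "Gemini"), ("ChatGPT-4", "Copilot"), ("Gemini", "Copilot")):
        z = dunn_z(*pair)
        print(pair, round(z, 4), round(adjusted_p(z), 4))
```
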
| Diet | Protein (g): Manual | Protein (g): ChatGPT-4 | Sodium (mg): Manual | Sodium (mg): ChatGPT-4 | Potassium (mg): Manual | Potassium (mg): ChatGPT-4 | Phosphorus (mg): Manual | Phosphorus (mg): ChatGPT-4 |
|---|---|---|---|---|---|---|---|---|
| Initial | 95.4 ↑ | 89 ↑ | 1314 | 440 | 1541 | 2470 ↑ | 1051 ↑ | 1269 ↑ |
| ChatGPT-4 | 54 | 83 ↑ | 731 | 433 | 1373 | 2194 | 770 | 1046 ↑ |
| Gemini | 102.1 ↑ | 58 | 1212 | 296 | 2604 | 1912 | 680 | 990 |
| Copilot | 95.9 ↑ | 87 ↑ | 1326 | 1660 | 1756 | 2950 ↑ | 1060 ↑ | 1328 ↑ |
| Guideline [2,41,42,43,44] | 58 | 58 | 2300 (limit) | 2300 (limit) | 2400 (limit) | 2400 (limit) | 1000 (limit) | 1000 (limit) |

| Diet | Protein: Absolute Error | Protein: % Error | Sodium: Absolute Error | Sodium: % Error | Potassium: Absolute Error | Potassium: % Error | Phosphorus: Absolute Error | Phosphorus: % Error |
|---|---|---|---|---|---|---|---|---|
| Initial | 6.40 | −6.71% | 874 | −66.51% | 929 | 60.29% | 218 | 20.74% |
| ChatGPT-4 | 29 | 53.70% | 298 | −40.77% | 821 | 59.80% | 276 | 35.84% |
| Gemini | 44.10 | −43.19% | 916 | −75.58% | 692 | −26.57% | 310 | 45.59% |
| Copilot | 8.90 | −9.28% | 334 | 25.19% | 1194 | 68.00% | 268 | 25.28% |

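
The two error columns can be reproduced with a short sketch: the absolute error is the unsigned difference between the ChatGPT-4 estimate and the manual value, and the percentage error is the signed difference relative to the manual value. The function name `errors` is hypothetical, and the input values are taken from the "Initial" diet row of the nutrient comparison table.

```python
def errors(manual, chatgpt):
    """Return (absolute error, signed % error) of the ChatGPT-4 estimate."""
    absolute = abs(chatgpt - manual)
    percent = (chatgpt - manual) / manual * 100
    return round(absolute, 2), round(percent, 2)


# "Initial" diet: (manual, ChatGPT-4) pairs per nutrient.
INITIAL_DIET = {
    "protein_g": (95.4, 89),
    "sodium_mg": (1314, 440),
    "potassium_mg": (1541, 2470),
    "phosphorus_mg": (1051, 1269),
}

if __name__ == "__main__":
    for nutrient, (manual, chatgpt) in INITIAL_DIET.items():
        print(nutrient, errors(manual, chatgpt))
```

A negative percentage means ChatGPT-4 underestimated the manually calculated content; a positive one means it overestimated.
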
| Nutrient | ChatGPT-4: Mean | ChatGPT-4: SD | Manual: Mean | Manual: SD | Mean Difference (ChatGPT-4 − Manual) | Max Abs Difference |
|---|---|---|---|---|---|---|
| Protein (g) | 79.25 | 14.38 | 86.85 | 22.11 | −7.60 | 44.1 |
| Sodium (mg) | 707.25 | 638.62 | 1145.75 | 281.19 | −438.50 | 916 |
| Potassium (mg) | 2381.50 | 442.2 | 1818.50 | 546.62 | 563 | 1194.00 |
| Phosphorus (mg) | 1158.25 | 165.32 | 890.38 | 194.17 | 267.88 | 309.5 |

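
A minimal sketch of how such a summary row can be derived is given below for protein, using the four per-diet values from the nutrient comparison table. The function name `agreement_summary` and the dictionary keys are hypothetical; the sample standard deviation (n − 1 denominator) is assumed.

```python
from statistics import mean, stdev

# Protein (g) for the Initial, ChatGPT-4, Gemini, and Copilot diets.
MANUAL = [95.4, 54.0, 102.1, 95.9]
CHATGPT = [89.0, 83.0, 58.0, 87.0]


def agreement_summary(model, manual):
    """Summarise how far the model estimates sit from the manual values."""
    diffs = [m - r for m, r in zip(model, manual)]
    return {
        "model_mean": mean(model),
        "model_sd": stdev(model),      # sample SD (n - 1 denominator)
        "manual_mean": mean(manual),
        "manual_sd": stdev(manual),
        "mean_diff": mean(diffs),      # model minus manual
        "max_abs_diff": max(abs(d) for d in diffs),
    }


if __name__ == "__main__":
    for key, value in agreement_summary(CHATGPT, MANUAL).items():
        print(f"{key}: {value:.2f}")
```

The mean difference shows the direction of any systematic bias, while the maximum absolute difference shows the worst single-diet discrepancy, which the mean can hide when over- and underestimates cancel out.
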
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kairat, M.; Adilmetova, G.; Ibraimova, I.; Gaipov, A.; Varol, H.A.; Chan, M.-Y. Benchmarking ChatGPT and Other Large Language Models for Personalized Stage-Specific Dietary Recommendations in Chronic Kidney Disease. J. Clin. Med. 2025, 14, 8033. https://doi.org/10.3390/jcm14228033

