ChatGPT-4o and 4o1 Preview as Dietary Support Tools in a Real-World Medicated Obesity Program: A Prospective Comparative Analysis
Abstract
1. Introduction
1.1. Study Context
1.2. Related Studies
1.3. Study Aims
2. Materials and Methods
2.1. Study Design
2.2. Statistical Analysis
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
DWLS | Digital Weight Loss Service |
LLM | Large Language Model |
GLP-1 RAs | Glucagon-Like Peptide-1 Receptor Agonists |
ANOVA | Analysis of Variance |
Appendix A. Prompt, Instructions, and Description of Mock Patient for the LLM
- LLM Prompt
- Instructions
- Patient profile
Appendix B. Mock Patient Questions
| Question | Theme |
|---|---|
| Broad Questions | |
| | Weight-loss plateau |
| | Appetite |
| | Protein intake |
| | Calories |
| | Time constraints |
| Narrow Questions | |
| | Intermittent fasting |
| | Supplementation |
| | Carbohydrates |
| | Weight tracking |
| | Artificial sweeteners |
Appendix C. Scoring Matrix
Scoring Criterion | Description |
---|---|
Scientific correctness | How accurately each answer reflects the current state of knowledge in the scientific domain to which the question belongs. Where relevant, this includes a consideration of patient conditions and/or medications. |
Comprehensibility | How well the answer could be expected to be understood by the layman. |
Actionability | The degree to which the answers to the questions contain information that is useful and can be acted upon by the hypothetical layman asking the question. For example, while a different home-cooked, protein-rich dinner every weeknight might be an effective weight-loss strategy, this would not be a helpful suggestion for someone with limited spare time. |
Empathy/Relatability | The degree to which the answers convey empathy and understanding of the emotional state of the patient. |
Question | Human Dietitian 1 | Human Dietitian 2 | GPT-4o | GPT-4o1 Preview | Eta Squared (η²) | p-Value |
---|---|---|---|---|---|---|
Question 1 | 8 (CI 6.52, 8.7) | 7 (CI 6.19, 7.72) | 8 (CI 6.83, 8.97) | 8 (CI 6.34, 8.65) | −0.006 | 0.449 |
Question 2 | 8.5 (CI 8.13, 9.23) | 8 (CI 7.24, 8.35) | 7 (CI 6.14, 8.12) | 8 (CI 7.42, 8.8) | −0.024 | 0.671 |
Question 3 | 7.5 (CI 6.42, 8.63) | 7 (CI 6.54, 7.6) | 7 (CI 6.23, 7.92) | 8 (CI 6.94, 8.42) | −0.045 | 0.961 |
Question 4 | 7 (CI 6.22, 7.73) | 5 (CI 4.60, 5.37) | 7.5 (CI 6.55, 8.34) | 8 (CI 6.71, 8.21) | 0.077 | 0.055 |
Question 5 | 8 (CI 7.1, 9.03) | 7.5 (CI 6.85, 8.3) | 9 (CI 7.88, 9.45) | 8.5 (CI 7.46, 9.04) | −0.019 | 0.608 |
Question 6 | 7 (CI 5.96, 8.14) | 9 (CI 8.66, 9.41) | 8 (CI 7.28, 9.11) | 7 (CI 6.64, 8.49) | 0.097 | 0.032 * |
Question 7 | 7 (CI 6.35, 7.88) | 8 (CI 7.04, 8.79) | 9 (CI 8.14, 9.73) | 7.5 (CI 5.32, 9.12) | 0.121 | 0.017 * |
Question 8 | 8 (CI 7.28, 9.04) | 7.5 (CI 6.22, 8.65) | 8 (CI 7.04, 9.19) | 9 (CI 7.52, 9.41) | 0.151 | 0.007 ** |
Question 9 | 9 (CI 8.43, 9.55) | 8 (CI 7.2, 8.65) | 10 (CI 10, 10) | 9 (CI 7.93, 9.84) | 0.012 | 0.294 |
Question 10 | 7 (CI 6.3, 8.09) | 8 (CI 7.08, 8.87) | 9 (CI 8.7, 9.42) | 7.5 (CI 6.87, 8.38) | 0.127 | 0.014 * |
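Several η² values in the table above are negative. A classical ANOVA η² (between-group over total sum of squares) cannot be negative, but the rank-based effect size η²_H = (H − k + 1)/(n − k), computed from a Kruskal-Wallis H statistic, is negative whenever H < k − 1. This section does not state which formula was used, so the sketch below assumes η²_H; the function names and toy ratings are hypothetical.

```python
from collections import Counter


def kruskal_wallis_h(groups: list[list[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H for k independent groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    counts = Counter(pooled)
    # Each distinct value gets the mean of the rank positions its tied run occupies.
    rank, pos = {}, 1
    for v in sorted(counts):
        rank[v] = pos + (counts[v] - 1) / 2
        pos += counts[v]
    between = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 * between / (n * (n + 1)) - 3 * (n + 1)
    # Divide by 1 - sum(t^3 - t) / (n^3 - n) to correct for tied scores.
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def eta_squared_h(h: float, k: int, n: int) -> float:
    """Rank-based effect size (H - k + 1) / (n - k); negative whenever H < k - 1."""
    return (h - k + 1) / (n - k)


# Hypothetical ratings for four coaches with identical score distributions.
toy = [[1, 2], [1, 2], [1, 2], [1, 2]]
h = kruskal_wallis_h(toy)
print(h, eta_squared_h(h, k=4, n=8))
```

With identical distributions for k = 4 groups, H is 0 and η²_H is (0 − 3)/(8 − 4) = −0.75, illustrating how small negative values such as −0.045 can arise for questions on which the coaches scored similarly.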
Response Score: Coach Levels | Z Score | p-Adjusted Value |
---|---|---|
GPT-4o vs. GPT-4o1 | 0.97 | 0.396 |
GPT-4o1 vs. Human 1 | −1.88 | 0.121 |
GPT-4o vs. Human 1 | −2.85 | 0.026 * |
GPT-4o1 vs. Human 2 | −1.52 | 0.191 |
GPT-4o vs. Human 2 | −2.50 | 0.038 * |
Human 1 vs. Human 2 | 0.35 | 0.72 |
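The adjusted p-values in the first post-hoc table are consistent with two-sided normal p-values computed from the Z scores and then adjusted with the Benjamini-Hochberg false discovery rate procedure across the six coach pairs. The section does not name the correction method, so this is an inference; the sketch below reproduces the reported values to within about 0.01, given that the Z scores are rounded to two decimals. The function names are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals: list[float], enforce_monotone: bool = True) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest raw p-value down to the smallest (rank m .. 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        value = min(1.0, pvals[i] * m / rank)
        if enforce_monotone:
            # Standard BH: an adjusted value may not exceed the one ranked above it.
            running_min = min(running_min, value)
            value = running_min
        adjusted[i] = value
    return adjusted


# Z scores from the first post-hoc table.
comparisons = {
    "GPT-4o vs. GPT-4o1": 0.97,
    "GPT-4o1 vs. Human 1": -1.88,
    "GPT-4o vs. Human 1": -2.85,
    "GPT-4o1 vs. Human 2": -1.52,
    "GPT-4o vs. Human 2": -2.50,
    "Human 1 vs. Human 2": 0.35,
}
raw = [two_sided_p(z) for z in comparisons.values()]
for name, p_adj in zip(comparisons, benjamini_hochberg(raw)):
    print(f"{name}: {p_adj:.3f}")
```

Standard Benjamini-Hochberg takes a running minimum so that adjusted values never decrease as raw p-values increase. The `enforce_monotone=False` switch is a hypothetical addition that returns the plain ratios p·m/rank instead; in some of the later tables (for example, 0.102 reported for Z = 2.12 alongside 0.095 for Z = −1.98) the reported values appear closer to that unsmoothed form.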
Response Score: Coach Levels | Z Score | p-Adjusted Value |
---|---|---|
GPT-4o vs. GPT-4o1 | 1.98 | 0.144 |
GPT-4o vs. Human 1 | 3.12 | 0.011 * |
GPT-4o1 vs. Human 1 | 1.14 | 0.305 |
GPT-4o vs. Human 2 | 1.25 | 0.314 |
GPT-4o1 vs. Human 2 | −0.72 | 0.469 |
Human 1 vs. Human 2 | −1.86 | 0.125 |
Response Score: Coach Levels | Z Score | p-Adjusted Value |
---|---|---|
GPT-4o vs. GPT-4o1 | 2.12 | 0.102 |
GPT-4o vs. Human 1 | −1.32 | 0.225 |
GPT-4o1 vs. Human 1 | −3.43 | 0.004 ** |
GPT-4o vs. Human 2 | 0.14 | 0.891 |
GPT-4o1 vs. Human 2 | −1.98 | 0.095 |
Human 1 vs. Human 2 | 1.45 | 0.219 |
Response Score: Coach Levels | Z Score | p-Adjusted Value |
---|---|---|
GPT-4o vs. GPT-4o1 | 0.35 | 0.724 |
GPT-4o vs. Human 1 | 1.11 | 0.4 |
GPT-4o1 vs. Human 1 | 0.76 | 0.538 |
GPT-4o vs. Human 2 | 1.75 | 0.159 |
GPT-4o1 vs. Human 2 | 2.11 | 0.106 |
Human 1 vs. Human 2 | −2.86 | 0.025 * |
Variable | N | η² | p-Value |
---|---|---|---|
Comprehensibility | 40 | −0.05 | 0.74 |
Empathy/Relatability | 40 | −0.05 | 0.79 |
Scientific correctness | 40 | 0.04 | 0.22 |
Actionability | 40 | 0.27 | 0.005 ** |
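As a rough consistency check, and again assuming that η² here is the Kruskal-Wallis effect size η²_H = (H − k + 1)/(n − k) with k = 4 coaches and N = 40 answers, each η² can be converted back to an approximate H and compared against the chi-square distribution on k − 1 = 3 degrees of freedom, whose upper tail has the closed form erfc(√(x/2)) + √(2x/π)·e^(−x/2). Because η² is rounded to two decimals, the recovered p-values can only be expected to match within a few hundredths. The function names are hypothetical.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def h_from_eta_squared(eta2: float, k: int, n: int) -> float:
    """Invert eta^2_H = (H - k + 1) / (n - k) to recover an approximate H."""
    return eta2 * (n - k) + k - 1


# (eta^2, reported p) from the criterion-level table: N = 40 answers, k = 4 coaches.
rows = {
    "Comprehensibility": (-0.05, 0.74),
    "Empathy/Relatability": (-0.05, 0.79),
    "Scientific correctness": (0.04, 0.22),
    "Actionability": (0.27, 0.005),
}
for name, (eta2, reported_p) in rows.items():
    h = h_from_eta_squared(eta2, k=4, n=40)
    print(f"{name}: H ~ {h:.2f}, implied p ~ {chi2_sf_df3(h):.3f} (reported {reported_p})")
```

The largest gap is for Empathy/Relatability, where a reported η² of −0.05 covers a range of H values wide enough to move the implied p-value by several hundredths.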
Response Score: Coach Levels | Z Score | p-Adjusted Value |
---|---|---|
GPT-4o vs. GPT-4o1 | 1.78 | 0.149 |
GPT-4o vs. Human 1 | 3.26 | 0.007 ** |
GPT-4o1 vs. Human 1 | 1.48 | 0.209 |
GPT-4o vs. Human 2 | 2.85 | 0.013 * |
GPT-4o1 vs. Human 2 | 1.07 | 0.341 |
Human 1 vs. Human 2 | −0.41 | 0.683 |
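The Z scores in the post-hoc tables are consistent with Dunn's test, which compares the mean ranks of two groups within the pooled Kruskal-Wallis ranking; the section does not name the procedure, so this is an assumption. The sketch below uses the common tie-corrected variance [N(N + 1)/12 − Σ(t³ − t)/(12(N − 1))]·(1/nᵢ + 1/nⱼ), where t is the size of each run of tied scores. The function name and toy groups are hypothetical.

```python
import math
from collections import Counter


def dunn_z(groups: dict[str, list[float]], a: str, b: str) -> float:
    """Dunn's z for groups a and b, using pooled average ranks and a tie-corrected variance."""
    pooled = [v for g in groups.values() for v in g]
    n = len(pooled)
    counts = Counter(pooled)
    # Each distinct value gets the mean of the rank positions its tied run occupies.
    rank, pos = {}, 1
    for v in sorted(counts):
        rank[v] = pos + (counts[v] - 1) / 2
        pos += counts[v]
    mean_a = sum(rank[v] for v in groups[a]) / len(groups[a])
    mean_b = sum(rank[v] for v in groups[b]) / len(groups[b])
    # Ties shrink the rank variance; subtract their contribution before scaling.
    tie_term = sum(t ** 3 - t for t in counts.values()) / (12 * (n - 1))
    variance = (n * (n + 1) / 12 - tie_term) * (1 / len(groups[a]) + 1 / len(groups[b]))
    return (mean_a - mean_b) / math.sqrt(variance)


# Hypothetical ratings for three coaches with no tied scores.
toy = {"Coach A": [1, 2, 3], "Coach B": [4, 5, 6], "Coach C": [7, 8, 9]}
z = dunn_z(toy, "Coach A", "Coach C")
print(f"z = {z:.3f}")
```

Swapping `a` and `b` flips the sign of z, so the sign of each Z score reflects the order in which the pair is listed as well as which coach ranked higher. Each z is then converted to a two-sided p-value, erfc(|z|/√2), before adjustment across the six coach comparisons.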
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Talay, L.; Lagesen, L.; Yip, A.; Vickers, M.; Ahuja, N. ChatGPT-4o and 4o1 Preview as Dietary Support Tools in a Real-World Medicated Obesity Program: A Prospective Comparative Analysis. Healthcare 2025, 13, 647. https://doi.org/10.3390/healthcare13060647