Large Language Models for Real-World Nutrition Assessment: Structured Prompts, Multi-Model Validation and Expert Oversight
Abstract
1. Introduction
1.1. Advances in Artificial Intelligence for Nutrition Science
1.2. The Role of Large Language Models in Dietary Classification
1.3. Artificial Intelligence, Large Language Models, and Real-World Dietary Data
1.4. The Impact of Ultra-Processed Foods on Dietary Quality and Health
1.5. Integrating Frameworks for Processing and Nutrient-Based Food Classification
1.6. Limitations and Critiques of the NOVA Classification
1.7. Challenges and Opportunities in Multilingual AI Dietary Assessment
1.8. Study Purpose and Significance
2. Materials and Methods
2.1. Study Design and Data Collection
- Claude Opus 4.5 (Anthropic), released on 24 November 2025;
- Gemini 3 Pro (Google), released on 18 November 2025;
- GPT-5.1-chat-latest (OpenAI), released on 12 November 2025.
2.2. Double-Step Prompt: Classification Frameworks Based on NOVA and WHO Guidelines
1. Processing-Based Classification (NOVA System)
2. Nutritional Threshold Evaluation (WHO Guidelines)
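
The two steps above can be made concrete as a decision rule. In the study both steps are carried out by the LLM through the prompt; the sketch below is only an illustration of how a processing screen and a nutrient screen might combine. It assumes a hypothetical rule (Unhealthy if step 1 assigns NOVA group 4 or step 2 finds any nutrient above its limit), and the names `NovaGroup`, `FoodItem`, `two_step_label` and the placeholder limits are our own, not the study's criteria or WHO reference values.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class NovaGroup(IntEnum):
    """The four NOVA food-processing groups."""
    UNPROCESSED_OR_MINIMALLY_PROCESSED = 1
    PROCESSED_CULINARY_INGREDIENTS = 2
    PROCESSED_FOODS = 3
    ULTRA_PROCESSED_FOODS = 4


@dataclass
class FoodItem:
    name: str
    nova_group: NovaGroup
    # Hypothetical field: nutrient name -> grams per 100 g of product.
    nutrients_g_per_100g: dict[str, float] = field(default_factory=dict)


def two_step_label(item: FoodItem, limits_g_per_100g: dict[str, float]) -> tuple[str, list[str]]:
    """Return ("Healthy" | "Unhealthy", reasons) from a two-step screen.

    Hypothetical rule: Unhealthy if step 1 places the item in NOVA group 4
    OR step 2 finds any nutrient above its limit. The limits are
    caller-supplied placeholders, not WHO reference values.
    """
    reasons: list[str] = []
    # Step 1: processing-based screen (NOVA).
    if item.nova_group is NovaGroup.ULTRA_PROCESSED_FOODS:
        reasons.append("NOVA group 4 (ultra-processed)")
    # Step 2: nutrient-threshold screen; nutrients that are not reported are skipped.
    for nutrient, limit in limits_g_per_100g.items():
        value = item.nutrients_g_per_100g.get(nutrient)
        if value is not None and value > limit:
            reasons.append(f"{nutrient}: {value:g} g/100 g > {limit:g} g/100 g")
    return ("Unhealthy" if reasons else "Healthy"), reasons


if __name__ == "__main__":
    placeholder_limits = {"free_sugars": 10.0, "saturated_fat": 5.0}  # illustrative only
    oat_flakes = FoodItem("oat flakes", NovaGroup.UNPROCESSED_OR_MINIMALLY_PROCESSED,
                          {"free_sugars": 1.0, "saturated_fat": 1.2})  # illustrative values
    print(two_step_label(oat_flakes, placeholder_limits))
```

Returning the list of triggered reasons alongside the label mirrors the study's request for a justification with each classification, which is what makes individual decisions auditable by a human reviewer.
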
2.3. Prompting Framework and Data Processing
- Food item classification (Healthy/Unhealthy)
- Justification for classification (based on NOVA and WHO criteria)
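
The two requested outputs, a Healthy/Unhealthy label and a justification, map naturally onto one record per food item. Assuming, hypothetically, that the model is asked to reply with a JSON object containing `classification` and `justification` keys (the output format is not specified in this section), each reply could be validated before analysis as below; `ItemClassification` and `parse_reply` are illustrative names.

```python
import json
import re
from dataclasses import dataclass

ALLOWED_LABELS = {"Healthy", "Unhealthy"}
FENCE = "`" * 3  # Markdown code fence that chat models often wrap JSON in
_FENCED = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


@dataclass(frozen=True)
class ItemClassification:
    label: str          # "Healthy" or "Unhealthy"
    justification: str  # free-text rationale citing NOVA / WHO criteria


def parse_reply(raw: str) -> ItemClassification:
    """Turn one model reply into a validated label plus justification.

    Assumes (hypothetically) that the prompt requested a JSON object with the
    keys "classification" and "justification"; an optional code fence is
    stripped. Raises ValueError (json.JSONDecodeError is a subclass) or
    KeyError on malformed replies, so they can be routed to manual review.
    """
    text = raw.strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)
    # Normalize case, e.g. "unhealthy" or "UNHEALTHY" -> "Unhealthy".
    label = str(data["classification"]).strip().capitalize()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {data['classification']!r}")
    return ItemClassification(label, str(data["justification"]).strip())
```

Rejecting any label outside the two allowed values, rather than guessing, keeps malformed replies out of the agreement statistics.
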
2.4. Simplified Prompt
2.5. Evaluation by Human Experts
2.6. Model Setup and Language Protocol
2.7. Statistical Analysis
2.8. Use of Generative AI Tools
3. Results
3.1. Overview of Classifications
3.2. Classification Agreement: Double-Step Prompt (NOVA + WHO Framework)
3.3. Classification Agreement: Simplified Prompt
3.4. Cross-Comparison of Prompt Strategies, LLMs, and Human Experts
4. Discussion
4.1. Inter-Model Agreement and Classification Consistency
4.2. Prompt Structure and Classification Specificity
4.3. Linguistic Considerations: Polish Language and AI Capabilities
4.4. Conservative Classification Bias and Risk Mitigation Strategies
4.5. Practical Efficiency: Workflow Assisted by LLM Versus Manual Classification from Scratch
4.6. Human Expertise and the Importance of Expert Oversight
4.7. Limitations and Future Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References


| Comparison | LLM Healthy, Human Healthy | LLM Healthy, Human Unhealthy | LLM Unhealthy, Human Healthy | LLM Unhealthy, Human Unhealthy | Total Agreement |
|---|---|---|---|---|---|
| GPT-5.1 vs. Human | 33.4% | 1.2% | 8.5% | 56.9% | 90.3% |
| Opus 4.5 vs. Human | 35.4% | 2.2% | 6.5% | 55.9% | 91.3% |
| Gemini 3 Pro vs. Human | 34.2% | 1.1% | 7.7% | 57.0% | 91.2% |
| Dominant vs. Human | 34.3% | 1.3% | 7.6% | 56.8% | 91.1% |
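
The "Dominant" rows combine the three models' labels into a single label per item; the exact definition is not given in this section. Assuming it is the item-wise majority label (with three models and two labels, at least two of three always agree), it could be computed as follows; `dominant_label` and `dominant_labels` are hypothetical names.

```python
from collections import Counter
from collections.abc import Mapping, Sequence


def dominant_label(votes: Sequence[str]) -> str:
    """Most frequent label among the models' votes for one item.

    With three models and two labels this is always a strict majority
    (at least 2 of 3); a tie for first place raises ValueError.
    """
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise ValueError(f"tie among votes: {list(votes)}")
    return ranked[0][0]


def dominant_labels(per_model: Mapping[str, Sequence[str]]) -> list[str]:
    """Item-wise dominant label; every model must label the same items in order."""
    columns = list(per_model.values())
    n_items = len(columns[0])
    if any(len(col) != n_items for col in columns):
        raise ValueError("all models must label the same number of items")
    return [dominant_label([col[i] for col in columns]) for i in range(n_items)]


if __name__ == "__main__":
    labels = {  # toy labels for three items, not study data
        "GPT-5.1-chat-latest": ["Healthy", "Unhealthy", "Unhealthy"],
        "Claude Opus 4.5": ["Healthy", "Healthy", "Unhealthy"],
        "Gemini 3 Pro": ["Unhealthy", "Unhealthy", "Unhealthy"],
    }
    print(dominant_labels(labels))  # ['Healthy', 'Unhealthy', 'Unhealthy']
```

An odd number of models is what guarantees a decision for every item; with an even number, ties would need an explicit rule or referral to a human expert.
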

| LLM | TP | FP | FN | TN | Accuracy | Precision | Recall | F1-Score | Specificity |
|---|---|---|---|---|---|---|---|---|---|
| GPT-5.1-chat-latest | 1134 | 169 | 23 | 666 | 0.904 | 0.870 | 0.980 | 0.922 | 0.798 |
| Claude Opus 4.5 | 1114 | 130 | 43 | 705 | 0.913 | 0.895 | 0.963 | 0.928 | 0.844 |
| Gemini 3 Pro | 1136 | 153 | 21 | 682 | 0.913 | 0.881 | 0.982 | 0.929 | 0.817 |
| Dominant | 1131 | 152 | 26 | 683 | 0.910 | 0.881 | 0.977 | 0.927 | 0.818 |
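
The metrics in the table above follow the standard confusion-matrix definitions with "Unhealthy" as the positive class and the human expert label as the reference. This reading is inferred from the counts: each row sums to 1,992 items, and TP divided by 1,992 reproduces the share of items that both the model and the human labeled Unhealthy (56.9% for GPT-5.1). A minimal sketch; `ConfusionMatrix` is an illustrative name, and running it prints the GPT-5.1-chat-latest row (0.904, 0.870, 0.980, 0.922, 0.798).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Model-vs-human counts, with "Unhealthy" as the positive class."""
    tp: int  # model Unhealthy, human Unhealthy
    fp: int  # model Unhealthy, human Healthy
    fn: int  # model Healthy,   human Unhealthy
    tn: int  # model Healthy,   human Healthy

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def metrics(self) -> dict[str, float]:
        precision = self.tp / (self.tp + self.fp)
        recall = self.tp / (self.tp + self.fn)  # sensitivity
        return {
            "accuracy": (self.tp + self.tn) / self.n,
            "precision": precision,
            "recall": recall,
            "f1": 2 * precision * recall / (precision + recall),
            "specificity": self.tn / (self.tn + self.fp),
        }


if __name__ == "__main__":
    # GPT-5.1-chat-latest counts from the double-step confusion-matrix table.
    gpt51 = ConfusionMatrix(tp=1134, fp=169, fn=23, tn=666)
    for name, value in gpt51.metrics().items():
        print(f"{name:<12}{value:.3f}")
```

Because "Unhealthy" is the positive class, high recall paired with lower specificity means the models rarely miss an unhealthy item but more often flag a healthy one as unhealthy.
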

| Comparison | LLM Healthy, Human Healthy | LLM Healthy, Human Unhealthy | LLM Unhealthy, Human Healthy | LLM Unhealthy, Human Unhealthy | Total Agreement |
|---|---|---|---|---|---|
| GPT-5.1 vs. Human | 37.6% | 2.1% | 4.3% | 56.0% | 93.6% |
| Opus 4.5 vs. Human | 39.9% | 5.3% | 2.1% | 52.8% | 92.7% |
| Gemini 3 Pro vs. Human | 39.6% | 4.9% | 2.3% | 53.2% | 92.8% |
| Dominant vs. Human | 39.8% | 3.7% | 2.2% | 54.4% | 94.2% |

| LLM | TP | FP | FN | TN | Accuracy | Precision | Recall | F1-Score | Specificity |
|---|---|---|---|---|---|---|---|---|---|
| GPT-5.1-chat-latest | 1115 | 86 | 42 | 749 | 0.936 | 0.928 | 0.964 | 0.946 | 0.897 |
| Claude Opus 4.5 | 1052 | 41 | 105 | 794 | 0.927 | 0.962 | 0.909 | 0.935 | 0.951 |
| Gemini 3 Pro | 1060 | 46 | 97 | 789 | 0.928 | 0.958 | 0.916 | 0.937 | 0.944 |
| Dominant | 1084 | 43 | 73 | 792 | 0.942 | 0.962 | 0.937 | 0.949 | 0.948 |
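
The agreement percentages can be derived directly from the confusion-matrix counts. Because the four cells partition all 1,992 items, each row of an agreement table must sum to 100% (up to rounding), and its total agreement must equal the accuracy reported for the same model and prompt. A minimal sketch, assuming "Unhealthy" is the positive class and the human label is the reference; `agreement_table` is an illustrative name. With the simplified-prompt "Dominant" counts it gives 39.8%, 3.7%, 2.2%, 54.4% and 94.2%.

```python
def agreement_table(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Percent of all items in each model-vs-human cell, plus total agreement.

    "Unhealthy" is the positive class and the human label is the reference,
    so FP = model Unhealthy / human Healthy, FN = model Healthy / human Unhealthy.
    """
    n = tp + fp + fn + tn
    cells = {
        "model Healthy, human Healthy": tn,
        "model Healthy, human Unhealthy": fn,
        "model Unhealthy, human Healthy": fp,
        "model Unhealthy, human Unhealthy": tp,
    }
    pct = {label: 100 * count / n for label, count in cells.items()}
    # Total agreement is the diagonal, i.e. accuracy expressed as a percentage.
    pct["total agreement"] = 100 * (tp + tn) / n
    return pct


if __name__ == "__main__":
    # "Dominant" counts from the simplified-prompt confusion-matrix table.
    for label, value in agreement_table(tp=1084, fp=43, fn=73, tn=792).items():
        print(f"{label:<34} {value:5.1f}%")
```

Computing both tables from the same counts, rather than entering percentages by hand, keeps them consistent with each other.
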

| | WHO GPT-5.1 | WHO Opus 4.5 | WHO Gemini 3 Pro | WHO Dominant | Simple GPT-5.1 | Simple Opus 4.5 | Simple Gemini 3 Pro | Simple Dominant | Human |
|---|---|---|---|---|---|---|---|---|---|
| WHO GPT-5.1 | – | 1608.3 | 1745.5 | 1838.0 | 1478.5 | 1174.5 | 1207.2 | 1282.4 | 1296.6 |
| WHO Opus 4.5 | | – | 1661.7 | 1749.0 | 1525.0 | 1331.3 | 1287.0 | 1347.1 | 1347.4 |
| WHO Gemini 3 Pro | | | – | 1897.1 | 1587.8 | 1286.8 | 1349.2 | 1401.3 | 1354.5 |
| WHO Dominant | | | | – | 1548.6 | 1250.4 | 1305.2 | 1363.6 | 1338.8 |
| Simple GPT-5.1 | | | | | – | 1551.3 | 1520.6 | 1694.1 | 1500.7 |
| Simple Opus 4.5 | | | | | | – | 1641.3 | 1827.6 | 1449.0 |
| Simple Gemini 3 Pro | | | | | | | – | 1798.9 | 1456.1 |
| Simple Dominant | | | | | | | | – | 1547.6 |
| Human | | | | | | | | | – |
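
For the model-versus-human pairs that can be checked against the confusion-matrix tables, the values in the pairwise matrix above are reproduced by Pearson's chi-square statistic for a 2×2 table without continuity correction. For example, the double-step GPT-5.1 counts (1134, 169, 23, 666) give 1296.6 and the simplified Gemini 3 Pro counts (1060, 46, 97, 789) give 1456.1. This suggests, although the section does not state it, that the matrix reports χ² values. A minimal plain-Python sketch; `pearson_chi2_2x2` is an illustrative name, and `scipy.stats.chi2_contingency(table, correction=False)` should return the same statistic.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) and its p-value, 1 df.

    Layout: rows are the first rater (Unhealthy, Healthy) and columns the
    second rater (Unhealthy, Healthy), so for model vs. human
    a = TP, b = FP, c = FN, d = TN.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero; chi-square is undefined")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # With 1 degree of freedom chi2 = Z**2, so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    # Statistics this large give p-values that may underflow to 0.0 in double precision.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    pairs = {
        "WHO GPT-5.1 vs. Human": (1134, 169, 23, 666),
        "Simple GPT-5.1 vs. Human": (1115, 86, 42, 749),
        "Simple Gemini 3 Pro vs. Human": (1060, 46, 97, 789),
    }
    for label, counts in pairs.items():
        chi2, p = pearson_chi2_2x2(*counts)
        print(f"{label:<30} chi2 = {chi2:7.1f}  p = {p:.1e}")
```

Note that χ² measures association, not agreement: two raters who systematically disagreed would also produce a large statistic, so it complements rather than replaces the agreement percentages.
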

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Ase, A.; Borowicz, J.; Rakocy, K.; Piekarska, B. Large Language Models for Real-World Nutrition Assessment: Structured Prompts, Multi-Model Validation and Expert Oversight. Nutrients 2026, 18, 23. https://doi.org/10.3390/nu18010023
Ase A, Borowicz J, Rakocy K, Piekarska B. Large Language Models for Real-World Nutrition Assessment: Structured Prompts, Multi-Model Validation and Expert Oversight. Nutrients. 2026; 18(1):23. https://doi.org/10.3390/nu18010023
Chicago/Turabian Style: Ase, Aia, Jacek Borowicz, Kamil Rakocy, and Barbara Piekarska. 2026. "Large Language Models for Real-World Nutrition Assessment: Structured Prompts, Multi-Model Validation and Expert Oversight." Nutrients 18, no. 1: 23. https://doi.org/10.3390/nu18010023
APA Style: Ase, A., Borowicz, J., Rakocy, K., & Piekarska, B. (2026). Large Language Models for Real-World Nutrition Assessment: Structured Prompts, Multi-Model Validation and Expert Oversight. Nutrients, 18(1), 23. https://doi.org/10.3390/nu18010023

