AI-Generated Exercise Prescriptions for At-Risk Populations: Safety and Feasibility of a Large Language Model Assessed by Expert Evaluation
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design
2.2. Clinical Case Construction
2.3. Prompt Design and Exercise Plan Generation
2.4. AI Model and Session Control
2.5. Evaluators, Evaluation Criteria, and Evaluation Procedure
2.6. Statistical Analysis
3. Results
3.1. Expert-Specific and Overall Mean Scores by Prompt Stage and Clinical Case
3.2. Inter-Expert Reliability and Internal Consistency of Expert Evaluations
3.3. Item-Level Mean Score Comparison Across Prompt Specificity Levels
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest


| Clinical Cases | Sex | Age (Years) | Primary Condition(s) | Key Functional Limitation or Risk | Baseline Physical Activity Level | Primary Exercise Goal |
|---|---|---|---|---|---|---|
| Clinical Case 1 (Type 2 diabetes + obesity) | Male | 55 | Type 2 diabetes mellitus, obesity | Limited exercise experience, mild peripheral neuropathy | Low | Weight reduction and improved glycemic control |
| Clinical Case 2 (Knee osteoarthritis + fall risk) | Female | 70 | Knee osteoarthritis | Knee pain during walking, prior fall incident | Low | Pain reduction, maintenance of walking ability, and fall prevention |
| Clinical Case 3 (Post-colon cancer surgery recovery) | Male | 60 | Post-colon cancer surgery | Deconditioning, limited walking endurance, fatigue | Low | Physical recovery, fatigue reduction, and improvement of lifestyle habits |

| Clinical Case | Prompt Stage | Expert 1 Mean Score | Expert 2 Mean Score | Expert 3 Mean Score | Overall Mean Score |
|---|---|---|---|---|---|
| Type 2 diabetes + obesity | Stage 1 | 3.38 ± 0.49 | 3.30 ± 0.76 | 3.30 ± 0.81 | 3.33 ± 0.70 |
| Type 2 diabetes + obesity | Stage 2 | 3.82 ± 0.69 | 3.38 ± 0.73 | 3.68 ± 0.74 | 3.63 ± 0.74 |
| Type 2 diabetes + obesity | Stage 3 | 3.78 ± 0.68 | 4.34 ± 0.52 | 3.62 ± 0.88 | 3.91 ± 0.77 |
| Knee osteoarthritis + fall risk | Stage 1 | 3.56 ± 0.70 | 3.44 ± 0.70 | 3.84 ± 0.65 | 3.61 ± 0.70 |
| Knee osteoarthritis + fall risk | Stage 2 | 4.12 ± 0.63 | 3.54 ± 0.58 | 3.78 ± 0.71 | 3.81 ± 0.68 |
| Knee osteoarthritis + fall risk | Stage 3 | 4.20 ± 0.40 | 2.96 ± 0.67 | 4.12 ± 0.82 | 3.76 ± 0.86 |
| Post-colon cancer surgery recovery | Stage 1 | 3.60 ± 0.57 | 3.50 ± 0.51 | 3.46 ± 0.76 | 3.65 ± 0.69 |
| Post-colon cancer surgery recovery | Stage 2 | 3.70 ± 0.58 | 3.40 ± 0.49 | 3.90 ± 0.81 | 3.77 ± 0.71 |
| Post-colon cancer surgery recovery | Stage 3 | 3.60 ± 0.64 | 2.86 ± 0.64 | 3.24 ± 0.74 | 3.37 ± 0.82 |

| Measure | ICC Model | ICC | 95% CI |
|---|---|---|---|
| Total score | ICC (2,3) | 0.139 | −0.350 to 0.482 |
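
For readers who want to see how a value such as ICC (2,3) is obtained, the following is a minimal sketch and not the authors' analysis code. It assumes the standard Shrout and Fleiss two-way random-effects, absolute-agreement, average-measures form, ICC(2,k) = (MSR − MSE) / (MSR + (MSC − MSE) / n), with n rated plans and k = 3 experts. The function name `icc2k` and the toy ratings are hypothetical and are not study data.

```python
def icc2k(ratings):
    """Average-measures ICC(2,k) for a complete n x k rating matrix.

    ratings: list of n rows (rated plans), each a list of k rater scores.
    Assumes no missing values and n, k >= 2.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-plan mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (msr - mse) / (msr + (msc - mse) / n)


# Hypothetical ratings: 4 plans scored by 3 experts on a 1-5 scale.
toy_ratings = [
    [4, 3, 4],
    [3, 3, 4],
    [5, 4, 4],
    [3, 2, 3],
]
print(round(icc2k(toy_ratings), 3))
```

Under this definition the estimate typically falls below zero when the between-plan mean square is smaller than the residual mean square, which is how negative item-level ICC values can arise.
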

| Expert | Number of Cases (N) | Number of Items | Cronbach’s α |
|---|---|---|---|
| Expert 1 | 45 | 10 | 0.923 |
| Expert 2 | 45 | 10 | 0.943 |
| Expert 3 | 45 | 10 | 0.923 |
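
As a rough illustration of the internal-consistency statistic, the sketch below computes Cronbach's α from its usual definition, α = k / (k − 1) · (1 − Σ item variances / variance of total scores), where k is the number of items. This is an assumption about the formula used, not the authors' code; the helper names and the toy scores are hypothetical.

```python
def sample_variance(values):
    """Unbiased sample variance (divides by len - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(scores):
    """Cronbach's alpha for one rater.

    scores: list of rows (rated plans), each a list of k item scores.
    Assumes complete data, at least 2 rows, and at least 2 items.
    """
    k = len(scores[0])
    item_variances = [
        sample_variance([row[j] for row in scores]) for j in range(k)
    ]
    total_variance = sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 5 plans rated on 4 items (1-5 scale), not study data.
toy_scores = [
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [5, 4, 5, 5],
    [2, 3, 2, 2],
    [4, 5, 4, 4],
]
print(round(cronbach_alpha(toy_scores), 3))
```

A high α for each expert alongside low inter-expert ICC values is not contradictory: α describes how consistently one rater's item scores move together, while the ICC describes agreement between raters.
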

| Item | Domain | ICC (2,3) | 95% CI |
|---|---|---|---|
| Safety | Safety | 0.201 | −0.142 to 0.485 |
| Guideline Alignment | Guideline | −0.358 | −1.230 to 0.209 |
| Feasibility | Feasibility | 0.020 | −0.501 to 0.401 |
| Personalization | Personalization | 0.015 | −0.479 to 0.389 |
| Specificity (FITT-VP) | Prescription | −0.432 | −1.237 to 0.136 |
| Consistency | Quality | −0.005 | −0.654 to 0.416 |
| Clarity | Quality | 0.384 | 0.015 to 0.635 |
| Completeness | Quality | 0.237 | −0.224 to 0.548 |
| Detail Reflection | Quality | 0.236 | −0.145 to 0.525 |
| Reproducibility | Quality | 0.152 | −0.311 to 0.485 |

| Item | Stage 1 (Minimal) | Stage 2 (Guideline-Based) | Stage 3 (Structured Schema) |
|---|---|---|---|
| Safety | 3.69 ± 0.76 | 4.07 ± 0.81 | 3.69 ± 0.85 |
| Guideline Alignment | 3.80 ± 0.50 | 4.16 ± 0.56 | 3.98 ± 0.69 |
| Feasibility | 3.71 ± 0.63 | 3.78 ± 0.67 | 3.60 ± 0.75 |
| Personalization | 3.38 ± 0.58 | 3.49 ± 0.76 | 3.47 ± 0.81 |
| Specificity (FITT-VP) | 3.38 ± 0.61 | 3.42 ± 0.62 | 3.56 ± 0.99 |
| Consistency | 3.49 ± 0.59 | 3.64 ± 0.53 | 3.64 ± 0.80 |
| Clarity | 3.51 ± 0.66 | 3.76 ± 0.61 | 3.78 ± 0.64 |
| Completeness | 3.58 ± 0.66 | 3.73 ± 0.65 | 3.93 ± 0.86 |
| Detail Reflection | 3.00 ± 0.83 | 3.42 ± 0.69 | 3.29 ± 0.89 |
| Reproducibility | 3.33 ± 0.67 | 3.56 ± 0.69 | 3.42 ± 0.89 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Choi, M.; Park, J.; Lee, M.; Beom, J.; Jung, S.Y.; Lee, K. AI-Generated Exercise Prescriptions for At-Risk Populations: Safety and Feasibility of a Large Language Model Assessed by Expert Evaluation. J. Clin. Med. 2026, 15, 2457. https://doi.org/10.3390/jcm15062457

